
On Thu, Aug 28, 2008 at 02:59:16PM +0100, Simon Marlow wrote:
> The important thing about Cabal's way of specifying dependencies is that they can be made sound with not much difficulty. If I say that my package depends on base==3.0 and network==1.0, then I can guarantee that as long as those dependencies are present then my package will build. ("but but but..." I hear you say - don't touch that keyboard yet!)
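For concreteness, the example above corresponds to a .cabal file along these lines (a sketch; the package and executable names are invented):

```
-- Hypothetical .cabal file pinning exact versions, as in the example above.
name:           my-package
version:        0.1
build-type:     Simple

executable my-program
  main-is:        Main.hs
  build-depends:  base == 3.0, network == 1.0
```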
I can easily achieve this with autoconf, or even with nothing: I can simply test whether the system is Fedora Core 9 running GHC 6.8.2 and be assured that my package will build properly. But that misses the entire point. I don't want my package to build just on my exact system, I want it to build on _other_ people's systems: people running compilers, libraries, and operating systems I have never heard of.

The Cabal guarantee, however, has the huge flaw of requiring a closed universe: a complete and universal definition of what 'network == 1.0' means, for all time, that all future compilers must agree on. It places a huge burden on implementors to provide a 'network == 1.0' compatible interface simply so cabal doesn't complain, even though all programs would be happy with a jhc-network 0.7 or an internet-5.0 package. It means that with jhc-network, which has 90% of the functionality of network (including everything that 99.9% of programs need), every program will either have to know about jhc-network and edit its cabal file to include it conditionally, or it just won't work at all.

Note that this is similar to the problem symbol versioning poses for shared libraries, and there is a fair amount of literature on the subject. Most Unix .so's used to have something similar to the current Cabal model, a version number with a major/minor part; it was found to lead to DLL hell (well, .so hell), and we don't want to end up in the same place with Haskell (package hell?). Linux hence switched to its current system, which gives an individual version number to every API function. I am not saying that is the solution for Haskell, but I do not see the current Cabal approach scaling any better than the old Unix one, or avoiding the same problems.
> Suppose you used autoconf tests instead. You might happen to know that Network.Socket.blah was added at some point and write a test for that, but alas if you didn't also write a test for Network.Socket.foo (which your code uses but ends up getting removed in network-1.1) then your code breaks. Autoconf doesn't help you make your configuration sound, and you get no prior guarantee that your code will build.
And with Cabal it breaks there too, in addition to the other 80% of the time when the code would have worked just fine. The autoconf feature test is strictly superior here.
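For reference, a feature test of the kind described above might look roughly like this in a configure.ac; this is a sketch, not from any real package. It keeps Simon's placeholder identifier 'blah', invents the HAVE_NETWORK_SOCKET_BLAH macro name, and assumes $GHC was located earlier in the configure script:

```
dnl Sketch of an autoconf feature test for a Haskell identifier.
dnl "blah" is the placeholder name from the discussion above.
AC_MSG_CHECKING([for Network.Socket.blah])
cat > conftest.hs <<EOF
import Network.Socket (blah)
main :: IO ()
main = return ()
EOF
if $GHC -c conftest.hs >/dev/null 2>&1; then
  AC_MSG_RESULT([yes])
  AC_DEFINE([HAVE_NETWORK_SOCKET_BLAH], [1],
            [Define if Network.Socket exports blah])
else
  AC_MSG_RESULT([no])
fi
rm -f conftest.hs conftest.hi conftest.o
```

The point of a test like this is that it probes what the installed library actually exports, rather than what its version number claims.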
> Now, Cabal's dependencies have the well-known problem that they're exceptionally brittle, because they either overspecify or underspecify, and it's not possible to get it "just right". On the other hand, autoconf configurations tend to underspecify dependencies, because you typically only write an autoconf test for something that you know has changed in the past - you don't know what's going to change in the future, so you usually just hope for the best. For Cabal I can ask the question "if I modify the API of package P, which other packages might be broken as a result?", but I can't do that with autoconf.
But the only reason they are brittle is Cabal's sledgehammer approach to package versioning; there is no reason an autoconf-style system couldn't answer the same question. And again, you are assuming you can even enumerate all the packages that exist in order to find out which might be broken, and what does that really give you in any case? By changing the API you know you are going to break some things, but what about all the company-internal software out there that uses Haskell? You can't look at their packages. It just does not seem like a very useful thing to ask, as for the code you can see it is a question that can be answered by 'grep'.
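The over/under-specification dilemma reads like this in .cabal terms (hypothetical bounds on the network package):

```
-- Overspecified: refuses jhc-network 0.7, internet-5.0, or even a
-- compatible network-1.0.1, though any of them might work fine.
build-depends: network == 1.0

-- Underspecified: happily accepts a future network-2.0 that removes
-- the very functions this package uses.
build-depends: network >= 1.0
```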
> Both systems are flawed, but neither fundamentally. For Cabal I think it would be interesting to look into using more precise dependencies (module.identifier::type, rather than package-version) and have them auto-generated. But this has difficult implications: implementing cabal-install's installation plans becomes much harder, for example.
Again, I would like to see this as another option. I think there are interesting ideas in Cabal about configuration management, but there needs to be room for alternatives, including old standbys like autoconf.
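One way to picture the "module.identifier::type" dependencies Simon mentions is as a check against what the installed packages actually export. This is a toy sketch, not Cabal's actual representation; all names are invented:

```haskell
-- A toy model of "module.identifier :: type" dependencies, as opposed
-- to package-version dependencies.
data Dep = Dep
  { depModule :: String   -- e.g. "Network.Socket"
  , depIdent  :: String   -- e.g. "connect"
  , depType   :: String   -- the type the caller was compiled against
  } deriving (Eq, Show)

-- An environment is whatever the installed packages actually export:
-- (module, identifier, type) triples.
type Exports = [(String, String, String)]

-- A package builds iff every identifier it uses is exported with the
-- expected type; no package names or version numbers are involved, so
-- jhc-network could satisfy a dependency written against network.
satisfied :: [Dep] -> Exports -> Bool
satisfied deps env = all ok deps
  where ok (Dep m i t) = (m, i, t) `elem` env
```

Under a scheme like this, the installation-planning problem Simon mentions gets harder precisely because any package exporting the right triples is a candidate.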
> So I accept that we do not yet cover the range of configuration choices that are needed by the more complex packages (cf darcs), but I think that we can and that the approach is basically sound. The fact that we can automatically generate distro packages for hundreds of packages is not insignificant. This is just not possible with the autoconf approach.
This is just utterly untrue. Autoconf'ed packages that generate rpms, debs, etc. are quite common. The only reason Cabal can autogenerate distro packages for so many is that many interesting or hard ones just _aren't possible with Cabal at all_.
> Exactly! Cabal is designed so that a distro packager can write a program that takes a Cabal package and generates a distro package for their distro. It has to do distro-specific stuff, but it doesn't typically need to do package-specific stuff.
> To generate a distro package from an autoconf package either the package author has to include support for that distro, or a distro packager has to write specific support for that package. There's no way to do generic autoconf->distro package generation, like there is with Cabal.
In Cabal you only get it because you convinced the Cabal people to put in code to support your distro, which isn't much different from asking the distro packagers to do it. Besides, this ability has nothing to do with Cabal's configuration-management capabilities, only with its metadata format, which could easily be abstracted out and not tied to Cabal. (I would love to see that. Cabal has a lot of good ideas, but due to its design, its bad ideas are complete showstoppers rather than things you can replace.)

And there are many automatic package managers for autoconf-style packages. http://www.toastball.net/toast/ is a good one; it even downloads dependencies from freshmeat when needed. In fact, your projects can probably be auto-installed by 'toast projectname' and you didn't even know it! http://encap.org/ is another, one I use on pretty much all my systems since it is distro-independent.
> Yes this means that Cabal is less general than autoconf. It was quite a revelation when we discovered this during the design of Cabal - originally we were going to have everything done programmatically in the Setup.hs file, but then we realised that having the package configuration available *as data* gave us a lot more scope for automation, albeit at the expense of some generality.
Note, I wholeheartedly agree with the idea of package configuration as data. In fact, when Cabal first started I was a huge advocate of it, and I actually _lost interest_ in the project because of the decision to go with the programmatic Setup.hs rather than a declarative approach. However, I think Cabal is a _poor execution_ of the idea. The problem is compounded by the fact that it is being promoted as the Haskell way to do things, so its design decisions are affecting the development and evolution of the base libraries. And its monolithic nature, and its attitude of wanting to take over your whole project's build cycle, mean that alternate approaches cannot be explored.
> That's the tradeoff - but there's still nothing stopping you from using autoconf and your own build system instead if you need to!
But it is a false tradeoff. The only reason one needs to make it is that Cabal's design doesn't allow the useful ability to mix and match its parts. I would prefer to see Cabal improved so I _can_ use its metadata format, and use its configuration manager for simple projects and autoconf's for more complex ones (with full knowledge of the tradeoffs), without jumping through hoops.
As for programs written in Haskell: I don't want people's first impression of Haskell to be "oh crap, I gotta learn a new way to build things just because this program is written in some odd language called 'Haskell'". I don't care how awesome a language is, I am going to be annoyed by having to deal with it when I just want to compile and install a program; it will leave a bad taste in my mouth. I would much rather people's first impression be "oh wow, this program is pretty sweet. I wonder what it is written in?" Hence my projects all use ./configure && make by design rather than by necessity.
> Python packages don't have ./configure or make...
Some don't. And it bugs the hell out of me. They don't work with my autopackaging tools.
I sometimes hear that I just shouldn't use Cabal for some projects, but when it comes down to it: if Cabal is a limited build/configuration system in any way, why would I ever choose it when starting a project, knowing either that it is putting a limit on my project's ability to innovate, or that at some point in the future I am going to have to switch build systems?
> Because if you *can* use Cabal, you get a lot of value-adds for free (distro packages, cabal-install, Haddock, source distributions, Hackage). What's more, it's really cheap to use Cabal: a .cabal file is typically less than a screenful, so it's no big deal to switch to something else later if you need to.
Except that suddenly you can't use Hackage, you have to come up with a new build system, and you perhaps upset your users as they have to learn a new way to build the project. The fact that it _is_ a big deal to replace Cabal is the main issue I have: switching involves changing your build system completely. You can't replace just parts of it easily, or integrate Cabal from the bottom up rather than the top down, and it wants to be the _one true_ build system in your project.

I'd like to see a standardized meta-info format for Haskell libraries, based on the current Cabal format but without the Cabal-specific build information, just like the 'lsm' Linux software map files. (This is what jhc uses, and franchise too I think.) Preferably YAML: we are pretty darn close already, and it would give us parsers in many languages for free. We already have several tools that can use the meta-info (jhc, cabal, franchise, and hackage for the web site layout), so abstracting it from the build info seems like a useful step in the right direction.

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈