
On Wed, Aug 27, 2008 at 10:18:59PM +0100, Duncan Coutts wrote:
On Wed, 2008-08-27 at 06:13 -0700, John Meacham wrote:
The problem with the way cabal wants to mix with make/autoconf is that it is the wrong way round. make is very good at managing pre-processors, tracking dependencies, and calling external programs in the right order, in parallel, and as needed. cabal is generally good at building a single library or executable given relatively straightforward haskell source. (I know it _can_ do more, but this is mainly what it is good at.)
The way this should work is that make determines which haskell libraries need to be built and which haskell files need to be generated for cabal to run, and then calls cabal to build just the ones needed. cabal as a build tool that make calls is much more flexible and in tune with each tool's capabilities.
I'd say if you're using make for all that, then use it to build the haskell modules too. That gives the advantage of incremental and parallel builds, which Cabal does not do yet (though we've got a GSoC project just coming to an end which does this).
So, don't use cabal at all? That is the solution I have been going with so far, and the one I am trying to remedy.
The other issue is with cabal files themselves, which are somewhat conflicted in purpose. On one hand you have declarative facts about a package: name, version, etc., information you want before you start to build something. But then you have build-depends, which is something you cannot know until after your configuration manager (whatever it may be, autoconf being a popular one) has run.
Ah, but that's where the autoconf and Cabal models part ways.
Which packages you depend on is going to depend on things like what compiler you have installed, your configuration options, which packages are installed, what operating system you are running on, which kernel version you are running, which C libraries you have installed, etc.: things that cannot be predicted before the configuration is actually run.
So Cabal takes the view that the relationship between features and dependencies should be declarative. autoconf is essentially a function from a platform environment to maybe a configuration. That's a very flexible approach: the function is opaque and can do whatever feature tests it likes. The downside is that it is not possible to work out what the dependencies are. It might be possible if autoconf explained the results of its decisions, but even then it is not possible to work out which dependencies are required to get a particular feature enabled. With the Cabal approach these things are explicit.
Unfortunately the cabal approach doesn't work. Note, I am not saying a declarative configuration manager won't work; in fact, I have sketched a design for one on occasion. But cabal's particular choices are broken. It is treading the same waters that made 'imake' fail.

The ideas of forwards and backwards compatibility are _the_ defining features of a configuration manager. Think about this: I can take my old sunsite CD, burned _ten years_ ago, take the unchanged tarballs off that CD, run ./configure && make, and in general most will work. Many were written before linux even existed; many were written with non-gcc compilers; yet they work today. The cabal way wasn't able to handle a single release of ghc and keep forwards or backwards compatibility. That any project ever had to be changed to use the flag 'split-base' is a travesty. What about all the projects on burnt CDs, or that don't have someone to update them? Twenty years from now, when we are all using 'fhc' (Fred's Haskell Compiler), will we still have this reference to 'split-base' in our cabal files? How many more flags will have accumulated by then?

Sure it's declarative, but in a language that doesn't make sense without the rule-book. autoconf tests things like 'does a library named foo exist and export bar' or 'is char signed or unsigned on the target system'. Those are declarative statements with a defined meaning through all time (though implemented in a pretty ugly imperative way). That is what allows autoconfed packages to be compiled by compilers on systems that were never dreamed of when the packages were written.
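(For readers who have not run into it, the split-base boilerplate in question looks roughly like this in a .cabal file; the exact package lists vary from package to package:)

    flag splitBase
      description: Build with the split-up base from ghc >= 6.8

    library
      if flag(splitBase)
        build-depends: base >= 3, containers, directory
      else
        build-depends: base < 3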
The conditionals in a .cabal file can be read in either direction, so it is possible for a package manager to automatically work out what deps would be needed for that optional libcurl feature, or that GUI.
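(For concreteness, such a conditional might look like the following for a hypothetical package with an optional libcurl backend, in standard .cabal flag syntax. Read forwards, the flag being on implies the curl dependency; read backwards, wanting the curl feature tells a packager which deps to pull in:)

    flag curl
      description: Build the libcurl backend

    library
      build-depends: base
      if flag(curl)
        build-depends:   curl
        extra-libraries: curl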
In the cabal framework, will cabal be able to do things like cross-compile a C file to an object file, and deconstruct the generated ELF file to determine parameters needed for an unknown embedded platform, _and_ do so without requiring the user to upgrade their cabal? This is an example of the type of autoconf test that comes up in the real world. You can never come up with a language that has every needed primitive; any restricted set will ultimately not be enough for someone, and the only alternative is pretty much to not use cabal at all or to hack around it in odd ways.
The other principle is that the packager, the environment, is in control over what the package 'sees'. With autoconf, the script can take into account anything it likes, even if you'd rather it did not. E.g. it's important to be able to build a package that does not have that optional dependency, even though the C lib is indeed installed on the build machine, because I may be configuring it for a machine without the C lib. Sure, some good packages allow those automagic decisions to be overridden, but many don't, and of course there is no easy way to tell if a package is picking up deps it should not. So one of the principles in Cabal configuration is that all decisions about how to configure the package are transparent to the packager and can be overridden.
I am not sure what you mean by this. autoconf's flexibility in this regard is pretty exceptional when the tests are written properly. Native cross-compilation is one of autoconf's strengths and a big motivating factor in its design.
Now currently Cabal has only a partial implementation of the concept, because when it tries to find a configuration that works in the current environment (which it does only if the configuration is not already fully specified by the packager) it considers only dependencies on haskell packages. Obviously there is a range of other dependencies specified in the .cabal file and it should use them all, in particular external C libs.
And there are many other possible implementations of configuration managers. I fully believe that the next big one will come out of the haskell community; we are a good bunch of people. But it won't if innovation is stifled by cabal _insisting_ on using its own configuration manager and by cabal being promoted as 'the way' to do things. This is completely independent of my opinions of cabal as a configuration manager; I would just hate to see such an enticing area of research be cut off prematurely. If cabal is going to be the way to do things with haskell, then it cannot be the place to try out one's own pet projects about how one thinks things should be. A declarative configuration manager is an intriguing project, one I want to see people work on in different directions, but it is new research.
So I accept that we do not yet cover the range of configuration choices that are needed by the more complex packages (cf. darcs), but I think that we can, and that the approach is basically sound. The fact that we can automatically generate distro packages for hundreds of packages is not insignificant. This is just not possible with the autoconf approach.
This is just utterly untrue. autoconfed packages that generate rpms, debs, etc. are quite common. The only reason cabal can autogenerate distro packages for so many is that many interesting or hard ones just _aren't possible with cabal at all_. Cabal's inflexibility puts a huge selection bias on the population of cabalized programs.
Then you have cabal as a packaging system (or perhaps hackage/cabal considered together), which has its own warts. If it is meant to live in the niche of package managers such as rpm or deb, where, for one example, are the 'release' version numbers that rpms and debs have? If it is meant to be a tarball-like format, where is the distinction between 'distribution' and 'source' tarballs?
Right, it's supposed to be the upstream release format, tarballs. Distro packages obviously have their additional revision numbers.
One might say hackage is a distro in and of itself, and so should have similar numbers. Reusing the same file directly for both the packager and the build system makes things like this trickier than they need to be.
For instance, building jhc from darcs as a developer requires perl, ghc, DrIFT, pandoc, autotools, and happy; however, the jhc tarball requires _only_ ghc, nothing else. This is because the 'make dist' target is more interesting than just tarring up the source. (And posthooks/prehooks don't really help; they are sort of equivalent to saying 'write your own build system'.)
Right. Cabal does that too (or strictly speaking, the Simple build system can do this). For pre-processors that are platform-independent (like alex, happy, etc.) it puts the pre-processed source into the release tarball. It's also possible to make tarballs without the pre-generated files if that is important.
Sort of, but cabal can only do these things because they are _built in_ to cabal. make will happily use DrIFT, figure out dependencies for ghc, gcc, and jhc, and build my rpms without itself having to be modified, because it was designed that way.
One of the biggest sources of conflict arises from using cabal as a configuration manager. A configuration manager's entire purpose is to examine the system and figure out how to adapt your program's build to the system.
Well, that's the autoconf view. It's not the only way of looking at it, as I explained above (perhaps not very clearly). I'd say a configuration manager should negotiate between the package and the packager/user/environment to find a configuration that is satisfactory to all (which requires information flow in both directions).
This is completely at odds with the idea of users having to 'upgrade' cabal. Figuring out how to adapt your build to whatever cabal is installed, or failing gracefully if you can't, is exactly the job of a configuration manager, something like autoconf. This is why _users_ need not install autoconf, just developers. autoconf generates a portable script precisely so that users are never told to upgrade their autoconf; if a developer wants to use new features, he gets the new autoconf and reruns 'autoreconf'. The user is never asked to update anything that isn't actually needed for the project itself. This distinction is key for a configuration manager, and it really conflicts with cabal wanting to also be a build system and package manager. It is also what is needed for forwards and backwards compatibility.
I suppose in principle it'd be possible to ship the build system in every package like autoconf/automake does. Perhaps we should allow that as an option. It's doable since the Setup.hs can import local modules.
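(In that scheme, the Setup.hs shipped in the tarball could be as small as the following sketch, where LocalBuildLogic is a hypothetical module included in the package's own source tree, so the build logic travels with the tarball rather than living in whatever Cabal library the user happens to have installed:)

    -- Setup.hs
    import Distribution.Simple
    import LocalBuildLogic (myHooks)  -- hypothetical module shipped alongside Setup.hs

    main :: IO ()
    main = defaultMainWithHooks myHooks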
I don't see what you mean. autoconf doesn't "ship the build system" with the package any more than ghc ships ghc with every binary it produces. autoconf is a _compiler_ of a domain-specific language to a portable intermediate language by design. This means that autoconf need not be upgraded or installed by users, yet developers are free to take advantage of autoconf's newest features without troubling their users, because what is distributed is autoconf's compiled output. If a user has to upgrade their cabal to install a package, then cabal is broken as a configuration manager by design. If I were willing to make a user upgrade their system, there would be _no need_ for a configuration manager at all. The problem of building the most recent and updated library with the most recent and updated compiler on a fully up-to-date system is a _non-problem_.
All in all, I think these conflicting goals of cabal make it hard to use in projects and have led to very odd design choices. I think external tools should not be the exception but rather the rule. Not that cabal shouldn't come with a full set of said tools, but as long as they are integrated I don't see cabal's design problems being fixed, merely augmented with various workarounds.
One issue with a pick-and-mix approach is: what is the top-level interface that users/package managers use? The current choice (which I'm not at all sure is the right one) is a Setup.hs file that imports its build system from a library that's already on the system (or a custom one implemented locally). So a system that uses make underneath still has to present the Setup.hs interface so that package managers can use it in a uniform way. You mention at the top that you think the make/cabal relationship is the wrong way round, but the Cabal/Setup.hs interface has to be the top-level one (at least at the moment), so you'd have Setup.hs call make, and make call it back again to build various bits like libs etc?
Right now I just have ./configure && make be the way to build things, and the ./configure generates an appropriate cabal file when needed. But a 'cabal proxy' stub cabal file similar to what you are describing is also something I have considered (only for haskell libraries I want to put on hackage), though it is far from ideal. As for programs written in haskell, I don't want people's first impression of haskell to be "oh crap, I gotta learn a new way to build things just because this program is written in some odd language called 'haskell'". I don't care how awesome a language is; I am going to be annoyed by having to deal with it when I just want to compile/install a program. It will leave a bad taste in my mouth. I would much rather people's first impression be "oh wow, this program is pretty sweet. I wonder what it is written in?" Hence they all use ./configure && make by design rather than necessity.
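(For what it's worth, such a proxy Setup.hs can be quite small. A rough sketch, assuming the hook fields of Cabal's Simple build system; field names and signatures vary somewhat between Cabal versions, and error handling is omitted:)

    import Distribution.Simple
    import System.Cmd (rawSystem)

    -- Hand the real configure and build work over to the
    -- autoconf/make machinery, while still presenting the
    -- standard Setup.hs interface to package managers.
    main :: IO ()
    main = defaultMainWithHooks simpleUserHooks
      { postConf  = \_ _ _ _ -> rawSystem "sh"   ["./configure"] >> return ()
      , buildHook = \_ _ _ _ -> rawSystem "make" []              >> return ()
      }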
Do you think that separating the Simple build system from the declarative part of Cabal would help? It'd make it more obvious that the build-system part really is replaceable, which currently is not so obvious since they're in the same package. I'm not averse to splitting them if it'd help. They're already completely partitioned internally.
Yes, it would help significantly if it were its own program, invoked by cabal just like hmake or make or mk or cook or bake would be. It would be a step in the right direction. But what I'd really like to see is a split of the configuration management from the parts that merely describe the package. I sometimes hear that I just shouldn't use cabal for some projects, but when it comes down to it, if cabal is a limited build/configuration system in any way, why would I ever choose it when starting a project, knowing that it is either putting a limit on my project's ability to innovate or that at some point in the future I am going to have to switch build systems? If cabal isn't suitable or convenient for some projects (which we all admit) and cabal is the haskell way of doing things, then the perception will be that _haskell_ is not suitable for said projects. And that is what I fear.

        John

--
John Meacham - ⑆repetae.net⑆john⑈