
All,
In the initial discussions on a common architecture for building applications and libraries one of the goals was to reduce or eliminate untracked dependencies. The aim being that you could reliably deploy a package from one machine to another.
We settled on a fairly traditional model, where one specifies the names and versions of packages of Haskell code.
An obvious alternative model is embodied in ghc --make and in autoconf style systems where you look in the environment not for packages but rather for specific modules or functions.
Both models have passionate advocates. There are of course advantages and disadvantages to each. Both models seem to get implemented as reactions to having the other model inflicted on the author. For example the current Cabal model of package names and versions was a reaction to the perceived problem of untracked dependencies with the ghc --make system. One could see implementations such as Searchpath and franchise as reactions in the opposite direction.
The advantages and disadvantages of specifying dependencies on module names vs package names and versions are mostly inverses. Module name clashes between packages are problematic with one system and not a problem with the other. Moving modules between packages is not a problem for one system and a massive pain for the other.
The fact is that both module name and package name + version are being used as proxies to represent some vague combination of required Haskell interface and implementation thereof. Sometimes people intend only to specify an interface and sometimes people really want to specify (partial) semantics (eg to require a version of something including some bug fix / semantic change). In this situation the package version is being used to specify an implementation as a proxy for semantics.
Neither are very good ways of identifying an interface or implementation/semantics. Modules do move from one package to another without fundamentally changing. Modules do change interface and semantics without changing name. There is no guarantee about the relationship between a package's version and its interface or semantics though there are some conventions.
Another view would be to try to identify the requirements of dependent code more accurately. For example, to view modules as functors and look at what interface they require of the modules they import. Then we can say that they depend on any module that provides a superset of that interface. It doesn't help with semantics of course. Dependencies like these are not so compact and easy to write down.
I don't have any point here exactly, except that there is no obvious solution. I guess I'd like to provoke a bit of a discussion on this, though hopefully not just rehashing known issues. In particular if people have any ideas about how we could improve either model to address their weak points then that'd be well worth discussing.
For example the package versioning policy attempts to tighten the relationship between a package version and changes in its interface and semantics. It still does not help at all with modules moving between packages.
Duncan
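One way to picture the "modules as functors" view in ordinary Haskell is to pass the required interface around explicitly; the record and names below are invented purely for illustration and are only a sketch of the idea.
  -- A sketch of 'modules as functors': the client states exactly the
  -- interface it needs (here as a record of operations) and works with
  -- any provider of a superset of that interface.
  data MapLike k v m = MapLike
    { emptyMap :: m
    , insertKV :: k -> v -> m -> m
    , lookupK  :: k -> m -> Maybe v
    }
  -- A "module" that depends only on the interface above, not on any
  -- particular package that happens to provide it.
  countWord :: MapLike String Int m -> String -> m -> m
  countWord ops w m = case lookupK ops w m of
    Nothing -> insertKV ops w 1 m
    Just n  -> insertKV ops w (n + 1) m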

On 20 apr 2008, at 22.22, Duncan Coutts wrote:
All,
In the initial discussions on a common architecture for building applications and libraries one of the goals was to reduce or eliminate untracked dependencies. The aim being that you could reliably deploy a package from one machine to another.
We settled on a fairly traditional model, where one specifies the names and versions of packages of Haskell code.
An obvious alternative model is embodied in ghc --make and in autoconf style systems where you look in the environment not for packages but rather for specific modules or functions.
Both models have passionate advocates. There are of course advantages and disadvantages to each. Both models seem to get implemented as reactions to having the other model inflicted on the author. For example the current Cabal model of package names and versions was a reaction to the perceived problem of untracked dependencies with the ghc --make system. One could see implementations such as Searchpath and franchise as reactions in the opposite direction.
The advantages and disadvantages of specifying dependencies on module names vs package names and versions are mostly inverses. Module name clashes between packages are problematic with one system and not a problem with the other. Moving modules between packages is not a problem for one system and a massive pain for the other.
The fact is that both module name and package name + version are being used as proxies to represent some vague combination of required Haskell interface and implementation thereof. Sometimes people intend only to specify an interface and sometimes people really want to specify (partial) semantics (eg to require a version of something including some bug fix / semantic change). In this situation the package version is being used to specify an implementation as a proxy for semantics.
Neither are very good ways of identifying an interface or implementation/semantics. Modules do move from one package to another without fundamentally changing. Modules do change interface and semantics without changing name. There is no guarantee about the relationship between a package's version and its interface or semantics though there are some conventions.
Another view would be to try to identify the requirements of dependent code more accurately. For example, to view modules as functors and look at what interface they require of the modules they import. Then we can say that they depend on any module that provides a superset of that interface. It doesn't help with semantics of course. Dependencies like these are not so compact and easy to write down.
I don't have any point here exactly, except that there is no obvious solution. I guess I'd like to provoke a bit of a discussion on this, though hopefully not just rehashing known issues. In particular if people have any ideas about how we could improve either model to address their weak points then that'd be well worth discussing.
For example the package versioning policy attempts to tighten the relationship between a package version and changes in its interface and semantics. It still does not help at all with modules moving between packages.
Duncan
[Replying so late as I only saw this today.]
I believe using tight version constraints in conjunction with the PVP to be a good solution. For now.
I don't quite know how Searchpath works (the website is rather taciturn), but I think that we should strive for a better approximation to real dependencies, specifically, name, interface, and semantics of imported functions. As I see it, what's missing is proper tool support to do it practically for both library authors and users.
Library users really shouldn't need to do anything except to run a tool to determine all dependencies of a given package. Library authors should be able to run a tool that determines what's new and what might have changed. The package author then merely decides whether semantics was changed and if so, in what way (i.e., compatible or not to previous semantics). Packages will still carry versions, but they are only used to mark changes. Semantic information is provided via a "change database" which contains enough information to determine whether a version of a package contains appropriate implementations of the functions (or, more generally, entities) used in a dependent package.
For example, suppose we write a program that uses the function 'Foo.foo' contained in package 'foo', and we happen to have used 'foo-0.42' for testing our program. Then, given the knowledge that 'Foo.foo' was introduced in 'foo-0.23' and changed semantics in 'foo-2.0', we know that 'foo >= 0.23 && < 2.0' is the correct and complete dependency description.
That's the ideal; maybe we can work towards this? Or does this sound crazy?
/ Thomas
--
"Today a young man on acid realized that all matter is merely energy condensed to a slow vibration, that we are all one consciousness experiencing itself subjectively, there is no such thing as death, life is only a dream, and we are the imagination of ourselves." -- Bill Hicks
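To make the "change database" idea a bit more concrete, here is a rough Haskell sketch of how a tool might derive such a range; all the types and the 'foo' entries are invented for illustration, nothing like this exists yet.
  import qualified Data.Map as Map
  import Data.Map (Map)
  type Version = [Int]   -- e.g. [0,42] for version 0.42
  type Entity  = String  -- e.g. "Foo.foo"
  -- For each exported entity: the version that introduced it and any
  -- later versions that changed its interface or semantics.
  data History  = History { introducedIn :: Version, changedIn :: [Version] }
  type ChangeDB = Map Entity History
  -- Given the entities a program actually uses and the version it was
  -- tested with, compute a lower bound (the latest introduction) and an
  -- optional upper bound (the first breaking change after the tested
  -- version).
  safeRange :: ChangeDB -> Version -> [Entity] -> Maybe (Version, Maybe Version)
  safeRange db tested used = do
    hs <- mapM (`Map.lookup` db) used
    let lower = maximum (map introducedIn hs)
        upper = case filter (> tested) (concatMap changedIn hs) of
                  [] -> Nothing
                  vs -> Just (minimum vs)
    return (lower, upper)
  -- The example from the post: 'Foo.foo' appeared in foo-0.23 and its
  -- semantics changed in foo-2.0, so testing against foo-0.42 gives
  --   safeRange exampleDB [0,42] ["Foo.foo"] == Just ([0,23], Just [2,0])
  -- i.e. foo >= 0.23 && < 2.0.
  exampleDB :: ChangeDB
  exampleDB = Map.fromList [("Foo.foo", History [0,23] [[2,0]])]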

On Fri, 2008-05-02 at 00:28 +0200, Thomas Schilling wrote:
On 20 apr 2008, at 22.22, Duncan Coutts wrote:
[Replying so late as I only saw this today.]
I believe using tight version constraints in conjunction with the PVP to be a good solution. For now.
I think I tend to agree.
I don't quite know how Searchpath works (the website is rather taciturn), but I think that we should strive for a better approximation to real dependencies, specifically, name, interface, and semantics of imported functions. As I see it, what's missing is proper tool support to do it practically for both library authors and users.
Yes, we can make package name and version a better approximation of the package interface with tools to enforce the versioning policy.
Library users really shouldn't need to do anything except to run a tool to determine all dependencies of a given package. Library authors should be able to run a tool that determines what's new and what might have changed. The package author then merely decides whether semantics was changed and if so, in what way (i.e., compatible or not to previous semantics). Packages will still carry versions, but they are only used to mark changes. Semantic information is provided via a "change database" which contains enough information to determine whether a version of a package contains appropriate implementations of the functions (or, more generally, entities) used in a dependent package.
For example, suppose we write a program that uses the function 'Foo.foo' contained in package 'foo', and we happen to have used 'foo-0.42' for testing our program. Then, given the knowledge that 'Foo.foo' was introduced in 'foo-0.23' and changed semantics in 'foo-2.0', we know that 'foo >= 0.23 && < 2.0' is the correct and complete dependency description.
That's the ideal, maybe we can work towards this? Or does this sound crazy?
I think extracting package APIs and comparing them across versions is an excellent thing to do. It'd help users see what has changed and it'd let us enforce the versioning policy (at least for interface changes, not for semantic changes). Having a central collection of those interfaces and using that to work out which versions of which packages would be compatible with the program I just wrote is quite an interesting idea.
It's related to what I was saying about identifying code by its full interface, as a functor, but then using that to map back to packages that provide the interface. Something like that might go some way to addressing David Roundy's quite legitimate criticism that the system of specifying deps on package names and versions requires one to know the full development history of that code, eg to track it across package renames.
However it would only help for the development _history_, we still have no solution for the problem of packages being renamed (or modules moving between packages) breaking other existing packages. Though similarly we have no solution to the problem of modules being renamed. Perhaps it's just that we have not done much module renaming recently so people don't see it as an issue.
Duncan
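A sketch of what the comparison step could look like, assuming the extracted interface is available as a map from exported names to their pretty-printed types; the representation is invented here, and the extraction itself is of course the hard part.
  import qualified Data.Map as Map
  import Data.Map (Map)
  -- An extracted package interface: exported names and their types.
  type Interface = Map String String
  data ApiChange = NoChange | AdditionsOnly | Breaking deriving (Eq, Show)
  -- Classify the difference between two versions of an interface in the
  -- spirit of the versioning policy: removing or changing an export is
  -- breaking, merely adding exports is not.
  compareApi :: Interface -> Interface -> ApiChange
  compareApi old new
    | not (Map.null removedOrChanged) = Breaking
    | not (Map.null added)            = AdditionsOnly
    | otherwise                       = NoChange
    where
      removedOrChanged = Map.differenceWith keepIfChanged old new
      added            = Map.difference new old
      keepIfChanged oldTy newTy
        | oldTy == newTy = Nothing     -- unchanged, drop it
        | otherwise      = Just oldTy  -- changed type, keep as evidence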

Duncan Coutts wrote:
Thomas Schilling wrote:
For example, suppose we write a program that uses the function 'Foo.foo' contained in package 'foo', and we happen to have used 'foo-0.42' for testing our program. Then, given the knowledge that 'Foo.foo' was introduced in 'foo-0.23' and changed semantics in 'foo-2.0', we know that 'foo >= 0.23 && < 2.0' is the correct and complete dependency description.
I would go even further and simply use "my program 'bar' compiles with foo-0.42" as dependency description. In other words, whether the package foo-0.23 can be used to supply this dependency or not will be determined when somebody else tries to compile Bar with it.
In both cases, the basic idea is that the library user should *not* think about library versions, he just uses the one that is in scope on his system. Figuring out which other versions can be substituted is the job of the library author. In other words, the burden of proof is shifted from the user ("will my program compile with foo-1.1?") to the author ("which versions of my library are compatible?"), where it belongs.
However it would only help for the development _history_, we still have no solution for the problem of packages being renamed (or modules moving between packages) breaking other existing packages. Though similarly we have no solution to the problem of modules being renamed. Perhaps it's just that we have not done much module renaming recently so people don't see it as an issue.
With the approach above, it's possible to handle package/module renaming. For instance, if the package 'foo' is split into 'f-0.1' and 'oo-0.1' at some point, we can still use the union of these two to fulfill the old dependency 'foo-0.42'.
In other words, the basic model is that a module/package like 'bar' with a dependency like 'foo-0.42' is just a function that maps a value of the same type (= export list) as 'foo-0.42' to another value (namely the set of exports of 'bar'). So, we can compile for instance
bar (foo-0.42)
or
bar (f-0.1 `union` oo-0.1)
Of course, the problems are
1) specifying the types of the parameters,
2) automatically choosing good parameters.
For 1), one could use a very detailed import list, but I think that this feels wrong. I mean, if I have to specify the imports myself, why did I import foo-0.42 in the first place? Put differently, when I say 'import Data.Map' I want to import both its implementation and the interface. So, I argue that the goal is to allow type specifications of the form 'same type as foo-0.42'.
Problem 2) exists because if I have foo-0.5 in scope on my system and a package lists foo-0.42 as a dependency, the compiler should somehow figure out that it can use foo-0.5 as the argument. Of course, it will be tricky/impossible to figure out that f-0.1 `union` oo-0.1 is a valid argument, too.
So, the task would be to develop a formalism, i.e. some kind of "lambda calculus for modules" that can handle problems 1) and 2). The formalism should be simple to understand and use yet powerful, just like our beloved lambda calculus.
A potential pitfall to any solution is that name and version number don't identify a compiled package uniquely! For instance,
foo-0.3 (bytestring-1.1)
is very different from
foo-0.3 (bytestring-1.2)
if foo exports the ByteString type. That's the diamond import problem. In other words, foo-0.3 is always the same function, but the evaluated results are not.
Regards, apfelmus
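As a toy model of the above, squashing an interface down to a plain set of exported names (everything here is invented for the sketch, and it deliberately ignores types and the diamond problem):
  import qualified Data.Set as Set
  import Data.Set (Set)
  -- A toy model: an interface is a set of exported names and a package
  -- is a function from the interface it imports to the one it exports.
  type Iface   = Set String
  type Package = Iface -> Iface
  -- 'bar' accepts anything that provides at least what it imports.
  bar :: Package
  bar deps
    | Set.fromList ["Foo.foo"] `Set.isSubsetOf` deps = Set.fromList ["Bar.run"]
    | otherwise = error "bar: missing imports"
  foo_0_42, f_0_1, oo_0_1 :: Iface
  foo_0_42 = Set.fromList ["Foo.foo", "Foo.oo"]
  f_0_1    = Set.fromList ["Foo.foo"]
  oo_0_1   = Set.fromList ["Foo.oo"]
  -- Both 'bar foo_0_42' and 'bar (f_0_1 `Set.union` oo_0_1)' succeed,
  -- which is exactly the point about the package split.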

On 2 maj 2008, at 11.27, apfelmus wrote:
Duncan Coutts wrote:
Thomas Schilling wrote:
For example, suppose we write a program that uses the function 'Foo.foo' contained in package 'foo', and we happen to have used 'foo-0.42' for testing our program. Then, given the knowledge that 'Foo.foo' was introduced in 'foo-0.23' and changed semantics in 'foo-2.0', we know that 'foo >= 0.23 && < 2.0' is the correct and complete dependency description.
I would go even further and simply use "my program 'bar' compiles with foo-0.42" as dependency description. In other words, whether the package foo-0.23 can be used to supply this dependency or not will be determined when somebody else tries to compile Bar with it.
In both cases, the basic idea is that the library user should *not* think about library versions, he just uses the one that is in scope on his system. Figuring out which other versions can be substituted is the job of the library author. In other words, the burden of proof is shifted from the user ("will my program compile with foo-1.1?") to the author ("which versions of my library are compatible?"), where it belongs.
I think we mean the same thing. If I write a program and test it against a specific version of a library, then my program's source code plus the knowledge of which specific library versions I used contains, most of the time, *all* the information necessary to determine which other library versions it can be built with.
From the source code we need information about what is imported; from the library author we need a *formal* changelog. This changelog describes, for each released version, what parts of the interface and semantics have changed. The problem here is, of course, that this is a lot of information to provide.
Furthermore, I think we need information about imports from the library user; if we ignore this, then the PVP is *exactly* what we need. The PVP describes when things *could* break, but it does so in an extremely pessimistic way. If we have information about what exactly changed and what is used by a particular library, we can find out what the exact version range is.
For example, if we build our package against foo-0.42 and bar-2.3 and both packages follow the PVP then the following will trivially be true:
build-depends: foo-0.42.*, bar-2.3.*
where "-X.Y.*" is a shortcut for ">= X.Y && < X.(Y+1)". The problem is that this is extremely pessimistic, so we have to manually check whenever a new version of a dependency comes out and update the "known-to-work-with" range. With more information (obtained mostly by tools) we can automate this process, and, in fact, both approaches can co-exist.
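For what it's worth, that "-X.Y.*" shortcut is mechanical enough to write down; a trivial sketch, using a pair of Ints for the major version:
  -- Expand the "-X.Y.*" shortcut into an explicit range, so that
  -- foo-0.42.* means foo >= 0.42 && < 0.43.
  pvpRange :: (Int, Int) -> String
  pvpRange (x, y) = ">= " ++ v y ++ " && < " ++ v (y + 1)
    where v b = show x ++ "." ++ show b
  -- pvpRange (0, 42) == ">= 0.42 && < 0.43"
  -- pvpRange (2, 3)  == ">= 2.3 && < 2.4"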
However it would only help for the development _history_, we still have no solution for the problem of packages being renamed (or modules moving between packages) breaking other existing packages. Though similarly we have no solution to the problem of modules being renamed. Perhaps it's just that we have not done much module renaming recently so people don't see it as an issue.
With the approach above, it's possible to handle package/module renaming. For instance, if the package 'foo' is split into 'f-0.1' and 'oo-0.1' at some point, we can still use the union of these two to fulfill the old dependency 'foo-0.42'.
This is kind of the same like using a "virtual package" that is simply a re-export of other packages. This would help a lot with our current problems with the base split (which will continue, as base will be split up even further).
In other words, the basic model is that a module/package like 'bar' with a dependency like 'foo-0.42' is just a function that maps a value of the same type (= export list) as 'foo-0.42' to another value (namely the set of exports of 'bar'). So, we can compile for instance
bar (foo-0.42)
or
bar (f-0.1 `union` oo-0.1)
Of course, the problems are
1) specifying the types of the parameters, 2) automatically choosing good parameters.
For 1), one could use a very detailed import list, but I think that this feels wrong. I mean, if I have to specify the imports myself, why did I import foo-0.42 in the first place? Put differently, when I say 'import Data.Map' I want to import both its implementation and the interface. So, I argue that the goal is to allow type specifications of the form 'same type as foo-0.42'.
Problem 2) exists because if I have foo-0.5 in scope on my system and a package lists foo-0.42 as a dependency, the compiler should somehow figure out that it can use foo-0.5 as the argument. Of course, it will be tricky/impossible to figure out that f-0.1 `union` oo-0.1 is a valid argument, too.
So, the task would be to develop a formalism, i.e. some kind of "lambda calculus for modules" that can handle problems 1) and 2). The formalism should be simple to understand and use yet powerful, just like our beloved lambda calculus.
A potential pitfall to any solution is that name and version number don't identify a compiled package uniquely! For instance,
foo-0.3 (bytestring-1.1)
is very different from
foo-0.3 (bytestring-1.2)
if foo exports the ByteString type. That's the diamond import problem. In other words, foo-0.3 is always the same function, but the evaluated results are not.
I think a formal changelog can also help with renaming (even of exported entities), but, I agree, for all this to work we need to formalise it first, and then build tools to automate most of the work.
/ Thomas
--
My shadow / Change is coming. / Now is my time. / Listen to my muscle memory. / Contemplate what I've been clinging to. / Forty-six and two ahead of me.

On Sun, Apr 20, 2008 at 09:22:56PM +0100, Duncan Coutts wrote:
In the initial discussions on a common architecture for building applications and libraries one of the goals was to reduce or eliminate untracked dependencies. The aim being that you could reliably deploy a package from one machine to another.
We settled on a fairly traditional model, where one specifies the names and versions of packages of Haskell code.
Do you actually have any precedent for such a system? I've never heard of one, but then I've been sort of sheltered, due to living in the linux world where there is a distinction between packagers and upstream authors. I consider this a useful distinction. But that's probably because I'm lazy, or perhaps because I care about my users--and thus like to give them options and reduce the dependencies of my software. I know there is a long history of the autoconf-style approach being successful. Can you point to any success stories of the approach chosen for cabal? David

On Fri, May 02, 2008 at 09:55:32AM -0700, David Roundy wrote:
On Sun, Apr 20, 2008 at 09:22:56PM +0100, Duncan Coutts wrote:
We settled on a fairly traditional model, where one specifies the names and versions of packages of Haskell code.
Do you actually have any precedent for such a system?
I know there is a long history of the autoconf-style approach being successful. Can you point to any success stories of the approach chosen for cabal?
LaTeX does things like
\RequirePackage{longtable}[1995/01/01]
According to http://peak.telecommunity.com/DevCenter/PythonEggs, with python eggs you do things like
from pkg_resources import require
require("FooBar>=1.2")
According to http://blogs.cocoondev.org/crafterm/archives/004653.html, with Ruby gems you do things like
s.add_dependency("dependency", ">= 0.x.x")
(URLs found by googling for "how to make a <foo>")
Those were just the first 3 things I thought of. I don't know what you would consider a success, though.
Thanks
Ian

On Sat, May 03, 2008 at 02:51:33PM +0100, Ian Lynagh wrote:
On Fri, May 02, 2008 at 09:55:32AM -0700, David Roundy wrote:
On Sun, Apr 20, 2008 at 09:22:56PM +0100, Duncan Coutts wrote:
We settled on a fairly traditional model, where one specifies the names and versions of packages of Haskell code.
Do you actually have any precedent for such a system?
I know there is a long history of the autoconf-style approach being successful. Can you point to any success stories of the approach chosen for cabal?
LaTeX does things like \RequirePackage{longtable}[1995/01/01]
I wouldn't call LaTeX a build system, although it's certainly a wonderful typesetting system.
According to http://peak.telecommunity.com/DevCenter/PythonEggs, with python eggs you do things like
from pkg_resources import require
require("FooBar>=1.2")
From what I can tell, python eggs aren't a build system either, but rather a binary package format.
According to http://blogs.cocoondev.org/crafterm/archives/004653.html, with Ruby gems you do things like s.add_dependency("dependency", ">= 0.x.x")
It seems that a ruby gem is also a binary package.
(URLs found by googling for "how to make a <foo>")
Those were just the first 3 things I thought of. I don't know what you would consider a success, though.
I'd definitely call LaTeX a success, have no idea about gems or eggs (which I'd never heard of before this email), but none of these are build systems, so far as I can tell.
--
David Roundy
Department of Physics
Oregon State University

On Sat, May 03, 2008 at 11:30:44AM -0700, David Roundy wrote:
On Sat, May 03, 2008 at 02:51:33PM +0100, Ian Lynagh wrote:
According to http://peak.telecommunity.com/DevCenter/PythonEggs, with python eggs you do things like
from pkg_resources import require
require("FooBar>=1.2")
From what I can tell, python eggs aren't a build system either, but rather a binary package format.
To install a trac plugin you download a tarball and do something like
python setup.py bdist_egg
to create the .egg file, which you can then put in the appropriate place. I think in general you can also do
python setup.py install
to have it installed as a python library.
I know virtually nothing about eggs, and even less about gems, but I am under the impression that they aim to solve the same problem as Cabal.
Thanks
Ian

On Sun, May 04, 2008 at 05:20:54PM +0100, Ian Lynagh wrote:
On Sat, May 03, 2008 at 11:30:44AM -0700, David Roundy wrote:
On Sat, May 03, 2008 at 02:51:33PM +0100, Ian Lynagh wrote:
According to http://peak.telecommunity.com/DevCenter/PythonEggs, with python eggs you do things like
from pkg_resources import require
require("FooBar>=1.2")
From what I can tell, python eggs aren't a build system either, but rather a binary package format.
To install a trac plugin you download a tarball and do something like
python setup.py bdist_egg
to create the .egg file, which you can then put in the appropriate place. I think in general you can also do
python setup.py install
to have it installed as a python library.
I know virtually nothing about eggs, and even less about gems, but I am under the impression that they aim to solve the same problem as Cabal.
Maybe the problem is that no one seems to know what problem cabal is supposed to be solving. What problem is that? Some say it's a configuration/build system. Others say it's a packaging system. I think it's the latter. David

On Mon, 2008-05-05 at 03:50 -0700, David Roundy wrote:
Maybe the problem is that no one seems to know what problem cabal is supposed to be solving. What problem is that? Some say it's a configuration/build system. Others say it's a packaging system. I think it's the latter.
I'd say that Cabal is a build system but one that provides enough information to enable package management. That's the reason for the slight blurring/confusion with packaging systems. There is a much clearer division with autoconf/automake because it is a build system that does not provide enough information to enable package management.
Cabal interfaces with package management systems in a similar way to ./configure && make && make install, as one can see from the scripts that the distros use to build packages from source. Tools like cabal-rpm, hackport and dh_haskell use the information provided by cabal packages to make distro packages semi-automatically (it does not eliminate the QA job).
cabal-install is a package manager for those Cabal packages that are not already packaged by the distros. It seems likely that there will always be a significant number of such packages, as there is with CPAN etc. Hackage is an archive and distribution point for Cabal packages.
Duncan

David Roundy wrote:
On Sun, May 04, 2008 at 05:20:54PM +0100, Ian Lynagh wrote:
On Sat, May 03, 2008 at 11:30:44AM -0700, David Roundy wrote:
On Sat, May 03, 2008 at 02:51:33PM +0100, Ian Lynagh wrote:
According to http://peak.telecommunity.com/DevCenter/PythonEggs, with python eggs you do things like
from pkg_resources import require
require("FooBar>=1.2")
From what I can tell, python eggs aren't a build system either, but rather a binary package format.
To install a trac plugin you download a tarball and do something like
python setup.py bdist_egg
to create the .egg file, which you can then put in the appropriate place. I think in general you can also do
python setup.py install
to have it installed as a python library.
I know virtually nothing about eggs, and even less about gems, but I am under the impression that they aim to solve the same problem as Cabal.
Maybe the problem is that no one seems to know what problem cabal is supposed to be solving. What problem is that? Some say it's a configuration/build system. Others say it's a packaging system. I think it's the latter.
Does it matter? It's fine for a system to not fit entirely into one of the predefined boxes that you know about (e.g. is ZFS a file system or a volume manager?). Cabal solves a specific problem, which is:
it allows a package to be built from source, and installed, on a system with only a Haskell compiler (and Cabal).
The last part is important for people on Windows who don't want to install Cygwin or MSYS just to build Haskell packages.
Now, we discovered that by adding bits here and there we could solve other problems too: e.g. Cabal also builds programs. But the above statement was originally the main reason for Cabal's existence.
Cheers, Simon
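For readers who have not seen it: for a simple package the entire build machinery is a small .cabal description plus a two-line Setup.hs, driven by "runhaskell Setup.hs configure", then "build", then "install". The Setup.hs really is just this:
  -- Setup.hs: the complete build script for a simple Cabal package.
  import Distribution.Simple
  main :: IO ()
  main = defaultMain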

On Fri, May 09, 2008 at 01:00:11PM +0100, Simon Marlow wrote:
David Roundy wrote:
Maybe the problem is that no one seems to know what problem cabal is supposed to be solving. What problem is that? Some say it's a configuration/build system. Others say it's a packaging system. I think it's the latter.
Does it matter? It's fine for a system to not fit entirely into one of the predefined boxes that you know about (e.g. is ZFS a file system or a volume manager?). Cabal solves a specific problem, which is:
it allows a package to be built from source, and installed, on a system with only a Haskell compiler (and Cabal).
the last part is important for people on Windows who don't want to install Cygwin or MSYS just to build Haskell packages.
Now, we discovered that by adding bits here and there we could solve other problems too: e.g. Cabal also builds programs. But the above statement was originally the main reason for Cabal's existence.
I guess my problem is that some of the advocates of cabal don't seem to understand this, and seem to think that it's some sort of a general-purpose build system. The trouble is that it isn't an autoconf-replacement or a make-replacement, but folks keep comparing it with those programs and arguing that it should replace them. Indeed, it can replace them for simple packages, as you note, but it doesn't compete in terms of either generality or flexibility. David

David Roundy wrote:
On Fri, May 09, 2008 at 01:00:11PM +0100, Simon Marlow wrote:
David Roundy wrote:
Maybe the problem is that no one seems to know what problem cabal is supposed to be solving. What problem is that? Some say it's a configuration/build system. Others say it's a packaging system. I think it's the latter.
Does it matter? It's fine for a system to not fit entirely into one of the predefined boxes that you know about (e.g. is ZFS a file system or a volume manager?). Cabal solves a specific problem, which is:
it allows a package to be built from source, and installed, on a system with only a Haskell compiler (and Cabal).
the last part is important for people on Windows who don't want to install Cygwin or MSYS just to build Haskell packages.
Now, we discovered that by adding bits here and there we could solve other problems too: e.g. Cabal also builds programs. But the above statement was originally the main reason for Cabal's existence.
I guess my problem is that some of the advocates of cabal don't seem to understand this, and seem to think that it's some sort of a general-purpose build system. The trouble is that it isn't an autoconf-replacement or a make-replacement, but folks keep comparing it with those programs and arguing that it should replace them. Indeed, it can replace them for simple packages, as you note, but it doesn't compete in terms of either generality or flexibility.
The problem we found before Cabal was that people would appear and ask how to build a Haskell package, and they generally didn't know enough make or autoconf to do it alone. Even if they did, it's still a daunting task. Cabal just automates all this nicely. Before Cabal I could count on one hand the number of third-party Haskell packages available, and they all had their own hand-written build systems, which were often flaky. Now we have hundreds of packages that just work.
We designed Cabal so that you could use it with autoconf as your configuration tool, and many packages do this. But you can't use autoconf to configure Haskell dependencies, because we want to know dependencies up front for things like cabal-install.
So Cabal was never designed to replace autoconf or make, except for the particular case of building Haskell packages and programs. Generally the approach has been that if we can get rid of the need for autoconf by adding a tiny bit to Cabal, then that's a trade worth making, but I don't think anyone's saying we should re-implement autoconf in Cabal. However, re-implementing make in Cabal isn't nearly such a bad idea :-)
Cheers, Simon
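The autoconf route Simon mentions is exposed through the Setup interface: a package that keeps its ./configure script can use the standard autoconf hooks, assuming the script produces the <package>.buildinfo file that Cabal then reads.
  -- Setup.hs for a package whose configuration step is an autoconf
  -- ./configure script run by Cabal at configure time.
  import Distribution.Simple
  main :: IO ()
  main = defaultMainWithHooks autoconfUserHooks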

On Fri, 2008-05-02 at 09:55 -0700, David Roundy wrote:
On Sun, Apr 20, 2008 at 09:22:56PM +0100, Duncan Coutts wrote:
In the initial discussions on a common architecture for building applications and libraries one of the goals was to reduce or eliminate untracked dependencies. The aim being that you could reliably deploy a package from one machine to another.
We settled on a fairly traditional model, where one specifies the names and versions of packages of Haskell code.
Do you actually have any precedent for such a system?
I would count all the distro packaging systems as precedent. There are a few others but those are the most significant.
I've never heard of one, but then I've been sort of sheltered, due to living in the linux world where there is a distinction between packagers and upstream authors. I consider this a useful distinction.
I agree it is a useful distinction. I was a packager for gentoo for three years. The jobs have roughly the same goal -- to deliver great software to users -- but there is certainly a different focus.
But that's probably because I'm lazy, or perhaps because I care about my users--and thus like to give them options and reduce the dependencies of my software.
We are actually very lucky to have people doing the packaging job for us. It takes time and because of that only the most important bits of software get packaged. If we could significantly reduce the amount of time that packaging people have to spend on each package then we could increase the number of packages that could benefit.
So that's what Cabal's model of specifying dependencies is for: to provide enough information to enable package management. Without that information provided up front the packaging people have to spend much more time manually discovering the dependencies by reading through README and configure.ac files.
With Cabal packages we have the possibility of generating distro packages automatically. Several distros have tools to do this automatic translation. This is something that is essentially impossible with autoconf. When we started using our translation tool in Gentoo we were able to increase the number of packages we provided by an order of magnitude.
Of course we do not expect every little Haskell package to appear in every distro but the information provided by packages makes it possible to provide package management (in the form of cabal-install) even for the packages that do not meet the popularity or QA standards for the distros.
I know there is a long history of the autoconf-style approach being successful. Can you point to any success stories of the approach chosen for cabal?
Again I'd point to all the package management systems. If you want examples of build systems that provide enough information for package management then admittedly there are fewer. Ian already pointed out Python eggs and Ruby Gems. I think CPAN also has some method for tracking dependencies though I don't know if or how CPAN modules specify dependencies. Duncan

Duncan Coutts wrote:
In the initial discussions on a common architecture for building applications and libraries one of the goals was to reduce or eliminate untracked dependencies. The aim being that you could reliably deploy a package from one machine to another.
Sorry for jumping in so late but here are my two cents anyway.
IMO, a package is absolutely the wrong thing to depend on. Essentially, a package is an implementation of an interface and depending on implementations is a bad thing. Code should only depend on interfaces which are completely independent entities. I suspect that a lot of the problems with packages occur because the current system doesn't follow this simple principle.
It would be nice if Cabal had an explicit concept of interfaces, with the idea that code depends on them and packages implement them. In the simplest case, an interface is just a name. Ideally, it would be a combination of type signatures, Quickcheck properties, proof obligations etc. The important thing is that it has an explicit definition which is completely independent of any concrete implementation and which never changes.
Something like this would immediately solve a lot of problems. Several packages could implement the same interface and we could pick which one we want when building stuff. We could have much more fine-grained dependencies (if all I need is an AVL tree, I don't want to depend on the entire containers package, but rather just on the AVL part of it). One package could implement several versions of an interface to ensure compatibility with old code (I could imagine module names like AVL_1.Data.AVLTree, AVL_2.Data.AVLTree etc., where AVL_1 and AVL_2 are interface names; Cabal could then map the right module to Data.AVLTree when building). If interface definitions include something like Quickcheck properties, we would have at least some assurance that a package actually does implement its interfaces. Moreover, this would also make the properties themselves reusable.
Note that I don't propose that we automatically extract interfaces from code. In fact, I think that would be precisely the wrong way to go. An interface is not a by-product of implementing a package. It should be defined explicitly.
In general, I don't think that existing package management systems do a very good job of specifying dependencies. They sort of work for distributing software but do they really work for versioning libraries? In any case, we ought to have something better for Haskell where we (hopefully) have somewhat different standards when it comes to correctness and ease of use. It might be more worthwhile to look at systems such as Corba, Microsoft's OLE or whatever that's called nowadays, Java's equivalent, whatever that is and, of course, ML's modules. None of these is quite right for what we want but IMO they are much closer to our problem domain than something like RPM.
Roman
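None of this exists today, but just to fix ideas, an explicit interface definition might record something like the following; this is a purely hypothetical sketch written as a Haskell value rather than any proposed syntax.
  -- A purely hypothetical picture of what a named, explicit interface
  -- definition could contain; nothing like this exists in Cabal.
  data InterfaceDef = InterfaceDef
    { ifaceName  :: String              -- e.g. "AVL_1"
    , signatures :: [(String, String)]  -- exported name and its type
    , properties :: [String]            -- names of QuickCheck properties
    }
  avl_1 :: InterfaceDef
  avl_1 = InterfaceDef
    { ifaceName  = "AVL_1"
    , signatures = [ ("empty",  "AVLTree a")
                   , ("insert", "Ord a => a -> AVLTree a -> AVLTree a")
                   , ("member", "Ord a => a -> AVLTree a -> Bool") ]
    , properties = [ "prop_member_after_insert" ]
    }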

Roman Leshchinskiy wrote:
IMO, a package is absolutely the wrong thing to depend on. Essentially, a package is an implementation of an interface and depending on implementations is a bad thing. Code should only depend on interfaces which are completely independent entities. I suspect that a lot of the problems with packages occur because the current system doesn't follow this simple principle.
It would be nice if Cabal had an explicit concept of interfaces, with the idea that code depends on them and packages implement them. In the simplest case, an interface is just a name. Ideally, it would be a combination of type signatures, Quickcheck properties, proof obligations etc. The important thing is that it has an explicit definition which is completely independent of any concrete implementation and which never changes.
Something like this would immediately solve a lot of problems. Several packages could implement the same interface and we could pick which one we want when building stuff. We could have much more fine-grained dependencies (if all I need is an AVL tree, I don't want to depend on the entire containers package, but rather just on the AVL part of it). One package could implement several versions of an interface to ensure compatibility with old code (I could imagine module names like AVL_1.Data.AVLTree, AVL_2.Data.AVLTree etc., where AVL_1 and AVL_2 are interface names; Cabal could then map the right module to Data.AVLTree when building). If interface definitions include something like Quickcheck properties, we would have at least some assurance that a package actually does implement its interfaces. Moreover, this would also make the properties themselves reusable.
We already have interfaces, in the sense that a package *is* an interface. You're suggesting decoupling these notions, which I believe would add a lot of extra complexity without enough benefit to make it worthwhile.
Let's take the examples you gave above:
1. several packages could implement the same interface. This can be done by having a single package that exports an interface and depends on one of the underlying "providers" selected at build-time.
2. fine-grained dependencies: just split up packages, or define new packages that just re-export parts of existing packages, if that's what you want.
3. one package could implement several versions of an interface. This is no different from having several versions of a package that can all be installed and used together.
4. interface definitions could have QuickCheck properties. Absolutely! And packages can have QuickCheck properties too.
Admittedly in order to do most of this we need to be able to define packages that re-export the contents of other packages. You can in fact already do this, but it's clumsy; we just need some tool and compiler support to make it smoother. I'm already convinced that we need this, and I believe we should do it in the GHC 6.10 timeframe in order to allow better backwards compatibility.
Using the Package Versioning Policy we have a clear way to know when a package's interface has changed, or when the interface has remained the same but the implementation has changed. We need tool support to check that the author is adhering to the PVP, though.
Basically the current scheme minimizes the cognitive load by having a single concept (the package) that embodies several units: distribution, licensing, dependency, linking (amongst others).
Cheers, Simon
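The module-level half of that re-export story is already plain Haskell 98; a shim package could consist of nothing but modules of this shape (package and module names invented for the example), with the clumsiness being at the package and tool level rather than in the language.
  -- A module in a hypothetical small 'avl' shim package that simply
  -- re-exports the AVL part of some larger package it depends on.
  module Data.AVLTree
    ( module Data.SomeBigContainers.AVLTree  -- re-export what we import from it
    ) where
  import Data.SomeBigContainers.AVLTree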

Simon Marlow wrote:
We already have interfaces, in the sense that a package *is* an interface. You're suggesting decoupling these notions, which I believe would add a lot of extra complexity without enough benefit to make it worthwhile.
Well, a package is an interface in the sense that it incidentally defines one. However, I have no idea how to extract the interface from the package. I also don't really understand how to depend on it (see below).
In general, I think it's fair to say that what we have now is somewhat akin to dynamic typing of packages. I'd like to have static typing. I expect the benefits to be similar to those of static over dynamic typing.
As to the complexity, I don't think having explicit interfaces would be much more complex than implementing a tool which checks that the package versioning policy is being followed.
Let's take the examples you gave above:
1. several packages could implement the same interface. This can be done by having a single package that exports an interface and depends on one of the underlying "providers" selected at build-time.
By build-time, do you mean when building the application which depends on the interface or when building the interface package? I'm interested in the former but I suspect you mean the latter.
As an example, suppose Alice writes a package and Bob writes an application that depends on it. Now, Chris doesn't like Alice and forks her package. He now wants to build Bob's application with his package instead of Alice's. How can he do it at the moment without having Alice and/or Bob apply any patches to their stuff? What if instead of an app, Bob writes a package which other apps depend on?
2. fine-grained dependencies: just split up packages, or define new packages that just re-export parts of existing packages, if that's what you want.
I don't think that splitting up packages is really an option since the packages' authors wouldn't always agree to it and in any case, we don't want to have a lot of artificially small packages. Reexporting would work, I guess, but do I now have to distribute all those packages which just reexport along with my stuff? In general, IMO packages and interfaces are simply at different levels of granularity.
To make this more concrete: suppose Alice implements package trees and Bob package containers. Both contain compatible implementations of AVL trees. Chris needs AVL trees in his application but doesn't care where they come from and wants his users to be able to pick one when building the app. What does he do? And how do Alice and Bob ensure that their implementations are, in fact, compatible?
3. one package could implement several versions of an interface. This is no different from having several versions of a package that can all be installed and used together.
I have to disagree here. There is quite a difference between maintaining a legacy interface in new versions of software and maintaining several versions of that software. Suppose I write a package and a lot of applications (not necessarily mine) depend on version 1 of that package. Now, for version 2 I completely redesign it such that the interface becomes incompatible with version 1. Unless there is a way for me to somehow provide the version 1 interface in my version 2 package, I'll have to keep maintaining the old version until all those applications migrate. This can be done now, I suppose, but not without jumping through some hoops.
4. interface definitions could have QuickCheck properties. Absolutely! And packages can have QuickCheck properties too.
Of course, but these properties can't be easily shared between packages which implement the same interface. Suppose Alice implements a package and Bob wants to reimplement it. He can copy Alice's QuickCheck properties into his code, of course, but that means that we now have two sets of those properties which will be maintained separately.
Also, these properties aren't really visible to a package's clients at the moment since we don't even have a convention for where to put them. IIUC, they aren't even considered part of the interface when it comes to package versioning. This is bad!
Using the Package Versioning Policy we have a clear way to know when a package's interface has changed, or when the interface has remained the same but the implementation has changed. We need tool support to check that the author is adhering to the PVP, though.
What will this tool check? That the set of exported names doesn't change? That they have the same signatures? That the QuickCheck properties remain the same? That the package satisfies the same theories (once we have support for formal reasoning)? If the answer to any of the above is no, then why not? On the other hand, if it will do all of the above, then wouldn't it be much easier to explicitly specify the interface instead of having to extract it from the package somehow?
Also, I don't really understand how the PVP is supposed to work, to be honest. If all I need is function doThis from package foo 1.0, what do I depend on? foo 1.*? What if foo 2.0 has a slightly different interface but still exports doThis?
Finally, I was always under the impression that the PVP is something we want to have for the core packages. I didn't realise that it is supposed to be a universally accepted policy and I don't think that would work. Almost all software companies, for instance, have their own versioning policy. We can't make them use ours. In general, the larger the Haskell community becomes the less likely it will be that any kind of convention will work. We ought to plan for the future and adopt an approach that scales.
Basically the current scheme minimizes the cognitive load by having a single concept (the package) that embodies several units: distribution, licensing, dependency, linking (amongst others).
I have my doubts about minimizing the cognitive load. To return to the dynamic vs. static typing analogy: although static typing requires more bookkeeping and introduces more concepts, it certainly leads to less cognitive load for me. Abstraction barriers are good! Well, as long as there aren't too many, but at the moment we don't have *any*.
Roman
participants (7)
- apfelmus
- David Roundy
- Duncan Coutts
- Ian Lynagh
- Roman Leshchinskiy
- Simon Marlow
- Thomas Schilling