
On 2 maj 2008, at 11.27, apfelmus wrote:
Duncan Coutts wrote:
Thomas Schilling wrote:
For example, suppose we write a program that uses the function 'Foo.foo' from package 'foo', and we happened to use 'foo-0.42' when testing our program. Then, given the knowledge that 'Foo.foo' was introduced in 'foo-0.23' and changed semantics in 'foo-2.0', we know that 'foo >= 0.23 && < 2.0' is the correct and complete dependency description.
I would go even further and simply use "my program 'bar' compiles with foo-0.42" as dependency description. In other words, whether the package foo-0.23 can be used to supply this dependency or not will be determined when somebody else tries to compile Bar with it.
In both cases, the basic idea is that the library user should *not* think about library versions, he just uses the one that is in scope on his system. Figuring out which other versions can be substituted is the job of the library author. In other words, the burden of proof is shifted from the user ("will my program compile with foo-1.1?") to the author ("which versions of my library are compatible?"), where it belongs.
I think we mean the same thing. If I write a program and test it against specific versions of libraries, then my program's source code, together with the knowledge of which versions I used, contains (most of the time) *all* the information necessary to determine which other library versions it can be built with. From the source code we need information about what is imported; from the library author we need a *formal* changelog, describing for each released version which parts of the interface and semantics have changed. The problem here is, of course, that this is a lot of information to provide.

Furthermore, I think we need information about imports from the library user. If we ignore this, then the PVP is *exactly* what we need: the PVP describes when things *could* break, but it does so in an extremely pessimistic way. If we have information about what exactly changed and what is used by a particular library, we can find out what the exact version range is. For example, if we build our package against foo-0.42 and bar-2.3 and both packages follow the PVP, then the following will trivially be true:

  build-depends: foo-0.42.*, bar-2.3.*

where "-X.Y.*" is a shortcut for ">= X.Y && < X.(Y+1)".

The problem is that this is extremely pessimistic, so we have to check manually whenever a new version of a dependency comes out and update the known-to-work-with range. With more information (obtained mostly by tools) we can automate this process, and, in fact, both approaches can co-exist.
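To make the pessimistic reading concrete, here is a small sketch (function names are made up for illustration) of how a tool could derive the "-X.Y.*" range from the single version a package was tested against:

```haskell
-- Sketch: deriving the pessimistic PVP range from a tested version.
-- Under the PVP, a build against X.Y.Z is assumed to work with any
-- version >= X.Y and < X.(Y+1).
import Data.List (intercalate)

type Version = [Int]

-- Lower and upper bound implied by the PVP for a tested version.
pvpRange :: Version -> (Version, Version)
pvpRange (x:y:_) = ([x, y], [x, y + 1])
pvpRange v       = (v, v)

showVersion :: Version -> String
showVersion = intercalate "." . map show

-- Render the range in build-depends syntax.
showRange :: String -> Version -> String
showRange name v =
  let (lo, hi) = pvpRange v
  in  name ++ " >= " ++ showVersion lo ++ " && < " ++ showVersion hi

main :: IO ()
main = putStrLn (showRange "foo" [0,42])
-- prints: foo >= 0.42 && < 0.43
```

A tool with access to a formal changelog could then widen this range automatically instead of leaving it at the pessimistic default.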
However it would only help for the development _history_, we still have no solution for the problem of packages being renamed (or modules moving between packages) breaking other existing packages. Though similarly we have no solution to the problem of modules being renamed. Perhaps it's just that we have not done much module renaming recently so people don't see it as an issue.
With the approach above, it's possible to handle package/module renaming. For instance, if the package 'foo' is split into 'f-0.1' and 'oo-0.1' at some point, we can still use the union of these two to fulfill the old dependency 'foo-0.42'.
This is essentially the same as using a "virtual package" that is simply a re-export of other packages. This would help a lot with our current problems with the base split (which will continue, as base will be split up even further).
In other words, the basic model is that a module/package like 'bar' with a dependency 'foo-0.42' is just a function mapping a value of the same type (= export list) as 'foo-0.42' to another value (namely the set of exports of 'bar'). So, we can compile for instance
bar (foo-0.42)
or
bar (f-0.1 `union` oo-0.1)
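The model above can be sketched directly in Haskell. This is only a toy (the export sets and the body of 'bar' are made up): a package is modelled as a function from the exports it depends on to the exports it provides, so it can be applied either to foo-0.42 or to the union of f-0.1 and oo-0.1:

```haskell
-- Toy model: a package is a function from dependency exports to its
-- own exports, so different arguments can fulfill the same dependency.
import qualified Data.Set as Set
import Data.Set (Set)

type Exports = Set String

-- Hypothetical export sets: foo-0.42 was split into f-0.1 and oo-0.1.
foo042, f01, oo01 :: Exports
foo042 = Set.fromList ["parse", "render"]
f01    = Set.fromList ["parse"]
oo01   = Set.fromList ["render"]

-- 'bar' as a function; the body stands in for actual compilation.
bar :: Exports -> Exports
bar deps = Set.map ("bar." ++) deps

main :: IO ()
main = print (bar foo042 == bar (f01 `Set.union` oo01))
-- prints: True
```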
Of course, the problems are

  1) specifying the types of the parameters,
  2) automatically choosing good parameters.
For 1), one could use a very detailed import list, but I think that this feels wrong. I mean, if I have to specify the imports myself, why did I import foo-0.42 in the first place? Put differently, when I say 'import Data.Map' I want to import both its implementation and the interface. So, I argue that the goal is to allow type specifications of the form 'same type as foo-0.42'.
Problem 2) exists because if I have foo-0.5 in scope on my system and a package lists foo-0.42 as a dependency, the compiler should somehow figure out that it can use foo-0.5 as argument. Of course, it will be tricky/impossible to figure out that f-0.1 `union` oo-0.1 is a valid argument, too.
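One crude approximation of problem 2), ignoring types and semantics entirely (all names below are hypothetical): foo-0.5 is an acceptable argument if it still exports every name that 'bar' actually imports from foo-0.42.

```haskell
-- Crude substitution check: an installed version is acceptable if its
-- export set covers what the client actually uses. Types and changed
-- semantics are deliberately ignored in this sketch.
import qualified Data.Set as Set

type Name = String

-- Names 'bar' actually imports from 'foo' (hypothetical).
barUses :: Set.Set Name
barUses = Set.fromList ["foo", "fooHelper"]

-- Hypothetical export lists of two versions of 'foo'.
foo042Exports, foo05Exports :: Set.Set Name
foo042Exports = Set.fromList ["foo", "fooHelper", "oldFoo"]
foo05Exports  = Set.fromList ["foo", "fooHelper", "newFoo"]

-- A candidate is a valid argument if it covers the client's imports.
validArgument :: Set.Set Name -> Set.Set Name -> Bool
validArgument uses exports = uses `Set.isSubsetOf` exports

main :: IO ()
main = print (validArgument barUses foo05Exports)
-- prints: True
```

A real solution would of course have to compare types and, via the formal changelog, semantics, not just names.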
So, the task would be to develop a formalism, i.e. some kind of "lambda calculus for modules" that can handle problems 1) and 2). The formalism should be simple to understand and use yet powerful, just like our beloved lambda calculus.
A potential pitfall to any solution is that name and version number don't identify a compiled package uniquely! For instance,
foo-0.3 (bytestring-1.1)
is very different from
foo-0.3 (bytestring-1.2)
if foo exports the ByteString type. That's the diamond import problem. In other words, foo-0.3 is always the same function, but the evaluated results are not.
I think a formal changelog can also help with renaming (even of exported entities), but, I agree, for all this to work we need to formalise it first, and then build tools to automate most of the work.

/ Thomas

--
My shadow
Change is coming.
Now is my time.
Listen to my muscle memory.
Contemplate what I've been clinging to.
Forty-six and two ahead of me.