Thoughts on Procedure

Hi guys,

in my understanding, our current update procedure works like this:

1) We notice that a package was updated (or added) on Hackage by means of RSS.
2) A maintainer runs cabal2arch to generate an updated PKGBUILD.
3) If the generated PKGBUILD looks good, the file is committed to the Git repository and uploaded to AUR.

There are a few things worth noting about that procedure:

- A maintainer must perform 1 manual step per updated package: that is linear complexity O(n).
- There is no mechanism to guarantee that the updated set of PKGBUILD files actually works.
- It's common practice to use version control systems like Git to track original source code. Our setup, however, tracks generated files: the PKGBUILDs are produced automatically by cabal2arch. So why do we track them? Shouldn't we rather track the Cabal files?

Naturally, one wonders how to improve the update process. There are a few possible optimizations:

- The simplest way to verify whether all PKGBUILDs compile is to, well, compile them. Given a set of updated packages, all packages that directly or indirectly depend on any of the updated packages need re-compilation, and the current set of PKGBUILDs is to be considered valid only if all those builds succeed.
- It is possible to download the entire state of Hackage in a single tarball. Given all the Cabal files, a Makefile can automatically re-generate those PKGBUILDs that need updating. The same Makefile can also run the necessary builds, and it can also perform the necessary uploads to AUR.

Based on these thoughts, I would like to propose an improved procedure for discussion. Let our Git repository track a set of Cabal files. Then an update would work like this:

1) A maintainer downloads http://hackage.haskell.org/packages/archive/00-index.tar.gz and extracts the Cabal files into a checked-out Git repository.
2) Optionally, inspect changes with "git status" and "git diff".
3) Run "make all" to re-build all PKGBUILD files that need updating.
4) Run "make check" to perform all necessary re-builds of binary packages. If all builds succeed, proceed with (5). Otherwise, figure out which package broke the build, revert the changes in the corresponding Cabal file, and go back to (3).
5) Run "make upload" and "git commit" the changes.

Now, this procedure is supposed to update AUR, but "make upload" can easily be extended to copy the generated packages into a binary repository as well.

The worst case scenario occurs when every single available update breaks during "make check". In that case, the procedure has linear complexity O(n). The best case scenario, on the other hand, is the one where every single update succeeds. That case is handled by running "make all && make check && make upload", which gives constant complexity O(1).

More importantly, however, the "make check" phase would guarantee that we never ever publish a configuration that doesn't compile.

How do you feel about the idea?

Take care,
Peter
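The re-build set that "make check" needs in step (4) is the transitive reverse-dependency closure of the updated packages. A minimal sketch of that computation in Haskell; the dependency map and the package names in it are hypothetical, purely for illustration:

```haskell
-- Sketch: compute which packages need re-building after an update.
import qualified Data.Map as M
import qualified Data.Set as S

type Pkg = String

-- For each package, its direct dependencies (made-up example data).
deps :: M.Map Pkg [Pkg]
deps = M.fromList
  [ ("haskell-haxml",    [])
  , ("darcs-monitor",    ["haskell-haxml"])
  , ("haskell-opengl",   [])
  , ("haskell-hfractal", ["haskell-opengl"])
  ]

-- Smallest fixpoint: the updated packages plus everything that
-- directly or indirectly depends on any of them.
rebuildSet :: S.Set Pkg -> S.Set Pkg
rebuildSet updated
  | next == updated = updated
  | otherwise       = rebuildSet next
  where
    affected = S.fromList [ p | (p, ds) <- M.toList deps
                              , any (`S.member` updated) ds ]
    next     = S.union updated affected

main :: IO ()
main = print (S.toList (rebuildSet (S.fromList ["haskell-haxml"])))
```

Only if every build in that set succeeds would the new PKGBUILD set be considered valid.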

simons:
Hi guys,
in my understanding, our current update procedure works like this:
1) We notice that a package was updated (or added) on Hackage by means of RSS.
2) A maintainer runs cabal2arch to generate an updated PKGBUILD.
3) If the generated PKGBUILD looks good, the file is committed to the Git repository and uploaded to AUR.
There are a few things worth noting about that procedure:
- A maintainer must perform 1 manual step per updated package: that is linear complexity O(n).
- There is no mechanism to guarantee that the updated set of PKGBUILD files actually works.
Under my old system, each package was:

* converted via cabal2arch, and
* type checked and compiled via "makepkg -f".

So anything uploaded to AUR has at least type checked at one point (even though each upload could have broken downstream packages). You will be in a stronger position if you aim for always having a "consistent" package set -- i.e. all packages are guaranteed to be consistent with respect to their version requirements. A binary approach will demand this.
Naturally, one wonders how to improve the update process. There are a few possible optimizations:
- The simplest way to verify whether all PKGBUILDs compile is to, well, compile them. Given a set of updated packages, all packages that directly or indirectly depend on any of the updated packages need re-compilation, and the current set of PKGBUILDs is to be considered valid only if all those builds succeed.
3) Run "make all" to re-build all PKGBUILD files that need updating.
4) Run "make check" to perform all necessary re-builds of binary packages. If all builds succeed, proceed with (5). Otherwise, figure out which package broke the build and revert the changes in the corresponding Cabal file. Go back to (3).
5) Run "make upload" and "git commit" the changes.
So we'll always have a consistent set? This would be a valuable list -- no other distro aims for this, other than Debian, which aims only to support a few hundred packages.

I think a full rebuild of the dependency tree each time is the only way to be sure -- to force type checking of the affected parts of the tree. Additionally, once 'cabal check' is more widespread, you may also do functional testing of the tree.

-- Don

Hi Don,
So we'll always have a consistent set? This would be a valuable list -- no other distro aims for this, other than Debian, which aims only to support a few hundred packages.
the way I see it, Hackage represents the "known universe" of Haskell packages. That distribution can be accessed by means of Cabal. Unfortunately, Cabal's build instructions are too sophisticated to be used directly by other package managers, like Pacman, because those package managers lack the ability to install several versions of the same package at the same time.

Now, our task is to map Cabal to Pacman, i.e. we find the most recent subset of Hackage that can be installed within the boundaries of Pacman. This ought to be possible as a pure computation: the Cabal files contain everything we need to know.

In theory, this "simple" subset of Hackage can be translated into build instructions for any number of package managers. If we can generate PKGBUILD files, chances are that we can generate build instructions for Gentoo, NixOS, FreeBSD, and whatnot just the same. Maybe we should write a cabal2prolog converter? If we had Hackage available in Prolog, then figuring out those dependencies would be no problem at all! :-)

Take care,
Peter
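To make that "pure computation" concrete: a first approximation of the mapping is to keep only the newest version of every package name, since Pacman installs at most one version of each package. A toy sketch (the package data is invented for illustration; real consistency additionally requires that every dependency range of the chosen versions is satisfied, which is where the Cabal files come in):

```haskell
-- Sketch: the "most recent subset" of an index, as a pure function.
import qualified Data.Map as M

-- Keep only the newest version of each package name -- the best a
-- single-version package manager like pacman can represent.
-- Version lists compare lexicographically, so max picks the newest.
latestOnly :: [(String, [Int])] -> M.Map String [Int]
latestOnly = M.fromListWith max

main :: IO ()
main = print (latestOnly [ ("HaXml",     [1,13,3])
                         , ("HaXml",     [1,19])
                         , ("polyparse", [1,0]) ])
```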

On 2010/10/14 Peter Simons
More importantly, however, the "make check" phase would guarantee that we never ever publish a configuration that doesn't compile.
How do you feel about the idea?
My project is to implement in pure Haskell formal consistency checks (PKGBUILDs requiring versions not existent in provided PKGBUILDs). I think I already posted example scripts that work with Hackage's tarball.

-- Rémy.

On 2010/10/14 Rémy Oudompheng
On 2010/10/14 Peter Simons wrote:
More importantly, however, the "make check" phase would guarantee that we never ever publish a configuration that doesn't compile.
How do you feel about the idea?
My project is to implement in pure Haskell formal consistency checks (PKGBUILDs requiring versions not existent in provided PKGBUILDs). I think I already posted example scripts that work with Hackage's tarball.
Done: check out my example script [*]. Running it on a habs checkout gives the following list of problems:

darcs-monitor
haskell-happstack-ixset
haskell-numerals
haskell-restng
haskell-safe-freeze
haskell-simpleirc
haskell-vect-opengl
hfractal

[*] http://github.com/remyoudompheng/archhaskell/blob/pkgbuild-parser/tests/list...

-- Rémy.

Hi Rémy,
My project is to implement in pure Haskell formal consistency checks (PKGBUILDs requiring versions not existent in provided PKGBUILDs).
are those checks based on the PKGBUILD files? I wonder, because those files contain only a subset of the available information, since cabal2arch cannot translate all version specifications in dependencies. Such a check would spot some errors, i.e. it would recognize that PKGBUILDs are flat-out missing, but it can't verify whether a dependency like "foo == 1.*" or "bar >= 3 && < 5" is fulfilled. To implement those kinds of checks, it would be necessary to work with the original Cabal files. Does your tool do that? If it does, then it has the potential to speed up "make check" quite a lot!

Take care,
Peter
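For reference, checking such ranges against concrete versions is exactly what the Cabal library provides. A sketch, assuming a reasonably recent Cabal (simpleParse and withinRange are the relevant entry points; their module locations have moved between Cabal versions):

```haskell
import Distribution.Text (simpleParse)
import Distribution.Version (VersionRange, mkVersion, withinRange)

-- Does the concrete version satisfy the Cabal range string?
-- Nothing means the range string failed to parse.
satisfies :: [Int] -> String -> Maybe Bool
satisfies v range =
  withinRange (mkVersion v) <$> (simpleParse range :: Maybe VersionRange)

main :: IO ()
main = mapM_ print
  [ satisfies [1,2] "== 1.*"
  , satisfies [4]   ">= 3 && < 5"
  , satisfies [5,1] ">= 3 && < 5"
  ]
```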

simons:
Hi Rémy,
My project is to implement in pure Haskell formal consistency checks (PKGBUILDs requiring versions not existent in provided PKGBUILDs).
are those checks based on the PKGBUILD files? I wonder, because those files contain only a subset of the available information since cabal2arch cannot translate all version specifications in dependencies. Such a check would spot some errors, i.e. it would recognize that PKGBUILDs are flat-out missing, but it can't verify whether a dependency like "foo == 1.*" or "bar >= 3 && < 5" is fulfilled.
To implement those kinds of checks, it would be necessary to work with the original Cabal files. Does your tool do that? If it does, then it has the potential to speed up "make check" quite a lot!
You can use cabal-install to do consistency checks, as described here: http://www.well-typed.com/blog/24

E.g.

    cabal install haxml-1.19 darcs-monitor --dry -v

where haxml-1.19 is a package we have "fixed":

    Resolving dependencies...
    selecting HaXml-1.19 (hackage) and discarding polyparse-1.0
    selecting HaXml-1.13.2 and HaXml-1.13.3. However none of them are available.
      HaXml-1.13.2 was excluded because HaXml-1.19 was selected instead
      HaXml-1.13.2 was excluded because of the top level dependency HaXml ==1.19
      HaXml-1.13.3 was excluded because HaXml-1.19 was selected instead
      HaXml-1.13.3 was excluded because of the top level dependency HaXml ==1.19
    cabal: cannot configure darcs-monitor-0.4.0. It requires HaXml <1.14

indicating that those are inconsistent. You could take the list of things in AUR, with version numbers, and ask cabal-install if they're consistent (and then, when they're not, iterate, removing inconsistent packages).

On 2010/10/14 Don Stewart
You can use cabal-install to do consistency checks, as described here:
http://www.well-typed.com/blog/24
[...]
Indicating that those are inconsistent. You could take the list of things in AUR, with version numbers, and ask cabal install if they're consistent (and then, when they're not, iterate, removing inconsistent packages).
That's certainly a good idea, but we would have to encode the package name in the PKGBUILD to retrieve it easily. The best option would be to have cabal-install available as a library so that we could access it from Haskell. I don't want to use IO for something that should be a pure function. Do you know why cabal-install contains libraries not in Cabal?

-- Rémy.

remyoudompheng:
On 2010/10/14 Don Stewart wrote:
You can use cabal-install to do consistency checks, as described here:
http://www.well-typed.com/blog/24
[...]
Indicating that those are inconsistent. You could take the list of things in AUR, with version numbers, and ask cabal install if they're consistent (and then, when they're not, iterate, removing inconsistent packages).
That's certainly a good idea, but we would have to encode the package name in the PKGBUILD to retrieve it easily. The best option would be to have cabal-install available as a library so that we could access it from Haskell. I don't want to use IO for something that should be a pure function.
Well, it's not the end of the world, and it is a pure function over the parsed set of PKGBUILDs. The Haskell/Cabal package name for a given Arch package is computed from the Hackage URL in the PKGBUILD, like so:

    import Distribution.ArchLinux.AUR
    import Distribution.ArchLinux.PkgBuild
    import System.Environment
    import System.FilePath

    main = do
        [p] <- getArgs
        k <- package p
        case k of
            (Right aur, _) | not (null (packageURL aur)) ->
                -- haskell package name
                putStrLn $ takeFileName (packageURL aur)
            _ -> putStrLn $ "Package " ++ show p ++ " doesn't have an associated PKGBUILD"

An example:

    $ ./get-hackage-name.hs haskell-haxml
    HaXml

-- Don

On 2010/10/14 Peter Simons
are those checks based on the PKGBUILD files? I wonder, because those files contain only a subset of the available information since cabal2arch cannot translate all version specifications in dependencies. Such a check would spot some errors, i.e. it would recognize that PKGBUILDs are flat-out missing, but it can't verify whether a dependency like "foo == 1.*" or "bar >= 3 && < 5" is fulfilled.
To implement those kinds of checks, it would be necessary to work with the original Cabal files. Does your tool do that? If it does, then it has the potential to speed up "make check" quite a lot!
Some remarks:

1. It may be possible that makepkg recognises dependencies looking like ("package>=3" "package<5"). I don't really know how pacman will deal with that. The important thing is that the source repository is consistent.
2. I don't know how to handle non-existent dependencies: for the moment I assume they are fulfilled, because, for example, they are system packages.

-- Rémy.
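Rémy's first remark can be sketched: a Cabal version interval splits into two pacman-style depends entries. A hypothetical helper (the function names are invented for illustration; this is not part of cabal2arch):

```haskell
import Data.List (intercalate)

-- Render a version like [3,1] as "3.1".
showV :: [Int] -> String
showV = intercalate "." . map show

-- Split optional lower/upper version bounds into pacman-style
-- depends entries, e.g. lower 3 and upper 5 become
-- ("package>=3" "package<5") in the PKGBUILD depends array.
dependsEntries :: String -> Maybe [Int] -> Maybe [Int] -> [String]
dependsEntries name lo hi =
     [ name ++ ">=" ++ showV v | Just v <- [lo] ]
  ++ [ name ++ "<"  ++ showV v | Just v <- [hi] ]

main :: IO ()
main = print (dependsEntries "package" (Just [3]) (Just [5]))
```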

On 2010/10/14 Peter Simons
Hi Rémy,
My project is to implement in pure Haskell formal consistency checks (PKGBUILDs requiring versions not existent in provided PKGBUILDs).
are those checks based on the PKGBUILD files? I wonder, because those files contain only a subset of the available information since cabal2arch cannot translate all version specifications in dependencies. Such a check would spot some errors, i.e. it would recognize that PKGBUILDs are flat-out missing, but it can't verify whether a dependency like "foo == 1.*" or "bar >= 3 && < 5" is fulfilled.
To implement those kinds of checks, it would be necessary to work with the original Cabal files. Does your tool do that? If it does, then it has the potential to speed up "make check" quite a lot!
I implemented what you suggested [*]. I attached a list of conflicts in the current habs tree (the tool works from the PKGLIST, which is generated from the ABS tree using a small shell script that is in the same repo).

[*] http://github.com/remyoudompheng/archhaskell-build/blob/master/scripts/findc...

-- Rémy.

Hi Rémy,
http://github.com/remyoudompheng/archhaskell-build/blob/master/scripts/findc...
this is great stuff, thank you very much for writing that tool! It appears that the number of conflicts in our habs tree is massive. How do we remedy those conflicts? Does anyone have a suggestion for what we could do to improve the situation?

Take care,
Peter

On Thu, Oct 14, 2010 at 20:04, Peter Simons
Hi guys,
in my understanding, our current update procedure works like this:
1) We notice that a package was updated (or added) on Hackage by means of RSS.
2) A maintainer runs cabal2arch to generate an updated PKGBUILD.
3) If the generated PKGBUILD looks good, the file is committed to the Git repository and uploaded to AUR.
There are a few things worth noting about that procedure:
- A maintainer must perform 1 manual step per updated package: that is linear complexity O(n).
- There is no mechanism to guarantee that the updated set of PKGBUILD files actually works.
- It's common practice to use version control systems like Git to track original source code. Our setup, however, tracks generated files: the PKGBUILDs are produced automatically by cabal2arch. So why do we track them? Shouldn't we rather track the Cabal files?
Naturally, one wonders how to improve the update process. There are a few possible optimizations:
- The simplest way to verify whether all PKGBUILDs compile is to, well, compile them. Given a set of updated packages, all packages that directly or indirectly depend on any of the updated packages need re-compilation, and the current set of PKGBUILDs is to be considered valid only if all those builds succeed.
- It is possible to download the entire state of Hackage in a single tarball. Given all the Cabal files, a Makefile can automatically re-generate those PKGBUILDs that need updating. The same Makefile can also run the necessary builds, and it can also perform the necessary uploads to AUR.
Based on these thoughts, I would like to propose an improved procedure for discussion. Let our Git repository track a set of Cabal files. Then an update would work like this:
1) A maintainer downloads
http://hackage.haskell.org/packages/archive/00-index.tar.gz
and extracts the Cabal files into a checked-out Git repository.
2) Optionally, inspect changes with "git status" and "git diff".
3) Run "make all" to re-build all PKGBUILD files that need updating.
4) Run "make check" to perform all necessary re-builds of binary packages. If all builds succeed, proceed with (5). Otherwise, figure out which package broke the build and revert the changes in the corresponding Cabal file. Go back to (3).
5) Run "make upload" and "git commit" the changes.
Now, this procedure is supposed to update AUR, but "make upload" can be easily extended to copy the generated packages into a binary repository as well.
The worst case scenario occurs when every single available update breaks during "make check". In that case, the procedure has linear complexity O(n). The best case scenario, on the other hand, is the one where every single update succeeds. That case is handled by running "make all && make check && make upload", which gives constant complexity O(1).
More importantly, however, the "make check" phase would guarantee that we never ever publish a configuration that doesn't compile.
How do you feel about the idea?
Taking it one step further:
• Replace archhaskell/habs with a single version-controlled file containing tuples of

On 2010/10/15 Magnus Therning
Taking it one step further:
• Replace archhaskell/habs with a single version-controlled file containing tuples of
• Make use of bauerbill's already existing support for hackage. (I don't know anything about the internals of bauerbill, but it might need some extending to match more closely what cabal2arch does.)

Then the process would be:

1. Monitor the RSS feed from hackage.
2. Modify the relevant tuples in the file.
3. Based on 'git diff', run bauerbill on the updated packages.
4. Find the dependants, and re-build them.
5. If all is well, upload to AUR or the binary repo.
6. Rinse and repeat.
All steps could then be wrapped up in a makefile. Furthermore, bauerbill could just have knowledge of the control file we maintain, and then step 5 can be skipped.
In any case, I feel that the discussion of what to store in our git repo, whether it's Arch source packages or cabal files or tuples, isn't that important at this point, i.e. your steps 2-5 are the steps to concentrate on. If we are going to attempt maintaining more than a handful of binary packages, then we'll get the most value out of automating the time-consuming bits of that. Rémy is hard at work on pieces of that, but there's more to be worked out.
Xyne may confirm that, but bauerbill is merely a wrapper around cabal2arch and pacman, which probably resolves dependencies in the Arch Linux way and not in the Cabal way. I would dream of Cabal 2 [#] doing things in the most (in my opinion) satisfactory way. Even if we do not have an option to install cabal-install as a library, I don't think that is really necessary. The 'cabal-query' library gives a very nice example of how we could manage a habs tree purely in memory.

1. I support version-controlling a single package list file. It would be a good idea to get rid of the package lists stored in archlinux-web.

2. I suggest adding a small Distribution.ArchLinux.ArchPackage module, which would manage a data structure like this:

    { pkg           :: PkgBuild
    , installscript :: String
    , otherfiles    :: [String]
    }

It would give a complete internal representation of an Arch Linux package and would provide interested developers a Haskell way of managing generic packages (i.e. not necessarily Haskell packages).

3. I would like to move the Cabal-to-PKGBUILD translation to a Distribution.ArchLinux.CabalTranslation module, for example. cabal2arch would keep the command line interface with its options, plus additional scripts for the steps you mention.

4. I would like to add a Distribution.ArchLinux.HackageTranslation module, which would be able to process a tarball downloaded from Hackage and output a SrcRepo structure, for example.

I think CabalTranslation should expose a function

    packageFromCabal :: PackageDescription -> Maybe a -> Maybe b -> ArchPackage

- taking as arguments a Cabal package, optionally a list of dependencies to clear out, and optionally a translation table,
- putting out an ArchPackage structure.

It would give the following modules:

Generic package generation and handling:
    Distribution.ArchLinux.PkgBuild
    Distribution.ArchLinux.ArchPackage
    Distribution.ArchLinux.SrcRepo

Interaction with Cabal and Hackage (no IO):
    Distribution.ArchLinux.CabalTranslation
    Distribution.ArchLinux.HackageTranslation

-- Rémy.

On 2010/10/15 Magnus Therning
Taking it one step further:
• Replace archhaskell/habs with a single version-controlled file containing tuples of
. • Make use of bauerbill's already existing support for hackage. (I don't know anything about the internals of bauerbill, but it might need some extending to closer match what cabal2arch does.)
I pushed several additions to the archlinux library, and wrote a small script that sketches this part of the procedure. First some comments:

* Raw Cabal files from 00-index.tar.gz cannot be converted right away to PKGBUILDs; they must first go through a configuration procedure, which is implemented in cabal2arch: I copied the relevant code to the function Distribution.ArchLinux.CabalTranslation.preprocessCabal.
* I added a module Distribution.ArchLinux.HackageTranslation which gives two main functions:
  - getCabalsFromTarball turns a tarball (ByteString) into a list of GenericPackageDescription;
  - getSpecifiedCabalsFromTarball takes a [String] as an additional argument to extract only certain items. The [String] argument is a set of lines like in the attached PKGLIST file.

GenericPackageDescription values can be fed to preprocessCabal and cabal2pkg to produce PKGBUILDs. See the attached script for an example.

-- Rémy.

On 2010/10/17 Rémy Oudompheng
On 2010/10/15 Magnus Therning wrote:
Taking it one step further:
• Replace archhaskell/habs with a single version-controlled file containing tuples of
• Make use of bauerbill's already existing support for hackage. (I don't know anything about the internals of bauerbill, but it might need some extending to match more closely what cabal2arch does.)
I pushed several additions to the archlinux library, and wrote a small script that sketches this part of the procedure. First some comments: [...]
- The HackageTranslation module introduces a dependency on an external library, Codec.Archive.Tar.
- I would put any script that generates PKGBUILDs, like the one I sent, in the cabal2arch repository.

-- Rémy.

On 17/10/10 15:11, Rémy Oudompheng wrote:
On 2010/10/15 Magnus Therning wrote:
Taking it one step further:
• Replace archhaskell/habs with a single version-controlled file containing tuples of
• Make use of bauerbill's already existing support for hackage. (I don't know anything about the internals of bauerbill, but it might need some extending to match more closely what cabal2arch does.)
I pushed several additions to the archlinux library, and wrote a small script that sketches this part of the procedure. First some comments:
* Raw Cabal files from 00-index.tar.gz cannot be converted right away to PKGBUILDs; they must first go through a configuration procedure, which is implemented in cabal2arch: I copied the relevant code to the function Distribution.ArchLinux.CabalTranslation.preprocessCabal.
Ah, yes, that's the finalisation process, basically nailing down some stuff in order to make the cabal stuff concrete. A short description can be found at [1].
* I added a module Distribution.ArchLinux.HackageTranslation which gives two main functions:
  - getCabalsFromTarball turns a tarball (ByteString) into a list of GenericPackageDescription;
  - getSpecifiedCabalsFromTarball takes a [String] as an additional argument to extract only certain items. The [String] argument is a set of lines like in the attached PKGLIST file.
Cool. Are you also going to make cabal2arch use those new functions?
GenericPackageDescription variables can be fed to preprocessCabal and cabal2pkg to produce PKGBUILD
See the attached script for an example.
I have to say I love the steady improvements to our tools. I would really like to make a new release soon though, especially due to the recent addition of 'packages()' in the generated code. Rémy, Peti, is there any outstanding stuff you would like to get in before we bump the version number and push a new release to Hackage?

/M

[1] http://therning.org/magnus/archives/514

--
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org  Jabber: magnus@therning.org
http://therning.org/magnus  identi.ca|twitter: magthe

On 2010/10/17 Magnus Therning
* I added a module Distribution.ArchLinux.HackageTranslation which gives two main functions:
  - getCabalsFromTarball turns a tarball (ByteString) into a list of GenericPackageDescription;
  - getSpecifiedCabalsFromTarball takes a [String] as an additional argument to extract only certain items. The [String] argument is a set of lines like in the attached PKGLIST file.
Cool. Are you also going to make cabal2arch use those new functions?
I pushed the last changes I wanted to make. I left the semantics of cabal2arch unchanged so that any AUR helper that uses it may continue working. I added a script "manycabal2arch" to cabal2arch that makes use of the HackageTranslation module to produce a whole ABS tree from 00-index.tar and a given list of packages. I have no other features to add for the next release.

-- Rémy.

Hi Magnus,
Remi, Peti, are there any outstanding stuff you would like to get in before we bump the version number and push a new release to Hackage?
there's no need to wait for me: as far as I'm concerned, we can make a new release. I bumped the Cabal version numbers in Git a few days ago already, by the way.

Can whoever publishes the release please create corresponding tags in the git repository, say 'v0.7.5' for cabal2arch and 'v0.3.4' for archlinux?

Take care,
Peter

On 18/10/10 17:51, Peter Simons wrote:
Hi Magnus,
Remi, Peti, are there any outstanding stuff you would like to get in before we bump the version number and push a new release to Hackage?
there's no need to wait for me: as far as I'm concerned, we can make a new release. I bumped the Cabal version numbers in Git a few days ago already, by the way.
Can whoever publishes the release please create corresponding tags in the git repository, say 'v0.7.5' for cabal2arch and 'v0.3.4' for archlinux?
Tagged and pushed. I'm uploading them to Hackage and AUR as well.

/M
participants (4):
- Don Stewart
- Magnus Therning
- Peter Simons
- Rémy Oudompheng