
Hi guys,
in my understanding, our current update procedure works like this:
1) We notice that a package was updated (or added) on Hackage by means of RSS.
2) A maintainer runs cabal2arch to generate an updated PKGBUILD.
3) If the generated PKGBUILD looks good, the file is committed to the Git repository and uploaded to AUR. (The manual sequence is sketched below.)
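Concretely, steps (2) and (3) boil down to something like the following for every single updated package. This is a sketch only: the package "foo" is made up, and the exact cabal2arch invocation and AUR upload mechanism are assumptions, not established facts.

    # assumed: cabal2arch accepts the URL (or path) of a .cabal file
    # and writes haskell-foo/PKGBUILD
    cabal2arch http://hackage.haskell.org/packages/archive/foo/1.2.3/foo.cabal
    less haskell-foo/PKGBUILD                  # does the result look good?
    git add haskell-foo/PKGBUILD
    git commit -m "haskell-foo: update to 1.2.3"
    ( cd haskell-foo && makepkg --source )     # source tarball for AUR
    # ... and finally upload haskell-foo-1.2.3-1.src.tar.gz to AUR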
There are a few things worth noting about that procedure:
- A maintainer must perform one manual step per updated package, i.e. the manual effort grows linearly with the number of updates: O(n).
- There is no mechanism to guarantee that the updated set of PKGBUILD files actually works.
- It's common practice to use version control systems like Git to track original source code. Our setup, however, tracks generated files: the PKGBUILDs are produced automatically by cabal2arch. So why do we track them? Shouldn't we rather track the Cabal files?
Naturally, one wonders how to improve the update process. There are a few possible optimizations:
- The simplest way to verify whether all PKGBUILDs compile is to, well, compile them. Given a set of updated packages, all packages that directly or indirectly depend on any of the updated packages need re-compilation, and the current set of PKGBUILDs is to be considered valid only if all those builds succeed.
- It is possible to download the entire state of Hackage in a single tarball. Given all the Cabal files, a Makefile can automatically re-generate those PKGBUILDs that need updating. The same Makefile can also run the necessary builds and perform the necessary uploads to AUR (a sketch follows below).
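For illustration, such a Makefile could look roughly like this. It is only a sketch: it assumes a flat layout where every tracked Cabal file lives at the top level as <name>.cabal, it assumes that "cabal2arch <name>.cabal" (re-)creates <name>/PKGBUILD, and it ignores the reverse-dependency calculation entirely.

    CABALS    := $(wildcard *.cabal)
    PKGBUILDS := $(patsubst %.cabal,%/PKGBUILD,$(CABALS))

    all: $(PKGBUILDS)

    # re-generate a PKGBUILD whenever its Cabal file is newer
    %/PKGBUILD: %.cabal
    	cabal2arch $<

    # naively re-build every package; a real implementation would build
    # only the updated packages plus their reverse dependencies, in
    # dependency order
    check: all
    	for dir in $(dir $(PKGBUILDS)); do \
    	    ( cd $$dir && makepkg --syncdeps --noconfirm ) || exit 1; \
    	done

    # build source tarballs; the actual AUR upload tool is left open here
    upload: check
    	for dir in $(dir $(PKGBUILDS)); do \
    	    ( cd $$dir && makepkg --source ); \
    	done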
Based on these thoughts, I would like to propose an improved procedure for discussion. Let our Git repository track a set of Cabal files. Then an update would work like this:
1) A maintainer downloads
http://hackage.haskell.org/packages/archive/00-index.tar.gz
and extracts the Cabal files into a checked-out Git repository.
2) Optionally, inspect changes with "git status" and "git diff".
3) Run "make all" to re-build all PKGBUILD files that need updating.
4) Run "make check" to perform all necessary re-builds of binary packages. If all builds succeed, proceed with (5). Otherwise, figure out which package broke the build and revert the changes in the corresponding Cabal file. Go back to (3).
5) Run "make upload" and "git commit" the changes. (A complete session is sketched below.)
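Put together, an update session might then look like this. Again a sketch: the package name is made up, and the exact directory layout of 00-index.tar.gz is glossed over.

    cd habs/                            # our checked-out Git repository
    wget -q http://hackage.haskell.org/packages/archive/00-index.tar.gz
    tar xzf 00-index.tar.gz             # (1) unpack the Cabal files
    git status && git diff              # (2) inspect what changed
    make all                            # (3) re-generate stale PKGBUILDs
    make check                          # (4) re-build affected packages
    # suppose haskell-foo broke the build: revert its Cabal file ...
    git checkout -- foo.cabal
    make all && make check              # ... and try again
    make upload                         # (5) publish to AUR
    git commit -a -m "hackage update"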
Now, this procedure is supposed to update AUR, but "make upload" can be easily extended to copy the generated packages into a binary repository as well.
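For instance, the "upload" target could additionally run something along these lines (repo-add ships with pacman; the repository name and path are made up):

    # register the freshly built packages in a local binary repository
    repo-add /srv/habs/habs.db.tar.gz */*.pkg.tar.*

Users could then install the resulting binaries through pacman by adding a matching [habs] entry to their pacman.conf.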
The worst-case scenario occurs when every single available update breaks during "make check": the maintainer must identify and revert each of the n broken updates, so the procedure again requires linear manual effort, O(n). The best-case scenario, on the other hand, is the one where every single update succeeds. That case is handled by a single run of "make all && make check && make upload", i.e. a constant number of manual steps, O(1), regardless of how many packages were updated.
More importantly, however, the "make check" phase would guarantee that we never ever publish a configuration that doesn't compile.
How do you feel about the idea?
Taking it one step further:
• Replace archhaskell/habs with a single version-controlled file
containing tuples of