
Peter Simons wrote:
I think that we should remove the data files completely from the packages and only host them at a well-known URL (on kiwilight.com or haskell.org).
The notion that those files reside on a remote server concerns me a little, because it means that reproducibility goes, basically, straight out of the window. If you run cabal2arch to generate a PKGBUILD, and then run the exact same command just a few seconds later, it might generate a different PKGBUILD just because some invisible file has changed on a remote server at the other side of the earth. That property feels like it's bound to create surprises for people. Also, this change would reduce the amount of information expressed by cabal2arch's version number even further into the direction of "zero" than it already is.
I agree with these concerns. The files are essentially configuration files for cabal2arch and they should be included in the cabal2arch package. They should also be available online for direct download.

I think the best way to do this is to host them separately on-line as already suggested, but they should be versioned. They could then be specified in the "source" array of the cabal2arch PKGBUILD. This would ensure reproducibility in any given version of cabal2arch while still making those files available for general use.

The versions of both files could be made to coincide with cabal2arch versions to keep the cabal2arch PKGBUILD simple, e.g.:

    source=(..., "http://example.com/data/ghc-provides.${pkgver}.txt",...)

(I'm assuming that the server would have compression enabled, otherwise I would use an archive.)

Otherwise, if we expect to use those files in other packages then perhaps they should receive their own package (similar to the pacman-mirrorlist package). cabal2arch could still specify a version of that package as a dependency, if needed.

Regards,
Xyne
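
For illustration, here is a minimal sketch of how the versioned "source" entry suggested above would expand inside a PKGBUILD. The pkgver value is a placeholder, example.com is the stand-in host from the message, and the other source entries are left out; none of this is taken from the real cabal2arch PKGBUILD.

```bash
# Hypothetical fragment of a cabal2arch PKGBUILD (PKGBUILDs are bash scripts).
pkgname=cabal2arch
pkgver=0.0.0   # placeholder, not a real cabal2arch release

# The data file URL embeds ${pkgver}, so a given cabal2arch version always
# fetches the same revision of the file. Other source entries are omitted.
source=("http://example.com/data/ghc-provides.${pkgver}.txt")

# Print the expanded URL to show what makepkg would be asked to download.
printf '%s\n' "${source[0]}"
```

Because the URL changes only when pkgver changes, running the same cabal2arch version twice would read the same data file, which is the reproducibility property asked for above. A matching checksum array in the PKGBUILD would additionally pin the exact file contents.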