hackage, package list, trac, some suggestions/questions

hope this is the right list for hackage issues?-)

1. hackage trac does not seem to have a "register" option (compare hackage trac with ghc trac). individual logins are nice because:
   - gives individual rather than anonymous reporters
   - one place less to support spam email harvesters (account name instead of full email address)
   - "register" is always there, hence easy to find, unlike the hint hidden on the trac home page that hackage is still abusing the haskell' trac.. (i had started this email before i found that hint;-)

2. the hackage package list ought to have a link to an alphabetical index (i often know the package name, but not the likely categories, so i tend to 'search' on that page..).

3. the hackage package list ought to list successful and failed builds (simply a list of compiler versions, each one green or red, depending on success, with direct link to build/failure log; this would fit into the one-line-per-package format) for each entry, and report them to package authors.

   build failures currently seem to be created automatically where package authors may not notice them? at least, that would explain things like (just examples of things i happened to have looked at, no offence intended;-)
   - hint: advertised to work with ghc 6.8.x on the same package page that lists a build failure for ghc-6.8
   - haskell-src-exts: lists a build failure for ghc-6.8

   the first is probably a too optimistic cabal version spec, the second is a haddock issue. but that makes two out of two for packages i looked up recently..

4. is there a cross-package index of builds/failures, so that one might see trends (cabal issues, base package issues, bytestring issues, next big thing issues,..) and statistics (how many hackage packages fail/build, which compiler/cabal/haddock versions are successful for the largest number of packages, etc.)?

5. generally, i've always thought that hackage works the wrong way round; instead of yet another push-based place for people to put and forget copies of things, it should work as a pull-based cache:
   - author registers package, with author/maintainer email and package url
   - hackage retrieves package, tries to build, and either accepts or not, reporting all to author
   - occasionally, hackage scans for new versions to be retrieved, again notifying authors of uploads and problem reports

- the maintainer email on each hackage package page ought to be
  (a) protected, not plain text
  (b) annotated with last successful contact date

if i understand #243, authors are not even notified when someone uploads their packages?

claus

On Fri, May 30, 2008 at 03:23:06PM +0100, Claus Reinke wrote:
1. hackage trac does not seem to have a "register" option (compare hackage trac with ghc trac). individual logins are nice because:
   - gives individual rather than anonymous reporters
   - one place less to support spam email harvesters (account name instead of full email address)
   - "register" is always there, hence easy to find, unlike the hint hidden on the trac home page that hackage is still abusing the haskell' trac.. (i had started this email before i found that hint;-)
Yes. In the meantime, feel free to file feature requests as guest.
4. is there a cross-package index of builds/failures, so that one might see trends (cabal issues, base package issues, bytestring issues, next big thing issues,..) and statistics (how many hackage packages fail/build, which compiler/cabal/haddock versions are successful for the largest number of packages, etc.)?
No, but here are some rough figures on failures (for the latest version of each package):

   7  configure: prerequisite packages missing or not built
  22  configure: other (usually a custom Setup.hs)
   6  build: header file for some C library not found on the build system
  46  build: a package dependency was not listed (often a base split issue)
  14  build: a module was omitted from the package
  38  build: other
  22  haddock failure
   4  install failure

It is surprising how many would surely have failed on the maintainer's machine if they had been tested.
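As a rough illustration only, the figures above can be totalled per build phase. The counts and labels are copied from the message; the grouping rule (text before the colon, otherwise the first word) and the names `FAILURES`, `phase_of`, and `totals_by_phase` are assumptions made for this sketch, not anything hackage itself provides.

```python
# Counts and labels are copied from the figures quoted above.
FAILURES = [
    (7, "configure: prerequisite packages missing or not built"),
    (22, "configure: other (usually a custom Setup.hs)"),
    (6, "build: header file for some C library not found on the build system"),
    (46, "build: a package dependency was not listed (often a base split issue)"),
    (14, "build: a module was omitted from the package"),
    (38, "build: other"),
    (22, "haddock failure"),
    (4, "install failure"),
]


def phase_of(label):
    """Return the phase: the text before ':' if present, else the first word."""
    if ":" in label:
        return label.split(":", 1)[0]
    return label.split()[0]


def totals_by_phase(rows):
    """Sum failure counts per phase, keeping first-seen order."""
    totals = {}
    for count, label in rows:
        phase = phase_of(label)
        totals[phase] = totals.get(phase, 0) + count
    return totals


if __name__ == "__main__":
    grand_total = sum(count for count, _ in FAILURES)
    for phase, count in totals_by_phase(FAILURES).items():
        print(f"{phase:<10} {count:>4}  {100 * count / grand_total:5.1f}%")
    print(f"{'total':<10} {grand_total:>4}")
```

Grouped this way, the build phase accounts for 104 of the 159 listed failures, with unlisted dependencies the largest single cause.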
5. generally, i've always thought that hackage works the wrong way round; instead of yet another push-based place for people to put and forget copies of things, it should work as a pull-based cache:
It's a place for maintainers to put their releases (there may not even be another place to pull from). It seems to me entirely appropriate that maintainers should assume responsibility for that.
- the maintainer email on each hackage package page ought to be (a) protected, not plain text (b) annotated with last successful contact date
The email address on the package is merely copied from the .cabal file, which is also publicly available. If a maintainer wants to hide this, they need to obfuscate it in the .cabal file.

Yes. In the meantime, feel free to file feature requests as guest.
done. though i notice that uninstall (#106, now #234), which i considered essential, is still lingering there a year later..
No, but here are some rough figures on failures (for the latest version of each package):
thanks, that was interesting. btw, i fully expect such statistics to reflect infrastructure issues at least as much as package author issues, including overlap between the two where authors try to cope with infrastructure hiccups. so, having such statistics generated in a prominent place would both
- alert package authors and users
- alert cabal/hackage/haddock/ghc/.. maintainers
5. generally, i've always thought that hackage works the wrong way round; instead of yet another push-based place for people to put and forget copies of things, it should work as a pull-based cache:
It's a place for maintainers to put their releases (there may not even be another place to pull from). It seems to me entirely appropriate that maintainers should assume responsibility for that.
i had forgotten that hackage was the first to provide a space for authors without webspace access. but these days, there is code.haskell.org, and that seems to be a much better home for any package than a tarball described by a .cabal file. why should all releases be centralised? why should package authors have to remember to think of copies on hackage? and why should hackage tarballs be the only release venue?? authors/maintainers should assume responsibility for coding, registering packages with hackage, and indicating when a package is in a release state, so that hackage and alternative clients can fetch updated packages and do their thing. but if no authors chime in, there's probably no case yet against this "hackage owns everything" approach..
- the maintainer email on each hackage package page ought to be (a) protected, not plain text (b) annotated with last successful contact date
The email address on the package is merely copied from the .cabal file, which is also publicly available. If a maintainer wants to hide this, they need to obfuscate it in the .cabal file.
good point. as .cabal files are served as text, they might be scanned as well. still, i'd think that hackage should do as good mailing list archives do: obfuscate email addresses to make harvesting more difficult, without users having to obfuscate things by hand. if someone figures out how to install the package descriptions on their own machine for harvesting, there isn't much to be done about that, just as with harvesters registering on mailing lists. but one shouldn't make it too easy.. claus

On Sat, 2008-05-31 at 00:21 +0100, Claus Reinke wrote:
Yes. In the meantime, feel free to file feature requests as guest.
done.
Great, thanks.
though i notice that uninstall (#106, now #234), which i considered essential, is still lingering there a year later..
Make sure you note that it is essential for you on that ticket. We do need help from our user base to help us rank the development priorities. We only have limited developer time.
No, but here are some rough figures on failures (for the latest version of each package):
thanks, that was interesting. btw, i fully expect such statistics to reflect infrastructure issues at least as much as package author issues, including overlap between the two where authors try to cope with infrastructure hiccups.
so, having such statistics generated in a prominent place would both
- alert package authors and users
- alert cabal/hackage/haddock/ghc/.. maintainers
I agree such information should be given to package authors/maintainers but I'd be a bit reluctant to see that done with the current information. Package authors would probably not appreciate being told their package does not work for essentially incorrect reasons and we'd have to spend a lot of time fixing those things (like missing libs, incorrect build orders). I am working on fixing some of these issues, but by replacing the current build reporting system.
5. generally, i've always thought that hackage works the wrong way round; instead of yet another push-based place for people to put and forget copies of things, it should work as a pull-based cache:
It's a place for maintainers to put their releases (there may not even be another place to pull from). It seems to me entirely appropriate that maintainers should assume responsibility for that.
i had forgotten that hackage was the first to provide a space for authors without webspace access. but these days, there is code.haskell.org, and that seems to be a much better home for any package than a tarball described by a .cabal file.
Hackage has always been a place to release packages, not somewhere to host development repositories. It's based on things like CPAN, linux distro package archives etc. It is not like sourceforge.
why should all releases be centralised?
It provides many advantages. Simply having a central list is very useful. Having a central list along with a tarball and meta-data allows clients like cabal-install. It also gives us the opportunity to automate things like gathering of build results which would be much harder when packages are scattered all across the internet.
why should package authors have to remember to think of copies on hackage? and why should hackage tarballs be the only release venue??
It is not the only release venue. Many packages are released in multiple locations. If package authors do not want to take advantage of releasing via hackage they do not need to do so. Also, anyone can set up a hackage server. There are in-house installations. The benefits of a public hackage server are maximised by having one central one.
authors/maintainers should assume responsibility for coding, registering packages with hackage, and indicating when a package is in a release state, so that hackage and alternative clients can fetch updated packages and do their thing.
Indicating when a package is in a release state is pretty easy:

    cabal sdist
    cabal upload dist/foo-1.0.tar.gz
but if no authors chime in, there's probably no case yet against this "hackage owns everything" approach..
I don't see it as owning anything. In particular because it is only a release venue and not a general hosting venue like sourceforge.
The email address on the package is merely copied from the .cabal file, which is also publicly available. If a maintainer wants to hide this, they need to obfuscate it in the .cabal file.
good point. as .cabal files are served as text, they might be scanned as well. still, i'd think that hackage should do as good mailing list archives do: obfuscate email addresses to make harvesting more difficult, without users having to obfuscate things by hand.
if someone figures out how to install the package descriptions on their own machine for harvesting, there isn't much to be done about that, just as with harvesters registering on mailing lists. but one shouldn't make it too easy..
File a ticket, suggest an obfuscation method. Duncan
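For the sake of discussion, here is a minimal sketch of two common obfuscation methods of the kind mailing list archives use: spelling out the punctuation, and emitting HTML numeric character references. The function names `spell_out` and `html_entities` are hypothetical, and nothing here reflects how hackage actually renders package pages. Either method would apply only to the rendered page, not to the .cabal file itself.

```python
def spell_out(address):
    """Rewrite 'user@host.tld' as 'user at host dot tld'."""
    local, at, domain = address.partition("@")
    if not at:
        # No '@' found: not an email address, so leave it untouched.
        return address
    return f"{local} at {domain.replace('.', ' dot ')}"


def html_entities(address):
    """Encode every character as an HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    # Placeholder address, used for illustration only.
    print(spell_out("maintainer@example.org"))
    print(html_entities("maintainer@example.org"))
```

Both methods only raise the cost of harvesting slightly; a harvester that decodes entities or recognises "at"/"dot" patterns will still recover the address.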

On Sat, May 31, 2008 at 12:21:21AM +0100, Claus Reinke wrote:
good point. as .cabal files are served as text, they might be scanned as well. still, i'd think that hackage should do as good mailing list archives do: obfuscate email addresses to make harvesting more difficult, without users having to obfuscate things by hand.
if someone figures out how to install the package descriptions on their own machine for harvesting, there isn't much to be done about that, just as with harvesters registering on mailing lists. but one shouldn't make it too easy..
The .cabal file is still inside the tarball, and unpacking archives is a standard thing for web traversers to do -- no installation necessary. The only thing that would work is for the upload script to modify the cabal file inside the package, which is something I'm reluctant to do, because it intrudes on the domain of the maintainer.

On Sat, 2008-05-31 at 11:40 +0100, Ross Paterson wrote:
On Sat, May 31, 2008 at 12:21:21AM +0100, Claus Reinke wrote:
good point. as .cabal files are served as text, they might be scanned as well. still, i'd think that hackage should do as good mailing list archives do: obfuscate email addresses to make harvesting more difficult, without users having to obfuscate things by hand.
if someone figures out how to install the package descriptions on their own machine for harvesting, there isn't much to be done about that, just as with harvesters registering on mailing lists. but one shouldn't make it too easy..
The .cabal file is still inside the tarball, and unpacking archives is a standard thing for web traversers to do -- no installation necessary. The only thing that would work is for the upload script to modify the cabal file inside the package, which is something I'm reluctant to do, because it intrudes on the domain of the maintainer.
And would mess up md5sums etc. Duncan
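The checksum point can be seen in a small sketch: any change to the bytes of a file, such as rewriting the maintainer field, yields a different MD5 digest, so a checksum published by the author would no longer match the copy on the server. The sample byte strings and the helper name `md5_hex` are made up for illustration.

```python
import hashlib


def md5_hex(data):
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical contents of a maintainer field before and after a
# server-side rewrite; real .cabal files contain far more than this.
original = b"maintainer: someone@example.org\n"
rewritten = b"maintainer: someone at example dot org\n"

if __name__ == "__main__":
    print(md5_hex(original))
    print(md5_hex(rewritten))
```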

On Fri, 2008-05-30 at 15:23 +0100, Claus Reinke wrote:
hope this is the right list for hackage issues?-)
1. hackage trac does not seem to have a "register" option (compare hackage trac with ghc trac). individual logins are nice because:
   - gives individual rather than anonymous reporters
   - one place less to support spam email harvesters (account name instead of full email address)
   - "register" is always there, hence easy to find, unlike the hint hidden on the trac home page that hackage is still abusing the haskell' trac.. (i had started this email before i found that hint;-)
We used to have that and got horribly spammed. Apparently for the ghc trac they've worked out how to do that without getting spammed so presumably we could do the same.
2. the hackage package list ought to have a link to an alphabetical index (i often know the package name, but not the likely categories, so i tend to 'search' on that page..).
Yeah, I always use my browser's in-page "just start typing" search. The longer term plan is to use hoogle as the primary interface so it can search on package name, package meta-data and of course the content, api and docs.
3. the hackage package list ought to list successful and failed builds (simply a list of compiler versions, each one green or red, depending on success, with direct link to build/failure log; this would fit into the one-line-per-package format) for each entry, and report them to package authors.
It's harder than it looks. The build results from the server builds themselves are not very accurate reflections of the real status. That's partly why we do not yet attempt to email maintainers about the results. For example the fact that the build server does not have most C libs installed means that lots of FFI binding packages and all their dependents fail. It also suffers from the diamond dep problem. Also it only reflects one particular operating system and configuration of each package. The plan is to get build results from users. Though then we have to do some statistical analysis to discover if a package works with various versions of compilers and on different OSs etc.
http://hackage.haskell.org/trac/hackage/ticket/184
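To make the idea of analysing user-submitted results concrete, here is a minimal sketch that tallies reports per package, compiler, and operating system. The report shape, the sample data, and the names `summarise` and `success_rate` are all assumptions for illustration; the ticket above does not specify a format, and a real analysis would need to account for much more (package configuration, dependency versions, unreliable reporters).

```python
from collections import defaultdict

# Hypothetical report shape: (package, compiler, os_name, succeeded).
SAMPLE_REPORTS = [
    ("foo-1.0", "ghc-6.8.2", "linux", True),
    ("foo-1.0", "ghc-6.8.2", "linux", True),
    ("foo-1.0", "ghc-6.8.2", "windows", False),
    ("foo-1.0", "ghc-6.6.1", "linux", False),
    ("foo-1.0", "ghc-6.6.1", "linux", True),
]


def summarise(reports):
    """Map (package, compiler, os_name) to (successes, total reports)."""
    counts = defaultdict(lambda: [0, 0])
    for package, compiler, os_name, succeeded in reports:
        tally = counts[(package, compiler, os_name)]
        tally[1] += 1
        if succeeded:
            tally[0] += 1
    return {key: (ok, total) for key, (ok, total) in counts.items()}


def success_rate(summary, key):
    """Fraction of successful builds reported for one configuration."""
    ok, total = summary[key]
    return ok / total


if __name__ == "__main__":
    summary = summarise(SAMPLE_REPORTS)
    for key in sorted(summary):
        ok, total = summary[key]
        print(key, f"{ok}/{total}")
```

A configuration with mixed results, like the ghc-6.6.1 entry in the sample data, is exactly the case where a single build server's verdict would be misleading.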
build failures currently seem to be created automatically where package authors may not notice them? at least, that would explain things like (just examples of things i happened to have looked at, no offence intended;-)
- hint: advertised to work with ghc 6.8.x on the same package page that lists a build failure for ghc-6.8
- haskell-src-exts: lists a build failure for ghc-6.8
the first is probably a too optimistic cabal version spec, the second is a haddock issue. but that makes two out of two for packages i looked up recently..
4. is there a cross-package index of builds/failures, so that one might see trends (cabal issues, base package issues, bytestring issues, next big thing issues,..) and statistics (how many hackage packages fail/build, which compiler/cabal/haddock versions are successful for the largest number of packages, etc.)?
That's the kind of information we should be able to gather once we have clients report build results to the server.
5. generally, i've always thought that hackage works the wrong way round; instead of yet another push-based place for people to put and forget copies of things, it should work as a pull-based cache:
- author registers package, with author/maintainer email and package url
- hackage retrieves package, tries to build, and either accepts or not, reporting all to author
- occasionally, hackage scans for new versions to be retrieved, again notifying authors of uploads and problem reports
I'd tend to disagree. I think it's better for authors/maintainers to make releases on hackage when they think it's right.
- the maintainer email on each hackage package page ought to be (a) protected, not plain text
Would you like to file a ticket about this?
(b) annotated with last successful contact date
Hmm, how do you think that'd work? Sounds tricky to automate. Anyway, we rather hope that it is the author themselves or their delegated release manager who are making the releases so there's no need for a system to automatically contact authors.
if i understand #243, authors are not even notified when someone uploads their packages?
We generally hope that it is the author that is uploading their package. Duncan

| > 3. the hackage package list ought to list successful and failed
| > builds (simply a list of compiler versions, each one green
| > or red, depending on success, with direct link to build/
| > failure log; this would fit into the one-line-per-package
| > format) for each entry, and report them to package authors.

Personally I think that the key innovation on Hackage would be a user feedback system, like the user reviews on Amazon. Users can:

- give a mark out of 10 for a package
- write a review (perhaps short): eg "a real struggle to build on Solaris" or "worked like a charm"

The marks are averaged and displayed, so other users can pick ones with good reviews. Well-engineered packages would stand out.

Simon
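A minimal sketch of the averaging step described above, assuming marks are numbers from 0 to 10 and that a package with no marks shows no average. The function name `average_mark` and the validation rule are illustrative choices, not part of any hackage design.

```python
def average_mark(marks):
    """Return the mean of marks out of 10, or None if there are no marks."""
    if not marks:
        return None
    for mark in marks:
        if not 0 <= mark <= 10:
            raise ValueError(f"mark out of range: {mark}")
    return sum(marks) / len(marks)


if __name__ == "__main__":
    # Hypothetical marks for a single package.
    print(average_mark([8, 9, 10]))
```

Showing the number of marks alongside the average would help readers judge how much weight a score deserves.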

On Tue, Jun 3, 2008 at 4:24 PM, Simon Peyton-Jones wrote:
Personally I think that the key innovation on Hackage would be a user feedback system, like the user reviews on Amazon. Users can:
- give a mark out of 10 for a package
- write a review (perhaps short): eg "a real struggle to build on Solaris" or "worked like a charm"
The marks are averaged and displayed, so other users can pick ones with good reviews.
We could also build a recommender system with a little machine learning. "If you liked 'bytestring' you might also like...". ;) -- Johan

On Tue, 3 Jun 2008, Johan Tibell wrote:
On Tue, Jun 3, 2008 at 4:24 PM, Simon Peyton-Jones wrote:
Personally I think that the key innovation on Hackage would be a user feedback system, like the user reviews on Amazon. Users can:
- give a mark out of 10 for a package
- write a review (perhaps short): eg "a real struggle to build on Solaris" or "worked like a charm"
The marks are averaged and displayed, so other users can pick ones with good reviews.
We could also build a recommender system with a little machine learning. "If you liked 'bytestring' you might also like...".
... to download dependent packages. :-]

On Fri, 2008-05-30 at 21:52 +0100, Duncan Coutts wrote:
On Fri, 2008-05-30 at 15:23 +0100, Claus Reinke wrote:
hope this is the right list for hackage issues?-)
1. hackage trac does not seem to have a "register" option (compare hackage trac with ghc trac). individual logins are nice because:
   - gives individual rather than anonymous reporters
   - one place less to support spam email harvesters (account name instead of full email address)
   - "register" is always there, hence easy to find, unlike the hint hidden on the trac home page that hackage is still abusing the haskell' trac.. (i had started this email before i found that hint;-)
We used to have that and got horribly spammed. Apparently for the ghc trac they've worked out how to do that without getting spammed so presumably we could do the same.
Now done. Duncan
participants (6)
- Claus Reinke
- Duncan Coutts
- Henning Thielemann
- Johan Tibell
- Ross Paterson
- Simon Peyton-Jones