Package documentation complaints -- and a suggestion

Hi all,

Following a link from the Yesod book, I arrived at [1], curious to find out what groundhog was. Once there, I learned... nothing: "This library provides just the general interface and helper functions. You must use a specific backend in order to make this useful."

[1] http://hackage.haskell.org/package/groundhog

Hoping to find more, I clicked on the top-level module. Once again, the text here was uninformative, though from the example it seems to be something to do with databases (and it is to be applauded for including an example in the docs).

So, package authors: PLEASE, tell me what your package does in the description. Tell me *again* in the top-level haddocks (I may have come directly there by a link). And then, tell me where to look for more information. Instead of "This module defines the functions and datatypes used throughout the framework. Most of them are for internal use", tell me which are *not* for internal use. Write documentation for people who don't know how to use your package.

Some people prefer to have this outside of the haddocks (I don't like this because papers and blogs are less likely to stay up to date than what is embedded in the file, but for tutorials etc. a website might be more appropriate), but if so, link to it from both the Haddocks and the package description.

But I also have a concrete suggestion for Hackage: include the package synopsis on the package's page. The distinction between synopsis and description can be confusing, and sometimes it seems to violate DRY to have the same info in both.

To the author of groundhog: I hope you are not offended by my picking on your package. It's not the only culprit -- just the one that pushed me over the edge. Of course, I would be pleased if this email encouraged you to fix it, but I am addressing this rant to the Haskell community because the problem is a cultural one.

--Max
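As an illustration of the kind of module header asked for above, here is a minimal sketch. The module name Example.Store and every function in it are hypothetical, invented for this example; they are not groundhog's API. The header says what the module does, the export list separates the public interface from the parts meant for backend authors, and the comment says where a reader should start.

```haskell
{- |
Module      : Example.Store
Description : In-memory key/value store (hypothetical example)
Stability   : experimental

This hypothetical module keeps values in an in-memory store indexed by an
integer key.

Start with 'emptyStore', 'insertValue' and 'lookupValue'; those three are the
whole public interface. 'renderKey' is exported only for backend authors.
A longer tutorial, if one existed, would be linked from here and from the
package description.
-}
module Example.Store
  ( -- * Public interface
    Store
  , emptyStore
  , insertValue
  , lookupValue
    -- * For backend authors only
  , renderKey
  ) where

-- | An in-memory store from integer keys to values (constructor not exported).
newtype Store a = Store [(Int, a)]

-- | A store with nothing in it.
emptyStore :: Store a
emptyStore = Store []

-- | Insert a value under a key, replacing any previous value for that key.
insertValue :: Int -> a -> Store a -> Store a
insertValue k v (Store kvs) = Store ((k, v) : filter ((/= k) . fst) kvs)

-- | Look up the value stored under a key, if any.
lookupValue :: Int -> Store a -> Maybe a
lookupValue k (Store kvs) = lookup k kvs

-- | Render a key the way a (hypothetical) backend would name it.
renderKey :: Int -> String
renderKey k = "key:" ++ show k
```

Haddock renders the "-- *" comments in the export list as section headings, so a reader arriving directly at the module page sees which names are meant for them.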

The package summary is "Type-safe ADT-database mapping library.", which gives some idea about what it does.

In my experience, any package that starts its source files with

  {-# LANGUAGE GADTs, TypeFamilies, ExistentialQuantification, StandaloneDeriving, TypeSynonymInstances, MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, FlexibleContexts, OverlappingInstances, ScopedTypeVariables, GeneralizedNewtypeDeriving, UndecidableInstances, EmptyDataDecls #-}

is probably an experiment in what is possible, rather than a production-friendly library.

Many people upload experimental packages to Hackage so that they can be used by other interested people, even though the packages are not ready/intended for mass consumption. A lack of documentation in such cases is understandable.

I wonder if it would be worth giving package uploaders control over whether their packages are shown on the package list? Packages can be manually hidden by emailing an admin, but that's a lot of trouble.

On Mon, Oct 10, 2011 at 03:17, John Millikin wrote:
> The package summary is "Type-safe ADT-database mapping library.", which gives some idea about what it does.

Whence my suggestion to show this on the package's page. Perhaps I shouldn't have hidden that at the bottom -- I meant this as my main point, and I'm afraid I got a little side-tracked.

> In my experience, any package that starts its source files with
>
>   {-# LANGUAGE GADTs, TypeFamilies, ExistentialQuantification, StandaloneDeriving, TypeSynonymInstances, MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, FlexibleContexts, OverlappingInstances, ScopedTypeVariables, GeneralizedNewtypeDeriving, UndecidableInstances, EmptyDataDecls #-}
>
> is probably an experiment in what is possible, rather than a production-friendly library.

An experiment that I was interested in, and hoped to find out more about. But anyway, I see your point.

> Many people upload experimental packages to Hackage so that they can be used by other interested people, even though the packages are not ready/intended for mass consumption. A lack of documentation in such cases is understandable.

Some way of documenting this fact would, however, be helpful.

> I wonder if it would be worth giving package uploaders control over whether their packages are shown on the package list? Packages can be manually hidden by emailing an admin, but that's a lot of trouble.

In this case I followed an external link, so that would not have helped me. There is the "stability" field, which has an "experimental" value, but it's not at all clear what the different values mean other than "stable".

It is fair that some packages on Hackage are not intended for human consumption. Perhaps this is caused in part by having our package installer and humans looking in the same place for information about Haskell libraries. But I think we can do a better job of distinguishing these packages. Perhaps a "visibility" or "release-status" field?

--Max

Max Rabkin writes:
> But I also have a concrete suggestion for Hackage: include the package synopsis on the package's page. The distinction between synopsis and description can be confusing, and sometimes it seems to violate DRY to have the same info in both.

You may have missed the header on the package page (dark line at the top).

The distinction between synopsis and description is borrowed from the Debian package format:

http://www.debian.org/doc/debian-policy/ch-binary.html#s-descriptions

The two fields are aimed at different audiences. A Synopsis trying to do double duty as the beginning of a general package description won't work as well as a stand-alone summary for package lists, etc.

On Mon, Oct 10, 2011 at 10:06, Paterson, Ross wrote:
> Max Rabkin writes:
>> But I also have a concrete suggestion for Hackage: include the package synopsis on the package's page. The distinction between synopsis and description can be confusing, and sometimes it seems to violate DRY to have the same info in both.
>
> You may have missed the header on the package page (dark line at the top).

I did indeed. Perhaps it should be bigger? I've just opened up Synaptic, and it is indeed separate from the description, but there the synopsis is used as a heading for the description, and it's the biggest thing on the screen (whereas Hackage uses package name).

> The distinction between synopsis and description is borrowed from the Debian package format:
>
> http://www.debian.org/doc/debian-policy/ch-binary.html#s-descriptions
>
> The two fields are aimed at different audiences. A Synopsis trying to do double duty as the beginning of a general package description won't work as well as a stand-alone summary for package lists, etc.

Good point. On the other hand, nobody points package authors to the Debian documentation (and Debian also has review for newly uploaded packages, as far as I know).

--Max

> Good point. On the other hand, nobody points package authors to the Debian documentation (and Debian also has review for newly uploaded packages, as far as I know).

Re: review process -- Perhaps there would be a use for a review process somewhere between haskell-platform and the unwashed masses? HP covers a very small percentage of packages, but a larger percentage could probably pass some kind of review akin to the Debian process. And it would be a good forcing function to get people to do the things they don't get around to....

-Ryan

On Mon, Oct 24, 2011 at 12:55 PM, Ryan Newton wrote:
>> Good point. On the other hand, nobody points package authors to the Debian documentation (and Debian also has review for newly uploaded packages, as far as I know).
>
> Re: review process -- Perhaps there would be a use for a review process somewhere between haskell-platform and the unwashed masses? HP covers a very small percentage of packages, but a larger percentage could probably pass some kind of review akin to the debian process. And it would be a good forcing function to get people to do the things they don't get around to....

I'm skeptical. We seem to have trouble getting enough of people's
spare time to tackle interesting engineering work, let alone relatively
thankless administrative/bureaucratic/procedural work. If people are
going to devote time towards solving this particular problem (poorly
documented libraries), an interesting first step would be to try
solving the problem using technological means, i.e.: prohibit or
otherwise discourage uploads to Hackage that fail
automatically-verifiable criteria.

Examples could include: "Your package lacks a description", "more than
X% of your modules lack toplevel module comments", "fewer than Y% of
your toplevel exports have haddock comments", etc... Packages with
stability=experimental would probably be exempt from the requirements.

Duncan could probably comment authoritatively, but I'm guessing
Hackage 2 might provide a better framework for tackling these kinds of
policy issues, because it would probably allow you to e.g. filter the
package list by stability.

G
--
Gregory Collins
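The automatically-verifiable criteria proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the PackageStats and Thresholds types, their field names, and the uploadProblems function are all hypothetical and do not correspond to anything in Hackage or Cabal, and the counts are assumed to have been gathered by some earlier step not shown here.

```haskell
-- Hypothetical summary of what an upload checker might have measured.
data PackageStats = PackageStats
  { hasDescription    :: Bool
  , modulesTotal      :: Int
  , modulesWithHeader :: Int     -- modules that have a toplevel module comment
  , exportsTotal      :: Int
  , exportsDocumented :: Int     -- toplevel exports that have haddock comments
  , stability         :: String  -- value of the .cabal stability field
  }

-- Hypothetical thresholds standing in for the X% and Y% in the proposal.
data Thresholds = Thresholds
  { maxUndocumentedModulesPct :: Int  -- X
  , minDocumentedExportsPct   :: Int  -- Y
  }

-- Integer percentage of part in whole; the first argument is the value
-- to report when whole is zero.
percentOr :: Int -> Int -> Int -> Int
percentOr dflt _ 0        = dflt
percentOr _    part whole = (part * 100) `div` whole

-- Complaints for one upload. Packages marked experimental are exempt,
-- as suggested in the message above.
uploadProblems :: Thresholds -> PackageStats -> [String]
uploadProblems t p
  | stability p == "experimental" = []
  | otherwise = concat
      [ [ "Your package lacks a description" | not (hasDescription p) ]
      , [ "more than " ++ show (maxUndocumentedModulesPct t)
            ++ "% of your modules lack toplevel module comments"
        | undocumentedModulesPct > maxUndocumentedModulesPct t ]
      , [ "fewer than " ++ show (minDocumentedExportsPct t)
            ++ "% of your toplevel exports have haddock comments"
        | documentedExportsPct < minDocumentedExportsPct t ]
      ]
  where
    undocumentedModulesPct =
      percentOr 0 (modulesTotal p - modulesWithHeader p) (modulesTotal p)
    documentedExportsPct =
      percentOr 100 (exportsDocumented p) (exportsTotal p)
```

An empty result would mean the upload passes; whether a non-empty result should reject the upload or only warn is a policy choice the sketch leaves open.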

On 10/24/11 12:34 PM, Gregory Collins wrote:
> Examples could include: "Your package lacks a description", "more than X% of your modules lack toplevel module comments", "fewer than Y% of your toplevel exports have haddock comments", etc... Packages with stability=experimental would probably be exempt from the requirements.

I'm not so sure about that exemption. The "experimental" stability level seems to be the norm on Hackage and often means "I use this for real projects, but because I use it for real projects I'm not quite willing to hammer the API in stone just yet". Surely we should distinguish this level of stability from "no seriously I'm just goofing around with category theory", but unfortunately both classes of project are called "experimental". While the latter may deserve a pass (to encourage goofing around with category theory :), the lack of documentation for the former seems to me like the main motivation for instituting such an automatic system in the first place.

Before dealing with automatic documentation requirements, perhaps it'd be better to develop a standard consensus on the terms used in the stability field and actively advocate for people to adopt it, as was done with the PVP.

--
Live well,
~wren

On 25 October 2011 13:34, wren ng thornton wrote:
> Before dealing with automatic documentation requirements, perhaps it'd be better to develop a standard consensus on the terms used in the stability field and actively advocate for people to adopt it, as was done with the PVP.

+1, not to mention all the top-level fields available in Haddock (portable, etc.) as well.

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Tue, Oct 25, 2011 at 4:34 AM, wren ng thornton wrote:
> I'm not so sure about that exemption. The "experimental" stability level seems to be the norm on Hackage and often means "I use this for real projects, but because I use it for real projects I'm not quite willing to hammer the API in stone just yet".
> ...
> Before dealing with automatic documentation requirements, perhaps it'd be better to develop a standard consensus on the terms used in the stability field and actively advocate for people to adopt it, as was done with the PVP.

I think there's no need to cajole people into it -- if Hackage 2 puts
"stable" packages on a different / better list, there's your social
pressure. Right now the stability flag in the .cabal file, as you
pointed out, is almost completely content-free.

G
--
Gregory Collins

On 25 October 2011 18:54, Gregory Collins wrote:
> On Tue, Oct 25, 2011 at 4:34 AM, wren ng thornton wrote:
>> I'm not so sure about that exemption. The "experimental" stability level seems to be the norm on Hackage and often means "I use this for real projects, but because I use it for real projects I'm not quite willing to hammer the API in stone just yet".
>> ...
>> Before dealing with automatic documentation requirements, perhaps it'd be better to develop a standard consensus on the terms used in the stability field and actively advocate for people to adopt it, as was done with the PVP.
>
> I think there's no need to cajole people into it -- if Hackage 2 puts "stable" packages on a different / better list, there's your social pressure. Right now the stability flag in the .cabal file, as you pointed out, is almost completely content-free.

Right, but first we need to define what all those terms _mean_... and it's no good saying your package is "stable" if you change the API in a large-scale fashion every release.

Also, by promoting packages that are self-picked as stable, this could stop people from picking a better package just because the maintainer is honest enough to state that they're still working on it... I mean, if base and containers keep changing, what can we _really_ say is a stable package?

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

Ivan Lazar Miljenovic writes:
> Right, but first we need to define what all those terms _mean_... and it's no good saying your package is "stable" if you change the API in a large-scale fashion every release.

I think there are better criteria to use, like:

- do exported definitions have Haddock comments?
- does the package have an automated test suite?
- is the package used by other packages?
- ...by different authors?
- has the package been recently updated?

I'm sure there are other things as well that could be added. If this could be automatically checked, and displayed alongside the package name on Hackage (perhaps as adding one star per checklist item), it would encourage authors to actually improve their packages, rather than just label them "stable".

-k
--
If I haven't seen further, it is by standing in the footprints of giants
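The "one star per checklist item" idea could look something like the sketch below. Everything here is hypothetical: the PackageInfo record and its fields are invented for illustration, the inputs are assumed to have been computed elsewhere, and the 365-day cutoff for "recently updated" is an arbitrary assumption, since the message does not say what counts as recent.

```haskell
-- Hypothetical facts about a package, assumed to be gathered elsewhere.
data PackageInfo = PackageInfo
  { exportsHaveHaddocks :: Bool
  , hasTestSuite        :: Bool
  , reverseDepAuthors   :: [String]  -- authors of packages depending on this one
  , author              :: String
  , daysSinceLastUpload :: Int
  }

-- The checklist from the message above, one predicate per item.
checklist :: [(String, PackageInfo -> Bool)]
checklist =
  [ ("exported definitions have Haddock comments", exportsHaveHaddocks)
  , ("has an automated test suite", hasTestSuite)
  , ("used by other packages", not . null . reverseDepAuthors)
  , ("used by packages from different authors",
       \p -> any (/= author p) (reverseDepAuthors p))
    -- 365 days is an assumed cutoff, not something from the discussion.
  , ("recently updated", \p -> daysSinceLastUpload p <= 365)
  ]

-- Labels of the checklist items the package satisfies.
passedItems :: PackageInfo -> [String]
passedItems p = [ label | (label, check) <- checklist, check p ]

-- One star per satisfied item.
stars :: PackageInfo -> Int
stars = length . passedItems
```

For example, a package with documented exports, no test suite, reverse dependencies by "alice" and "bob", author "alice", and an upload 30 days ago would get 4 stars under these assumptions. Keeping the checklist as data makes it easy to show which items passed next to the star count.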

On Tue, Oct 25, 2011 at 11:17, Ketil Malde wrote:
> Ivan Lazar Miljenovic writes:
>> Right, but first we need to define what all those terms _mean_... and it's no good saying your package is "stable" if you change the API in a large-scale fashion every release.
>
> I think there are better criteria to use, like:
>
> - do exported definitions have Haddock comments?
> - does the package have an automated test suite?
> - is the package used by other packages?
> - ...by different authors?
> - has the package been recently updated?

This is useful information, but to call it "stability" is not only misleading, but it also prevents the package from using that field to indicate whether or not it is stable!

--Max

Max Rabkin writes:
> This is useful information, but to call it "stability" is not only misleading, but it also prevents the package from using that field to indicate whether or not it is stable!

Oh, right - I'm not much interested in the stability of a package. What I want to know is which package to choose for some purpose. By highlighting stuff that is correlated with usefulness, I'll be able to make a quicker, more informed decision.

Separating this from stability is a feature, not a bug, since it frees the author to label the package stable or not - instead of encouraging using "stable" to mean "please use". :-)

-k
--
If I haven't seen further, it is by standing in the footprints of giants

On 25 October 2011 20:17, Ketil Malde wrote:
> Ivan Lazar Miljenovic writes:
>> Right, but first we need to define what all those terms _mean_... and it's no good saying your package is "stable" if you change the API in a large-scale fashion every release.
>
> I think there are better criteria to use, like:
>
> - do exported definitions have Haddock comments?
> - does the package have an automated test suite?

What about a test suite that either isn't packaged with the .cabal file or doesn't use Cabal's new test-suite architecture? Does the fact that it _has_ a test suite tell you it's rigorous?

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Tue, Oct 25, 2011 at 2:17 AM, Ketil Malde wrote:
> Ivan Lazar Miljenovic writes:
>> Right, but first we need to define what all those terms _mean_... and it's no good saying your package is "stable" if you change the API in a large-scale fashion every release.
>
> I think there are better criteria to use, like:
>
> - do exported definitions have Haddock comments?
> - does the package have an automated test suite?
> - is the package used by other packages?
> - ...by different authors?

These signals might not apply if the package is primarily a binary.

> - has the package been recently updated?

This one is also tricky. A stable package is a good thing! On the other hand, a package that is broken by a new version of ghc and then takes months to be updated is not so great. What matters is maintainer responsiveness and that's not so easily measurable.

I feel like use-derived signals are safer. E.g. number of downloads, user ratings, user reviews, depending packages. But that stuff obviously goes in a separate section, not a .cabal field. With ratings or reviews it's tricky because you want to make sure they apply to specific versions, so obsolete complaints about a fixed bug don't hang around forever.

On 10/25/11 3:54 AM, Gregory Collins wrote:
> On Tue, Oct 25, 2011 at 4:34 AM, wren ng thornton wrote:
>> I'm not so sure about that exemption. The "experimental" stability level seems to be the norm on Hackage and often means "I use this for real projects, but because I use it for real projects I'm not quite willing to hammer the API in stone just yet".
>> ...
>> Before dealing with automatic documentation requirements, perhaps it'd be better to develop a standard consensus on the terms used in the stability field and actively advocate for people to adopt it, as was done with the PVP.
>
> I think there's no need to cajole people into it -- if Hackage 2 puts "stable" packages on a different / better list, there's your social pressure. Right now the stability flag in the .cabal file, as you pointed out, is almost completely content-free.

The problem isn't social pressure to be stable, it's the ambiguity of what "stable" means. If Hackage 2 institutes a policy whereby things claiming to be stable are treated better, then "stable" is likely to become the new "experimental".

Just because I call something stable doesn't mean that it is. Just because I give something enough documentation to appease the bots so that I'm allowed to call it stable doesn't mean that it is. Frankly, giving a one-line synopsis of what a function does isn't a high enough barrier to entry to keep someone from abusing the system in order to self-select which index page they get put on.

The only way to get a consensus about what "stable", "experimental", etc. mean is ...to get a consensus about what they mean. It's exactly the same thing as the PVP: in order to get people to agree about what version increments mean, we need to get them to agree to mean whatever it is everybody else thinks they mean. Automating the verification of that agreement is a nice tool to have to hand, but it's meaningless without the agreement about what we're all engaged in.

--
Live well,
~wren

> The problem isn't social pressure to be stable, it's the ambiguity of what "stable" means. If Hackage 2 institutes a policy whereby things claiming to be stable are treated better, then "stable" is likely to become the new "experimental".

I'd say, rather than rely on social agreement on what terms mean, let's just collect lots of automated metrics, and present them as extra information on the hackage pages.

At work, we have all modules scored by hlint metrics, and doclint metrics. (Doclint complains about modules without a module header comment, and type signatures without haddock comments.) We count infractions and have a "top ten" hall-of-shame, as well as placing the scores in the module documentation itself. We also have a "fingerprint" for every release (basically the API type signatures), and the size of fingerprint-diffs between releases is a rough measure of API-churn.

Some of these measures are designed to place social pressure on authors to improve their code/documentation, but they have a dual role in allowing users to get a feel for the quality of the code they are using, without imposing any external hierarchy on which metrics are more important in any given situation.

Regards,
Malcolm
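The fingerprint-diff measure described above could be approximated as in the sketch below. The message does not say how the in-house tool represents a fingerprint or counts differences, so this is an assumption throughout: a fingerprint is taken to be a list of (name, type signature) pairs without duplicates, and churn is the number of entries present in one release but not the other. The names Fingerprint and apiChurn are hypothetical.

```haskell
import Data.List ((\\))

-- Assumed representation: one (exported name, type signature) pair per
-- export, with no duplicate entries.
type Fingerprint = [(String, String)]

-- Rough API churn between two releases: entries removed plus entries added.
-- An export whose signature changed counts twice, once as a removal of the
-- old entry and once as an addition of the new one.
apiChurn :: Fingerprint -> Fingerprint -> Int
apiChurn old new = length (old \\ new) + length (new \\ old)

-- Two made-up releases of a made-up API, for illustration only.
releaseA, releaseB :: Fingerprint
releaseA =
  [ ("insert", "Int -> a -> Store a -> Store a")
  , ("lookup", "Int -> Store a -> Maybe a")
  ]
releaseB =
  [ ("insert", "Key -> a -> Store a -> Store a")  -- signature changed
  , ("lookup", "Int -> Store a -> Maybe a")       -- unchanged
  , ("delete", "Key -> Store a -> Store a")       -- newly added
  ]
```

Here apiChurn releaseA releaseB is 3: the changed insert contributes 2 and the new delete contributes 1. Comparing signatures as plain strings is crude, since renaming a type variable would register as churn, which fits the description of the figure as only a rough measure.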

On Mon, Oct 10, 2011 at 09:06:01AM +0100, Paterson, Ross wrote:
> The distinction between synopsis and description is borrowed from the Debian package format:
>
> http://www.debian.org/doc/debian-policy/ch-binary.html#s-descriptions
>
> The two fields are aimed at different audiences.

Not in Debian. The synopsis and description are a bit like the title and the abstract of a scholarly paper: you might see a title without the abstract (and it must work alone), but both are aimed at the same audience - people who are unsure whether they should read the paper (install the package) and look for information sufficient to decide that it's not what they need (or that it probably is).

--
Antti-Juhani Kaijanaho, Jyväskylä, Finland
http://antti-juhani.kaijanaho.fi/newblog/
http://www.flickr.com/photos/antti-juhani/

participants (11): Antti-Juhani Kaijanaho, Evan Laforge, Gregory Collins, Ivan Lazar Miljenovic, John Millikin, Ketil Malde, Malcolm Wallace, Max Rabkin, Paterson, Ross, Ryan Newton, wren ng thornton