RE: A Haskell Documentation Standard

Jan Skibinski writes:
Am I missing something here? I thought that the Haskell Report does not define the semantics of comments. They are free-floating entities and can be put anywhere a module developer pleases. And we can have many possible types of comments describing a module, datatype, class, function, etc. Add to that some important decorations delimiting categories of functions, and we are in a complete mess.
Given the variety of styles used, how on earth can even the cleverest parser figure it out? Typical parsers (I am not talking here about HDoc) do not care because they skip the comments anyway, but that is not the case for documentation extractors.
Sorry for not being clear about this. You're right in that trying to understand arbitrary comments in Haskell source isn't workable. The documentation annotations must be in a special format that the documentation tool can understand. eg. HDoc's {--- .. -} style comments. I had in mind using Haskell's pragma convention, like this:

    {-# DOC f
      <desc><id>f</id> turns people into frogs</desc>
      <arg><id>x</id> A <type>Person</type></arg>
      <ret>A <type>Frog</type></ret>
    #-}
    f :: Person -> Frog
    f x = ...

with similar annotations for classes, instances, datatypes, newtypes etc. There's no requirement that the documentation appears directly before the source code for the function; since it contains the identifier of the entity being documented, it can be placed anywhere (even in a different file).

The markup format for the documentation is of course up for discussion. XML seems plausible but verbose.

If I understand correctly, I think you were proposing a two stage process to get the documentation (similar to the Eiffel approach?):

    Haskell source --> interface ---> on-line documentation
                                 `--> printed documentation
                                 .....

Why not do it in one?

    Haskell source ---> on-line documentation
                   `--> printed documentation
                   ....

Cheers,
Simon
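
A minimal sketch of how a tool might pick such pragmas out of a source file, since the pragma carries its own identifier and needs no positional information. The names `DocPragma`, `extractDocs`, `parsePragma` and `breakOn` are hypothetical, and the sketch assumes that pragmas are not nested and that the markup never contains the closing `#-}` sequence.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf)

-- | A documentation pragma: the documented identifier and the raw markup.
-- (Hypothetical type, introduced only for this sketch.)
data DocPragma = DocPragma
  { docTarget :: String
  , docBody   :: String
  } deriving (Eq, Show)

-- | Split at the first occurrence of a delimiter, dropping the delimiter.
-- If the delimiter never occurs, the second component is empty.
breakOn :: String -> String -> (String, String)
breakOn _ [] = ([], [])
breakOn delim s@(c : cs)
  | delim `isPrefixOf` s = ([], drop (length delim) s)
  | otherwise            = let (a, b) = breakOn delim cs in (c : a, b)

-- | Interpret the text between "{-#" and "#-}" as "DOC name body".
-- Any other pragma (INLINE, LANGUAGE, ...) yields Nothing.
parsePragma :: String -> Maybe DocPragma
parsePragma inner =
  case break isSpace (dropWhile isSpace inner) of
    ("DOC", r1) ->
      case break isSpace (dropWhile isSpace r1) of
        ("", _)    -> Nothing
        (name, r2) -> Just (DocPragma name (trim r2))
    _ -> Nothing
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Collect every DOC pragma in a source text, wherever it appears.
extractDocs :: String -> [DocPragma]
extractDocs [] = []
extractDocs src@(_ : rest)
  | "{-#" `isPrefixOf` src =
      let (inner, after) = breakOn "#-}" (drop 3 src)
      in maybe id (:) (parsePragma inner) (extractDocs after)
  | otherwise = extractDocs rest
```

Because the scan is purely textual, the same function would work on a separate documentation file as well as on the module itself.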

On Wed, 31 Jan 2001, Simon Marlow wrote:
Sorry for not being clear about this. You're right in that trying to understand arbitrary comments in Haskell source isn't workable. The documentation annotations must be in a special format that the documentation tool can understand.
eg. HDoc's {--- .. -} style comments. I had in mind using Haskell's pragma convention, like this:
    {-# DOC f
      <desc><id>f</id> turns people into frogs</desc>
      <arg><id>x</id> A <type>Person</type></arg>
      <ret>A <type>Frog</type></ret>
    #-}
    f :: Person -> Frog
    f x = ...
Machine-wise, this is an excellent format, because it not only signifies that this comment is important (as HDoc does by using triple dashes) but also associates the comment with the specific entity (as I was discussing in the previous post). Easy to parse, no ambiguities.

Human-wise, this is a terrible thing. I would never like to read sources written this way, nor to produce them by hand this way.
The markup format for the documentation is of course up for discussion. XML seems plausible but verbose.
Now take a look at this disciplined ascii version:

    f :: Person -> Frog
    f x
        -- A frog made from a person 'x'
        = .....

(Note that I always specify the result by placing it up front in the sentence. No need for <ret>, no need for types to be told twice.)

Or look at a more complex example that makes an even stronger point:

    f :: Person -> Frog -> Bool
    f person y
        -- True if a 'person' can be turned
        -- to a frog
        -- where
        --     y is a frog
        --
        = ....

I am sure there is nothing in your XML version that I did not explain in the plain English of my versions, unless you want to cross-reference all the frogs, persons and booleans (which would not make any sense to me). But the major difference between the two is that I can still read the source files with ease.

This is similar to the Eiffel style of documentation: you extract the signature, the left hand side of the function definition and the comments. You place them in your interface (which can be pretty printed in any format) in exactly this order, from the most general info to the most detailed explanation. Comments can refer to arguments by their names for clarity. And a function comment clearly becomes a part of a function definition.

This positional method has a few drawbacks, however. First, this specific order cannot be applied to functions with multiple equations (the order "signature, comment, equations" looks better in such cases), although it is still fine with guards. Secondly, this could open a can of protests about the order requirements. Thirdly, it requires a lot of self-discipline. However, the final result definitely exceeds other styles -- readability-wise.

But I am not trying to promote this style; I am just pointing out some more readable alternatives to your original version. For example, this would do equally well:

    f :: Person -> Frog
    f x
        {-# Doc f -- Frog from person 'x' #-}
        = .....

and could be pretty printed, with pragmas removed.
But -- as a developer -- I would still hate reading and writing those pragma tokens, especially the complex thingies, as pointed out by Armin in another post .... unless I never saw them at any time during the development cycle. In theory it is possible to have both worlds with the help of tools: you write a single function in plain English, annotate it (or not, depending on the tool), and it comes back again, but prettified. So if we want both worlds then we had better provide good tools and then convince the community that this is the way to work with Haskell.
If I understand correctly, I think you were proposing a two stage process to get the documentation (similar to the Eiffel approach?):
No, the numbered list in my previous post did not represent any order of steps, but some options I was musing about.

Jan
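
The positional, Eiffel-like extraction described in this message (signature first, then the left-hand side, then the comments) could be sketched roughly as follows. This is only an illustration under strong assumptions: the definition has already been isolated as a list of lines, it has a single equation, and the comment lines sit directly between the left-hand side and the `=`. The names `Entry` and `interfaceEntry` are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf, isPrefixOf)

-- | One interface entry, in the order "most general to most detailed".
-- (Hypothetical type, introduced only for this sketch.)
data Entry = Entry
  { entrySig :: String    -- the type signature line
  , entryLhs :: String    -- the left-hand side of the definition
  , entryDoc :: [String]  -- comment lines with the "--" marker stripped
  } deriving (Eq, Show)

-- | Build an entry from the lines of one definition laid out as
-- "signature, left-hand side, comment lines, = body".
-- Returns Nothing when the first line is not a type signature.
interfaceEntry :: [String] -> Maybe Entry
interfaceEntry (sig : lhs : rest)
  | " :: " `isInfixOf` sig = Just (Entry sig lhs (map stripMarker comments))
  where
    -- Only the comment block immediately after the left-hand side counts.
    comments    = takeWhile isComment (map (dropWhile isSpace) rest)
    isComment   = ("--" `isPrefixOf`)
    stripMarker = dropWhile isSpace . drop 2
interfaceEntry _ = Nothing
```

Functions with several equations would need the alternative order "signature, comment, equations" mentioned above, which this sketch does not handle.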

On Wed, 31 Jan 2001, Simon Marlow wrote:
eg. HDoc's {--- .. -} style comments. I had in mind using Haskell's pragma convention, like this:
    {-# DOC f
      <desc><id>f</id> turns people into frogs</desc>
      <arg><id>x</id> A <type>Person</type></arg>
      <ret>A <type>Frog</type></ret>
    #-}
    f :: Person -> Frog
    f x = ...
with similar annotations for classes, instances, datatypes, newtypes etc. There's no requirement that the documentation appears directly before the source code for the function; since it contains the identifier of the entity being documented, it can be placed anywhere (even in a different file).
That means that classes and instances (and their functions) have to be distinguished by giving the complete type of the class / instance declaration, right? E.g.

    class X a b where
      f :: a -> b

    instance X Person Frog where
      f x = ...

How do we avoid that the tool confuses the two versions of "f"? An obvious way would be

    {-# DOC f INSTANCE X Person Frog ... #-}
    {-# DOC f CLASS X ... #-}

(HDoc required similar "help" in early versions, but I changed that in favour of considering positional information to reduce the redundancy required in the annotations.)

I guess additional annotations are an unavoidable drawback when not relying on positional information. On the other hand, being able to put the documentation in different files may be a big advantage.

So, should we allow both variants? I.e. use positional information when the pragma happens to be next to a class/instance declaration (or a function therein) and rely on extra information (like "CLASS X") in the other case?
The markup format for the documentation is of course up for discussion. XML seems plausible but verbose.
As flexibility should be a priority here (we want to produce many different output formats, right?), I think XML is verbose, but not too verbose. I don't see an alternative format which is significantly less verbose. And, there's HaXml which could do a good job at processing the documentation (I haven't had a very close look at HaXml yet).

Regards,
Armin
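
The "allow both variants" idea from this message might look roughly like the sketch below: an explicit `CLASS` / `INSTANCE` qualifier wins when present, and a bare identifier falls back to the position at which the pragma was found. Everything here (`Target`, `Context`, `resolve`, `parseExplicit`) is hypothetical, and the sketch assumes the pragma header has already been split into words and separated from the markup body.

```haskell
-- | What a documentation pragma is attached to.
data Target
  = TopLevel String                        -- plain function or value
  | ClassMethod String String              -- method name, class name
  | InstanceMethod String String [String]  -- method, class, instance types
  deriving (Eq, Show)

-- | Where in the source the tool was when it met the pragma.
data Context
  = AtTopLevel
  | InClass String
  | InInstance String [String]
  deriving (Eq, Show)

-- | Header with an explicit qualifier, e.g. ["f","CLASS","X"] or
-- ["f","INSTANCE","X","Person","Frog"].
parseExplicit :: [String] -> Maybe Target
parseExplicit [name, "CLASS", cls]            = Just (ClassMethod name cls)
parseExplicit (name : "INSTANCE" : cls : tys) = Just (InstanceMethod name cls tys)
parseExplicit _                               = Nothing

-- | A bare identifier is resolved positionally; anything longer must
-- carry an explicit qualifier, which overrides the position.
resolve :: Context -> [String] -> Maybe Target
resolve ctx [name] = Just $ case ctx of
  AtTopLevel        -> TopLevel name
  InClass cls       -> ClassMethod name cls
  InInstance cls ts -> InstanceMethod name cls ts
resolve _ header = parseExplicit header
```

With this split, documentation kept in a separate file would always use the explicit form, while pragmas written next to a declaration could stay short.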

Hi all,

It seems the discussion has got going and there are plenty of ideas around. Good! I'll get back with something more substantial in a bit. Trying to structure what's been discussed so far, so as to get a clearer picture of the options, is probably a good thing, and I also think we should focus a bit more on the goal and design principles before getting too involved in the details of different formats and such.

Anyway, just a remark on what Simon Marlow wrote:
If I understand correctly, I think you were proposing a two stage process to get the documentation (similar to the Eiffel approach?):
    Haskell source --> interface ---> on-line documentation
                                 `--> printed documentation
                                 .....
Why not do it in one?
    Haskell source ---> on-line documentation
                   `--> printed documentation
I don't know if "you" above referred to me, but that was indeed part of what I suggested at the HI workshop. I do not think that the staged approach is absolutely essential: if we could just come up with a good standard for how to write embedded documentation in Haskell, that would be extremely valuable in its own right. But I think there are a number of really compelling reasons for taking the staged approach and also standardizing on the "raw", machine-readable documentation format which would be the output from the first stage.

One reason is that there potentially are a large number of different formats in which one might want to generate documentation. Some might be of general interest, some might have more limited scope. One can even imagine completely different applications in which it is important to have access to "documentation level" information about source code, for example various search tools. Given a carefully designed "raw" format as a starting point, it is fairly easy to write such tools.

There is evidence of this from the past: both the Fudgets documentation tool and the HaskellDoc tool used HBC-generated interface files as their "raw" format to good effect. But this also meant that these tools became system specific, and they were also limited by what HBC (ok, Lennart) happened to record in the interface files. E.g. for HaskellDoc, this meant that there are only hyper-links to entire source code files, not to individual functions. (OK, that might be a limitation of HTML as well.) A more recent example, of course, is Jan's source code browser. Thus I view the "raw" format as a way of getting the benefits from creative use of interface files, while avoiding getting tied to and restricted by some particular Haskell compiler.

One could argue that given a freely available, easy-to-use, documentation-extracting Haskell parser, the above is a non-issue. Just incorporate that code into your own.
But I can see that approach leading to a number of maintenance problems which a well-specified and extensible file format would insulate against. Also, some people might prefer to write their documentation generator in something other than Haskell (assuming that the above-mentioned parser is a piece of Haskell code).

Another reason why I like the staged approach is that a Haskell compiler conceivably could perform the first stage. Now, I know that this was not to everyone's liking. And I'm absolutely not saying that we in any way should require a Haskell compiler to do this work, or that we should rule out stand-alone tools. But I think there are quite a few good reasons why one might want to do it that way, and thus I believe it is good if the documentation standard is such that it is possible to do so.

Thus I'm saying that I think there should be two related parts of the standard: one for how to write documentation embedded in Haskell source, and one for representing collected documentation in a stand-alone, machine-friendly format. I'm not saying that we should require tools to work in a staged manner. E.g. I can easily see an augmented HDoc emitting the raw format for the benefit of other tools, as well as keeping its current HTML-emitting capabilities.

[Incidentally, note that documentation-extracting compilers are not a new thing. For example, I know of two different C/C++ compilers which supported source code browsing by extracting information from source code and storing it in special files.]

Finally, in defining the "raw" format, we will have to decide on exactly what Haskell documentation essentially is. I think that will be very helpful during the standardization process. I also think this will be very helpful later for people writing document formatting tools. Of course, there are many ways to achieve this. But specifying the context-free syntax of "raw" documentation seems to me to be a good way.
Assuming that there will be such a thing as the raw format, the question is then what that format should look like. Simon suggested that it should look like Haskell + pragma-style comments. This is certainly an interesting idea with a number of merits. On the other hand, XML is gaining widespread acceptance as a standard on which various exchange formats are based. This means that there are quite a few tools out there that might be put to good use, including at least one Haskell library, as Armin mentioned. Also, the fact that XML currently IS used for things similar to what we'd like to do might mean that we can avoid a number of pitfalls by sticking to the standard.

But as I said earlier, detailed format discussions can probably wait a little.

Best regards,

/Henrik

--
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu
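
To make the staged approach concrete, here is a small sketch in which the first stage produces a "raw" record and independent back ends consume it. The record fields and the element names (`entity`, `id`, `type`, `desc`) are assumptions made up for illustration, loosely following the markup in Simon's example; no actual raw format has been agreed in this thread.

```haskell
-- | One record of a hypothetical "raw" documentation format, as the
-- first stage (a compiler or a stand-alone extractor) might emit it.
data RawDoc = RawDoc
  { rawName :: String  -- documented identifier
  , rawType :: String  -- its type signature, as text
  , rawText :: String  -- free-form description
  } deriving (Eq, Show)

-- | Second stage, back end A: plain text suitable for printing.
renderText :: RawDoc -> String
renderText d = rawName d ++ " :: " ++ rawType d ++ "\n    " ++ rawText d

-- | Second stage, back end B: an XML-like exchange form.
renderXml :: RawDoc -> String
renderXml d = concat
  [ "<entity><id>", escape (rawName d), "</id>"
  , "<type>", escape (rawType d), "</type>"
  , "<desc>", escape (rawText d), "</desc></entity>"
  ]

-- | Escape the characters that are special in XML character data.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]
```

Neither back end looks at Haskell source, which is the point of the argument above: new output formats or search tools only need to understand `RawDoc`, not any particular compiler's interface files.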

I sympathize with Henrik's idea of developing a "raw", rich, machine interface standard. I appreciate it because I have already experienced the impact of incompatibilities on the development of the Haskell Module Browser. I use NHC and Hugs interfaces as helpers and guidelines, even though I do extract other information directly from sources. From this perspective I consider it one of the priorities, especially because Henrik made me realize that the incompatibilities could multiply if an implementor decided one day to switch to a new format. And this looks quite probable: see the announcement of the new version of Hugs. It appears that Johan Nordlander is taking over the Hugs maintenance.

Examples of incompatibilities between NHC and Hugs interfaces are numerous. One good example is the different representation of function signatures:

    f :: a -> a -> Int                  - Hugs
    f :: (a -> (a -> Prelude.Int))      - NHC

Jan
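
A tool that reads both kinds of interface file could bring the two signature spellings above to one canonical form before comparing them. The sketch below does this for a deliberately tiny type language (identifiers, application, `->` and parentheses); it does not handle tuples, lists, contexts or operators, and the names `Ty`, `normalize` and the helpers are hypothetical. Dropping the module qualifier is an assumption that is only safe when unqualified names are unambiguous.

```haskell
import Data.Char (isSpace)

-- | A tiny type language: atoms, type application and function arrows.
data Ty = Atom String | App Ty Ty | Fun Ty Ty
  deriving (Eq, Show)

-- | Split a signature's type part into "(", ")", "->" and identifiers.
tokenize :: String -> [String]
tokenize [] = []
tokenize ('(' : cs)       = "(" : tokenize cs
tokenize (')' : cs)       = ")" : tokenize cs
tokenize ('-' : '>' : cs) = "->" : tokenize cs
tokenize (c : cs)
  | isSpace c = tokenize cs
  | otherwise =
      let (w, rest) = break (\x -> isSpace x || x `elem` "()-") cs
      in (c : w) : tokenize rest

-- | Drop a module qualifier: "Prelude.Int" becomes "Int".
unqualify :: String -> String
unqualify = reverse . takeWhile (/= '.') . reverse

-- type ::= btype [ "->" type ]      (the arrow associates to the right)
parseType :: [String] -> Maybe (Ty, [String])
parseType ts = do
  (l, rest) <- parseApp ts
  case rest of
    "->" : rest' -> do
      (r, rest'') <- parseType rest'
      Just (Fun l r, rest'')
    _ -> Just (l, rest)

-- btype ::= atype { atype }         (application associates to the left)
parseApp :: [String] -> Maybe (Ty, [String])
parseApp ts = do
  (h, rest) <- parseAtom ts
  Just (go h rest)
  where
    go acc toks = case parseAtom toks of
      Just (a, rest') -> go (App acc a) rest'
      Nothing         -> (acc, toks)

-- atype ::= identifier | "(" type ")"
parseAtom :: [String] -> Maybe (Ty, [String])
parseAtom ("(" : ts) = do
  (t, rest) <- parseType ts
  case rest of
    ")" : rest' -> Just (t, rest')
    _           -> Nothing
parseAtom (t : ts)
  | t `notElem` [")", "->"] = Just (Atom (unqualify t), ts)
parseAtom _ = Nothing

-- | Print with only the parentheses that precedence requires.
render :: Int -> Ty -> String
render _ (Atom a)  = a
render p (Fun l r) = paren (p > 0) (render 1 l ++ " -> " ++ render 0 r)
render p (App f x) = paren (p > 1) (render 1 f ++ " " ++ render 2 x)

paren :: Bool -> String -> String
paren True  s = "(" ++ s ++ ")"
paren False s = s

-- | Canonical text of a type, or Nothing if it does not parse.
normalize :: String -> Maybe String
normalize s = case parseType (tokenize s) of
  Just (t, []) -> Just (render 0 t)
  _            -> Nothing
```

Under these assumptions both `a -> a -> Int` and `(a -> (a -> Prelude.Int))` normalize to the same string, so the browser could compare or display them uniformly.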
participants (4)
- Armin Groesslinger
- Henrik Nilsson
- Jan Skibinski
- Simon Marlow