RE: A Haskell Documentation Standard

Hi folks, Henrik argues in favour of a raw documentation format:
I do not think that the staged approach is absolutely essential: if we just could come up with a good standard for how to write embedded documentation in Haskell, that would be extremely valuable in its own right. But I think there are a number of really compelling reasons for taking the staged approach and also standardizing on the "raw", machine-readable documentation format which would be the output from the first stage.
One reason is that there potentially are a large number of different formats in which one might want to generate documentation. Some might be of general interest, some might have more limited scope. One can even imagine completely different applications in which it is important to have access to "documentation level" information about source code, for example various search tools. Given a carefully designed "raw" format as a starting point, it is fairly easy to write such tools. There is evidence of this from the past: both the Fudgets documentation tool and the HaskellDoc tool used HBC generated interface files as their "raw" format to good effect. But this also meant that these tools became system specific, and they were also limited by what HBC (ok, Lennart) happened to record in the interface files. E.g. for HaskellDoc, this meant that there are only hyper-links to entire source code files, not to individual functions. (OK, that might be a limitation of HTML as well.) A more recent example, of course, is Jan's source code browser.
Thus I view the "raw" format as a way of getting the benefits from creative use of interface files, while avoiding getting tied to and restricted by some particular Haskell compiler.
One could argue that given a freely available, easy-to-use, documentation-extracting Haskell parser, the above is a non-issue. Just incorporate that code into your own. But I can see that approach leading to a number of maintenance problems which a well-specified and extensible file format would insulate against. Also, some people might prefer to write their documentation generator in something other than Haskell (assuming that the above-mentioned parser is a piece of Haskell code).

Good point. So, taking the idea of XML as a widely implemented machine-readable format, I'd like to propose a low-effort way to achieve this (again, using freely available existing tools):

    Haskell source + annotations ==> XML rendering ==> docs

Stage 1 consists of reading the source and documentation, and outputting the information in XML. The XML might or might not omit the actual source code at this point. This stage can be done by either the compiler (in which case it could fill in any missing type signatures), or it might be done by an HDoc backend which just generated XML. The XML can be generated by HaXml - basically plugging together the Haskell parser and HaXml should give us stage 1.

Stage 2 doesn't have to be written in Haskell, since we'll have the DTD for the intermediate format, but if we're writing in Haskell we could again use HaXml with the Haskell abstract syntax data type to read the interface.

Furthermore, a tool like HDoc could be written to go straight from Haskell to the documentation without passing through the intermediate format, as Henrik points out. If the embedded documentation is in XML format, this also fits in nicely.

This seems neat, useful and not too much effort - what does everyone think? (I have a suspicion that we might need to tweak HaXml to generate a nice-looking DTD from the abstract syntax).

...--------...

On a slightly higher level, I should say something about my motivations and goals for this project. Another thing to come out of the recent implementors' meeting was a proposal for a new module namespace and a set of libraries for Haskell (from Malcolm Wallace); we're going to start discussing this soon. With a large set of interconnected libraries, automatically-generated documentation becomes essential. Also, much of the code we have already has no documentation, or it is in differing formats.
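
As a rough illustration of what stage 1 of the pipeline proposed above (Haskell source + annotations ==> XML rendering) might emit, here is a minimal sketch. Everything in it is an assumption made for illustration: the `DocEntry` record, the `toXml` and `escapeXml` functions, and the element names `module`, `decl`, `type` and `doc` are hypothetical, since no DTD has been agreed. It does not use HaXml or a real Haskell parser; the records are written by hand where a parser would supply them.

```haskell
-- Hypothetical "raw" documentation record for one top-level declaration.
-- Field names are illustrative only; no intermediate format has been fixed.
data DocEntry = DocEntry
  { entryName :: String        -- identifier being documented
  , entryType :: Maybe String  -- type signature, if one is known
  , entryDoc  :: String        -- documentation text taken from the source
  } deriving (Eq, Show)

-- Escape the characters that are special in XML text and
-- in double-quoted attribute values.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Stage 1 (sketch): render the records for one module as XML.
-- The element and attribute names here are assumptions, not a proposed DTD.
toXml :: String -> [DocEntry] -> String
toXml modName entries = unlines $
     [ "<module name=\"" ++ escapeXml modName ++ "\">" ]
  ++ concatMap entryXml entries
  ++ [ "</module>" ]
  where
    entryXml e =
         [ "  <decl name=\"" ++ escapeXml (entryName e) ++ "\">" ]
      ++ maybe [] (\t -> [ "    <type>" ++ escapeXml t ++ "</type>" ]) (entryType e)
      ++ [ "    <doc>" ++ escapeXml (entryDoc e) ++ "</doc>"
         , "  </decl>" ]

-- Hand-written stand-in for what a documentation-extracting parser would produce.
example :: [DocEntry]
example =
  [ DocEntry "map" (Just "(a -> b) -> [a] -> [b]")
             "Apply a function to every element of a list."
  , DocEntry "undocumented" Nothing ""
  ]

main :: IO ()
main = putStr (toXml "Prelude" example)
```

A declaration without a known signature simply omits the `type` element here, which is the gap a compiler-based stage 1 could fill in. A stage 2 tool in any language would then only need to read this XML, not Haskell source.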
Being able to generate hyperlinked, indexed documentation from raw source code will be a real win, before we gradually incorporate the existing documentation back into the source. Lots of people moan about the lack of actual documentation for the Prelude beyond the source code in the report. Having a way to take the Prelude source files, with comments appropriately turned into documentation tags, and generate hyperlinked documentation will address much of this criticism - and I imagine most of us would find it useful; I certainly would.

I'm less concerned for now about the intermediate format, because the above goals can be achieved with a single tool. But after all, the two issues are largely orthogonal - having a tool which can generate documentation from source doesn't preclude also having a well-understood intermediate format.

Cheers,
Simon
participants (1)
-
Simon Marlow