haddock as a markdown preprocessor

There was a chat today on #haskell (http://tunes.org/%7Enef/logs/haskell/08.02.20, 15:08 to 16:10) about evolving haddock. I'd like to get comments.
The goal is to get the full functionality of a general purpose, programmer-friendly markup language like markdown. One example is image embedding. Another is friendly links (no visible URL).
The idea is to make a future haddock be a *preprocessor* that generates pandoc's extended markdown (or some such). Documentation would be mostly markdown, with very few extensions for code documentation ('foo' and "Foo.Bar", maybe a bit more). Most of the doc would simply be passed through untouched. The code-doc extensions would get rewritten into standard markdown and mixed in with the rest. Pandoc could then take the generated markdown and produce HTML, LaTeX, DocBook XML, etc.
Perhaps there will be ways in which markdown falls short in expressiveness. If so, I'm guessing the shortcomings wouldn't be specific to the task of code documentation, and so could be approached as improvements to markdown/pandoc (which is written in Haskell).
Since the old and new doc languages would be quite incompatible, we might want to specify in a .cabal file which language to use.
Reactions? - Conal

On Wed, 2008-02-20 at 16:43 -0800, Conal Elliott wrote:
There was a chat today on #haskell (15:08 to 16:10) about evolving haddock. I'd like to get comments.
The goal is to get the full functionality of a general purpose, programmer-friendly markup language like markdown. One example is image embedding. Another is friendly links (no visible URL).
To be honest I like the fact that haddock's markup is really simple and perhaps somewhat restrictive. A great improvement though would be to make it easy to extract the docs from haddock in a nice format so that they could be re-used in other contexts rather than just generating html api documentation. Haddock does have support for multiple backends; someone just needs to define and write a generic backend that spits out the info that haddock gathers in a machine readable format. Then people could feed that into whatever other system they like.
Since the old and new doc languages would be quite incompatible, we might want to specify in a .cabal file which language to use.
That's the main thing that worries me. Currently we have the rather nice situation that we have a single standardised markup format that everyone understands. So I very much support the idea of making the markup easier to extract but I think we should be very careful about changing the markup format. The haddock markup format has always been very lightweight and does not assume much about the capabilities of the backend (paper, web, whatever). Duncan

On 21/02/2008, Duncan Coutts wrote:
To be honest I like the fact that haddock's markup is really simple and perhaps somewhat restrictive. A great improvement though would be to make it easy to extract the docs from haddock in a nice format so that they could be re-used in other contexts rather than just generating html api documentation. Haddock does have support for multiple backends; someone just needs to define and write a generic backend that spits out the info that haddock gathers in a machine readable format.
I have probably misunderstood both of you, but I think that Conal proposed that Haddock *input* syntax is largely unchanged; Haddock should be able to *output* markdown, for consumption by pandoc. (Which I think is also what you're suggesting.) Alistair

Duncan Coutts wrote:
To be honest I like the fact that haddock's markup is really simple and perhaps somewhat restrictive. A great improvement though would be... a generic backend that spits out the info that haddock gathers in a machine readable format.
Alistair Bayley wrote:
I have probably misunderstood both of you, but I think that Conal proposed that Haddock *input* syntax is largely unchanged; Haddock should be able to *output* markdown, for consumption by pandoc.
Perhaps, but I don't think "markdown", or any other presentation format, is right for that. I'm sure that there are many presentation formats needed by many different people. I think Duncan's point is that haddock only really needs to produce one *generic* output. It should faithfully preserve all of the information that haddock knows how to produce, in a format that is easy to parse. That could then be transformed by other existing tools into whatever you want, including the current HTML/CSS, markdown, or anything else. XML is what people usually use nowadays for that sort of thing, but it doesn't have to be XML. The haddock web site mentions that some work has already been done on DocBook XML; that could work. DITA would perhaps be a better fit. Or we could use our own set of tags. Regards, Yitz

On Thu, Feb 21, 2008 at 12:54 PM, Yitzchak Gale wrote:
Duncan Coutts wrote:
To be honest I like the fact that haddock's markup is really simple and perhaps somewhat restrictive. A great improvement though would be... a generic backend that spits out the info that haddock gathers in a machine readable format.
Alistair Bayley wrote:
I have probably misunderstood both of you, but I think that Conal proposed that Haddock *input* syntax is largely unchanged; Haddock should be able to *output* markdown, for consumption by pandoc.
Perhaps, but I don't think "markdown", or any other presentation format, is right for that.
Markdown is not really a presentation format. It's an authoring format
which "allows you to write using an easy-to-read, easy-to-write plain
text format, then convert it to structurally valid XHTML (or HTML)."
(from http://daringfireball.net/projects/markdown/)
Pandoc apparently generalizes this, allowing you to use the Markdown
syntax to produce other forms of output. I'm not sure what it does
with embedded XHTML, which Markdown allows (and which is necessary if
you want to do things like tables).
Markdown is more powerful than Haddock, and (for me, at least) easier
to read. I'd love to see it used for Haskell code documentation, but I
don't see it happening.
--
Dave Menendez

David Menendez wrote:
Markdown is not really a presentation format. It's an authoring format
Its primary design goal is to be easy to read, not easy to parse. That's why I consider it a presentation format.
Anyway, it's not suitable for use as API markup. The whole point is that you want to add metadata indicating how various pieces of your content relate to various pieces of Haskell syntax. You would have to add special markup, in which case you would get, well, Haddock. Or you could extend markdown's embedded HTML facility to accept other tags for that purpose, but then your content would be less readable than Haddock, not more readable. (Though that is the direction taken by C#.)
I'm happy with Haddock's input syntax. It's quite readable, and simple enough. And it's similar to many other API markup systems for other languages, so many people feel comfortable with it right from the start.
Regards, Yitz

On Thu, 2008-02-21 at 13:12 +0000, Alistair Bayley wrote:
On 21/02/2008, Duncan Coutts wrote:
To be honest I like the fact that haddock's markup is really simple and perhaps somewhat restrictive. A great improvement though would be to make it easy to extract the docs from haddock in a nice format so that they could be re-used in other contexts rather than just generating html api documentation. Haddock does have support for multiple backends; someone just needs to define and write a generic backend that spits out the info that haddock gathers in a machine readable format.
I have probably misunderstood both of you, but I think that Conal proposed that Haddock *input* syntax is largely unchanged; Haddock should be able to *output* markdown, for consumption by pandoc.
(Which I think is also what you're suggesting.)
Yes, I misunderstood. I thought Conal was suggesting we extend the haddock input format to allow all the markdown notations.
I'd rather not see different packages using different documentation dialects, as it makes it much easier for people to contribute if we're all using the same language. I know there is a tension between richer markup for nicer presentation and keeping simple markup for ease of understanding and for presenting on limited media like ghci or IDE tooltips. So IMHO we should consider syntactic extensions rather carefully.
Though on that topic, we have no consensus as a community about what to use for tutorials or user guides. Consequently there is no support in Cabal etc for those kinds of documentation. GHC, Cabal and c2hs amongst others use docbook but it's a horrible format to write and the tools to process it are very finicky (we apparently have to hard code paths to specific versions of xslt stylesheets).
Duncan

On 2/21/08 3:57 PM, Duncan Coutts wrote:
Consequently there is no support in Cabal etc for those kinds of documentation. GHC, Cabal and c2hs amongst others use docbook but it's a horrible format to write and the tools to process it are very finicky (we apparently have to hard code paths to specific versions of xslt stylesheets).
Hi, DocBook authoring tools have progressed tremendously in the past few years, and I disagree that the "tools to process it are very finicky". If there are specific questions about making DocBook more palatable for GHC, Cabal, c2hs, others, please send them to me directly or the docbook-apps list: http://www.docbook.org/help Thanks, Keith

On Fri, 2008-02-22 at 07:21 -0800, Keith Fahlgren wrote:
On 2/21/08 3:57 PM, Duncan Coutts wrote:
Consequently there is no support in Cabal etc for those kinds of documentation. GHC, Cabal and c2hs amongst others use docbook but it's a horrible format to write and the tools to process it are very finicky (we apparently have to hard code paths to specific versions of xslt stylesheets).
Hi,
DocBook authoring tools have progressed tremendously in the past few years, and I disagree that the "tools to process it are very finicky". If there are specific questions about making DocBook more palatable for GHC, Cabal, c2hs, others, please send them to me directly or the docbook-apps list: http://www.docbook.org/help
I admit to knowing rather little about it but I've noticed that these three projects are using xsltproc directly as their docbook processor (the other two just copied GHC). Because xsltproc is a general purpose tool they have to supply a large number of parameters including a hard coded location of an xslt script (and xsltproc seems to always want to go to the network to download a dtd when it already has a version it can use offline).
Currently Cabal uses:
    xsltproc --param use.id.as.filename 1 \
      --param toc.section.depth 3 \
      --stringparam base.dir dist/doc/users-guide/ \
      --stringparam html.stylesheet doc/fptools.css \
      /usr/share/sgml/docbook/xsl-stylesheets-1.70.1/xhtml/chunk.xsl \
      doc/Cabal.xml
Yes, that's a hard coded path to the style sheet. GHC does it slightly more cleverly by using a configure.ac test to look through a very large number of hard coded paths to try and find the above style sheet.
c2hs used to use docbook2html which was easier to use in terms of specifying command lines but produced worse output and didn't seem to work on more recent distros so c2hs switched to xml docbook format rather than the previous sgml format.
Basically I'd like to know what tool (that is packaged on every linux distro) do I use to convert a docbook .xml file to xhtml. I took a quick look on the FAQ linked from docbook.org/help and could not immediately find what standard tools and commands I'm supposed to use.
Duncan

On Sat, 2008-02-23 at 01:28 +0000, Duncan Coutts wrote:
Basically I'd like to know what tool (that is packaged on every linux distro) do I use to convert a docbook .xml file to xhtml. I took a quick look on the FAQ linked from docbook.org/help and could not immediately find what standard tools and commands I'm supposed to use.
After a bit more looking around I found the xmlto program which is packaged for my distro:
    xmlto xhtml doc/Cabal.xml -o doc/manual
That's considerably nicer. Now all I need to do is figure out how to tell it to use the particular css file we use.
I can't help noticing that the second tutorial listed on the wiki (and the first one I've found that actually tells you what tools to use) seems to suggest using xsltproc with long hard coded paths: http://opensource.bureau-cornavin.com/crash-course/en/hello-world.html
Duncan

I guess there was some confusion about the haddock-as-preprocessor idea.
Here's another try:
Pare the Haddock markup language down to very few markup directives, say
just 'foo' and "Foo.Bar". (Of course, Haddock continues to read and process
type signatures and module import & export specs.) Compose this slimmed
down Haddock with a more mainstream and powerful markup language/processor
like markdown/pandoc. How to compose? By having Haddock translate its
markdown extensions into markdown and pass through all the rest.
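The translation step described above can be sketched in a few lines of Haskell. This is only an illustration of the idea, not how Haddock works: the function names are made up, the `Foo-Bar.html` file naming and `#v:` anchors are assumed to mirror what Haddock's HTML backend emits, and identifiers containing primes or operators are ignored for brevity.

```haskell
import Data.Char (isAlphaNum, isLower, isUpper)

-- Hypothetical URL scheme: module Foo.Bar lives in Foo-Bar.html.
moduleUrl :: String -> String
moduleUrl = (++ ".html") . map (\c -> if c == '.' then '-' else c)

-- Characters allowed inside the two extensions (simplified: no primes).
isIdentChar, isModChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_'
isModChar c = isAlphaNum c || c == '.'

-- A value identifier starts with a lowercase letter or underscore.
isIdentifier :: String -> Bool
isIdentifier (c : _) = isLower c || c == '_'
isIdentifier [] = False

-- A module name is dot-separated components, each starting uppercase.
isModuleName :: String -> Bool
isModuleName s = case break (== '.') s of
  (c : _, []) -> isUpper c
  (c : _, _ : more) -> isUpper c && isModuleName more
  _ -> False

-- | Rewrite 'foo' and "Foo.Bar" into markdown links; every other
-- character is passed through untouched for pandoc to interpret.
-- The first argument is the module whose docs are being processed.
rewriteDoc :: String -> String -> String
rewriteDoc _ [] = []
rewriteDoc cur ('\'' : rest)
  | (name, '\'' : rest') <- span isIdentChar rest
  , isIdentifier name =
      "[`" ++ name ++ "`](" ++ moduleUrl cur ++ "#v:" ++ name ++ ")"
        ++ rewriteDoc cur rest'
rewriteDoc cur ('"' : rest)
  | (name, '"' : rest') <- span isModChar rest
  , isModuleName name =
      "[" ++ name ++ "](" ++ moduleUrl name ++ ")" ++ rewriteDoc cur rest'
rewriteDoc cur (c : rest) = c : rewriteDoc cur rest
```

With these definitions, `rewriteDoc "Data.Map" "See 'lookup' in \"Data.Map\"."` yields ``See [`lookup`](Data-Map.html#v:lookup) in [Data.Map](Data-Map.html).``, while apostrophes in ordinary prose (as in "don't") and quoted text that is not a module name are left alone.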
The goal of redesigning for composability is that we get more for less.
Haddock can focus on its speciality, namely hyperlinked Haskell code
documentation, and pandoc on its, namely human-writable and -readable prose
with modern features (images, friendly hyperlinks, smart quotes & dashes,
footnotes, super- and subscripts, pretty math, bibliography-style link
specs, etc). Haddock development can focus its resources on
Haskell-specific functionality, and we library writers can still use a
full-featured mark-up language.
I love Haddock's Haskell-smarts, and I love (extended) markdown's features
and usability. Currently, I have to choose between them, and I'd rather get
both at once.
We can take this composability idea further and plug in other nifty tools
like hscolour and lhs2TeX. And a new tool that hyperlinks and annotates
source code in a variety of ways. For instance, hover over an identifier to
see its type and doc string in a pop-up, or click to jump to the source code
(also annotated with type, doc, and source links). And other tools we
haven't yet thought of.
Cheers, - Conal
On Thu, Feb 21, 2008 at 5:12 AM, Alistair Bayley wrote:
On 21/02/2008, Duncan Coutts wrote:
To be honest I like the fact that haddock's markup is really simple and perhaps somewhat restrictive. A great improvement though would be to make it easy to extract the docs from haddock in a nice format so that they could be re-used in other contexts rather than just generating html api documentation. Haddock does have support for multiple backends; someone just needs to define and write a generic backend that spits out the info that haddock gathers in a machine readable format.
I have probably misunderstood both of you, but I think that Conal proposed that Haddock *input* syntax is largely unchanged; Haddock should be able to *output* markdown, for consumption by pandoc.
(Which I think is also what you're suggesting.)
Alistair

On Thu, 2008-02-21 at 16:54 -0800, Conal Elliott wrote:
I guess there was some confusion about the haddock-as-preprocessor idea. Here's another try:
Pare the Haddock markup language down to very few markup directives, say just 'foo' and "Foo.Bar". (Of course, Haddock continues to read and process type signatures and module import & export specs.) Compose this slimmed down Haddock with a more mainstream and powerful markup language/processor like markdown/pandoc. How to compose? By having Haddock translate its markdown extensions into markdown and pass through all the rest.
So the advantage of passing the rest through uninterpreted is that markdown then interprets it and we get lots of cool markup for free; the disadvantage is that we get lots more markup that I don't understand! :-)
There really is something to be said for being able to download a random package, read the code and the documentation markup and be able to understand it and modify it. If it's a simple common language like we have at the moment we can do that. I worry about losing that property.
So yes we could make haddock not care so much and pass everything through and then people could do whatever they liked with new markup formats but I wonder if we cannot find a common language that we can all agree on. Are there any particularly cool things in markdown that lots of haskell developers want to use in their api documentation?
Duncan

On Thu, Feb 21, 2008 at 5:37 PM, Duncan Coutts wrote:
So the advantage of passing the rest through uninterpreted is that markdown then interprets it and we get lots of cool markup for free, the disadvantage is that we get lots more markup that I don't understand! :-)
Thanks for this summary, Duncan.
There really is something to be said for being able to download a random package, read the code and the documentation markup and be able to understand it and modify it. If it's a simple common language like we have at the moment we can do that. I worry about losing that property.
Have you looked at markdown? It's a popular and well-documented format and based on common conventions. I bet you'd have no trouble learning it, and I bet many other Haskell programmers *already* know it. (BTW, I just noticed that this mail message is written in markdown.)
So yes we could make haddock not care so much and pass everything through and then people could do whatever they liked with new markup formats but I wonder if we cannot find a common language that we can all agree on. Are there any particularly cool things in markdown that lots of haskell developers want to use in their api documentation?
My previous note listed some (pandoc-extended) markdown features I use regularly (while blogging) that are missing in Haddock. If I could, I'd use them in my code documentation. I'd like to hear from others about what markup features you'd like to have in your code documentation but aren't supported by Haddock. Cheers, - Conal

Conal Elliott wrote:
Pare the Haddock markup language down to very few markup directives, say just 'foo' and "Foo.Bar".
Other critical ones:
-- | This shows which syntax this text describes.
-- ^ So does this.
Less critical, but usually not provided by general markup languages:
-- $doc A movable documentation chunk.
If Haddock itself does not parse any other markup, we must make sure to use markup that does not lock up its information. It should be something we have a parser for, or something that has good tools for turning it into some robust machine-readable format in a lossless way.
The reason is that I may want to use a bit of Haskell in a much larger project that uses some other markup system for its API documentation. So, for example, if I want to integrate the output into a larger DITA project, there should be an easy way to do that. Or Doxygen, or whatever else.
Then Haddock would need to have some way of outputting its own information nicely, with embedded chunks of markup. You would read that, passing each chunk of markup through its parser.
Truth is, I don't see any such parser for "markdown". Do you know of one? Maybe we would have to write one.
I think that improving the markup capabilities of Haddock is a minor issue. The main value of Haddock is its API metadata. Haddock currently keeps most of that in its belly, using it secretly to create its own presentation output. The biggest improvement would be getting meaningful machine-readable output.
Your idea of abstracting out the markup could actually make that easier, if we keep that goal in mind as well.
Thanks, Yitz

Hi Yitzchak,
About "-- |", "-- ^", and "-- $doc", we might call them "markup
meta-directives", since they delimit the text to be preprocessed and then
produced as markup. The meta-directives and the "-- " line prefixes would
be removed in the process.
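As a rough sketch of that stripping step, assuming a comment block has already been isolated as a list of source lines: the `Chunk` type and function names below are hypothetical and are not part of Haddock. Each block is classified by its meta-directive, and the "-- " prefixes are removed so the remaining text can be handed to the markup processor.

```haskell
import Data.List (intercalate, stripPrefix)

-- Hypothetical result type: which meta-directive opened the block,
-- plus the markup text with comment syntax removed.
data Chunk
  = DocBefore String       -- "-- |": documents the declaration that follows
  | DocAfter String        -- "-- ^": documents the declaration that precedes
  | DocNamed String String -- "-- $name": a movable, named chunk
  deriving (Eq, Show)

-- Drop leading indentation, the "--", and at most one following space.
-- Returns Nothing for a line that is not a line comment.
stripDashes :: String -> Maybe String
stripDashes l = case stripPrefix "--" (dropWhile (== ' ') l) of
  Just (' ' : rest) -> Just rest
  other -> other

-- Join the first line (minus leading spaces) with the remaining lines,
-- skipping empty lines at the start of the chunk.
joinBody :: String -> [String] -> String
joinBody first more =
  intercalate "\n" (dropWhile null (dropWhile (== ' ') first : more))

-- | Classify one comment block and strip its comment syntax.
-- Ordinary comments (no meta-directive) yield Nothing.
parseChunk :: [String] -> Maybe Chunk
parseChunk ls = do
  body <- traverse stripDashes ls
  case body of
    ('|' : first) : more -> Just (DocBefore (joinBody first more))
    ('^' : first) : more -> Just (DocAfter (joinBody first more))
    ('$' : first) : more ->
      let (name, text) = break (== ' ') first
       in Just (DocNamed name (joinBody text more))
    _ -> Nothing
```

For instance, `parseChunk ["-- | Look up a key.", "-- Returns 'Nothing' if absent."]` gives `Just (DocBefore "Look up a key.\nReturns 'Nothing' if absent.")`. Indentation after the first line is kept, since markdown treats it as significant.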
As for producing machine-readable API metadata, I hadn't been thinking along
those lines, and I enthusiastically agree with you. Further factor haddock
into a metadata extractor and a documentation generator.
Cheers, - Conal
On Fri, Feb 22, 2008 at 3:25 AM, Yitzchak Gale wrote:
Conal Elliott wrote:
Pare the Haddock markup language down to very few markup directives, say just 'foo' and "Foo.Bar".
Other critical ones:
-- | This shows which syntax this text describes.
-- ^ So does this.
Less critical, but usually not provided by general markup languages:
-- $doc A movable documentation chunk.
If Haddock itself does not parse any other markup, we must make sure to use markup that does not lock up its information. It should be something we have a parser for, or something that has good tools for turning it into some robust machine-readable format in a lossless way.
The reason is that I may want to use a bit of Haskell in a much larger project that uses some other markup system for its API documentation.
So, for example, if I want to integrate the output into a larger DITA project, there should be an easy way to do that. Or Doxygen, or whatever else.
Then Haddock would need to have some way of outputting its own information nicely, with embedded chunks of markup. You would read that, passing each chunk of markup through its parser.
Truth is, I don't see any such parser for "markdown". Do you know of one? Maybe we would have to write one.
I think that improving the markup capabilities of Haddock is a minor issue. The main value of Haddock is its API metadata. Haddock currently keeps most of that in its belly, using it secretly to create its own presentation output. The biggest improvement would be getting meaningful machine-readable output.
Your idea of abstracting out the markup could actually make that easier, if we keep that goal in mind as well.
Thanks, Yitz

On 22 Feb 2008, at 1:54, Conal Elliott wrote:
The goal of redesigning for composability is that we get more for less. Haddock can focus on its speciality, namely hyperlinked Haskell code documentation, and pandoc on its, namely human-writable and -readable prose with modern features (images, friendly hyperlinks, smart quotes & dashes, footnotes, super- and subscripts, pretty math, bibliography-style link specs, etc). Haddock development can focus its resources on Haskell-specific functionality, and we library writers can still use a full-featured mark-up language.
While I like the idea of a very powerful authoring system, I doubt that we should mix the documentation code with the source code. It seems much clearer to me to separate such heavily-formatted documentation from the source into separate files.
Of course, the source code includes comments that specify what functions do, and so provide a bit of API documentation. But such comments should contain as little formatting as possible to keep them readable in a text editor.
Reinier

On Fri, Feb 22, 2008 at 10:57 AM, Reinier Lamers wrote:
[...] Of course, the source code includes comments that specify what functions do, and so provide a bit of API documentation. But such comments should contain as little formatting as possible to keep them readable in a text editor.
Hi Reinier, Do you know about the [Markdown] format and the [Pandoc] processor? [Markdown] is designed for *readability* in text editors and based on common text conventions.
From the [markdown] home page:
The overriding design goal for Markdown's formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions. While Markdown's syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown's syntax is the format of plain text email.
Don't take their word for it or mine. You can copy and paste this message into the [Try Pandoc] page.
Regards, - Conal
[Markdown]: http://daringfireball.net/projects/markdown "The markdown project page"
[Pandoc]: http://johnmacfarlane.net/pandoc/try "The Pandoc project page"
[Try Pandoc]: http://johnmacfarlane.net/pandoc/try "Try out Pandoc for yourself"

2008/2/21, Conal Elliott wrote:
There was a chat today on #haskell (15:08 to 16:10) about evolving haddock. I'd like to get comments.
The goal is to get the full functionality of a general purpose, programmer-friendly markup language like markdown. One example is image embedding. Another is friendly links (no visible URL).
Haddock already supports image embedding since version 0.8. I don't think it works properly in version 2 (because of merge errors), but it probably will in the next version. Use <<url>> for including images. David
participants (8)
- Alistair Bayley
- Conal Elliott
- David Menendez
- David Waern
- Duncan Coutts
- Keith Fahlgren
- Reinier Lamers
- Yitzchak Gale