Re: Request for comments on proposal for literate programming using markdown

On Thu, Aug 09, 2012 at 01:07:10PM +0000, Philip Holzenspies wrote:
I have looked at pandoc and I use it for quite a few things.
Just to clarify, I was not talking about pandoc, but pandoc-unlit (which uses pandoc to unlit Markdown, see the README [1]).
However, it's a bit of an overspec'd package to link into the compiler, don't you think?
I did not mean to modify the compiler. Unlitting is done by an external program. This already allows you to customize unlitting [2].
Cheers, Simon
[1] https://github.com/sol/pandoc-unlit#readme
[2] http://www.haskell.org/ghc/docs/latest/html/users_guide/options-phases.html#...

On 9 Aug 2012, at 15:26, Simon Hengel wrote:
Just to clarify, I was not talking about pandoc, but pandoc-unlit (which uses pandoc to unlit Markdown, see the README [1]).
Sorry, I was a bit unclear there. I know about the program and it depends on the library.
However, it's a bit of an overspec'd package to link into the compiler, don't you think?
I did not mean to modify the compiler. Unlitting is done by an external program. This already allows you to customize unlitting [2].
Absolutely true, but I came across this in the GHC source itself. I would like the GHC source to be literateable (not a word, but you know what I mean) in markdown. Now, the GHC source could be built with the same mechanism for having unlitting done by an external program, but that would make the build process depend on a very large library (through pandoc-unlit, depending on pandoc), which, by the way, has a GPL license. Anyhow, although absolutely true for one's own usage, this does not seem like an alternative to allowing markdown (or reStructuredText) lhs in the GHC source.
Regards, Philip

On Mon, Aug 13, 2012 at 08:45:51AM +0000, Philip Holzenspies wrote:
However, it's a bit of an overspec'd package to link into the compiler, don't you think?
I did not mean to modify the compiler. Unlitting is done by an external program. This already allows you to customize unlitting [2].
Absolutely true, but I came across this in the GHC source itself. I would like the GHC source to be literateable (not a word, but you know what I mean) in markdown. Now, the GHC source could be built with the same mechanism for having unlitting done by an external program, but that would make the build process depend on a very large library (through pandoc-unlit, depending on pandoc), which, by the way, has a GPL license.
I think it makes sense that you do not want to depend on pandoc for GHC's build process. But would a more lightweight unlit for Markdown work?
Hmm, one issue could arise with a huge codebase (like GHC's) that uses both traditional literate Haskell and Markdown. You can't set the unlit program globally then. I think this could be solved by adding
{-# OPTIONS_GHC -pgmL unlit-markdown #-}
to source files that use Markdown. Sadly this is not valid Markdown, so it is not really sane to add it to a Markdown file.
Would it work to adapt GHC's option sniffing, so that it recognizes options in HTML comments (which are valid Markdown):
<!-- OPTIONS_GHC -pgmL unlit-markdown -->
Possibly with the requirement that it has to be on the first line, and maybe guarded by a flag (e.g. -ext-options-sniffing)?
Cheers, Simon
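To make the HTML-comment idea concrete, here is a minimal sketch of what such sniffing could look like. This is not how GHC's option sniffing is implemented; the function name sniffHtmlOptions is made up for illustration, and the sketch assumes the comment must sit alone on the first line of the file.

```haskell
-- Hypothetical sketch: recognise an options comment of the form
--   <!-- OPTIONS_GHC -pgmL unlit-markdown -->
-- on the first line of a file and return the options it carries.
-- Returns Nothing if the first line is anything else.
sniffHtmlOptions :: String -> Maybe [String]
sniffHtmlOptions src =
  case words (takeWhile (/= '\n') src) of
    ("<!--" : "OPTIONS_GHC" : rest)
      | not (null rest), last rest == "-->" -> Just (init rest)
    _ -> Nothing
```

Restricting the search to the first line keeps the check cheap and avoids picking up comments that merely talk about OPTIONS_GHC further down in the prose.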

Dear Simon,
On 13 Aug 2012, at 10:23, Simon Hengel wrote:
I think it makes sense that you do not want to depend on pandoc for GHC's build process. But would a more lightweight unlit for Markdown work?
Ultimately, all unlitting does is replace things not in code blocks by blank lines. Bird-style unlitting also replaces '>' at the start of a line with a space. It doesn't take that much.
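As a rough illustration of how little Bird-style unlitting involves, here is a minimal sketch in Haskell. It is not the code in utils/unlit; the names unlitBirdLine and unlitBird are made up for this example, and it leaves out the error checking and special cases a real unlitter needs.

```haskell
-- Hypothetical sketch of Bird-style unlitting.
-- A line starting with '>' is code: the '>' becomes a space so that
-- column positions are preserved. Every other line is prose and is
-- replaced by an empty line so that line numbers are preserved.
unlitBirdLine :: String -> String
unlitBirdLine ('>' : rest) = ' ' : rest
unlitBirdLine _            = ""

-- Apply the per-line rule to a whole file.
unlitBird :: String -> String
unlitBird = unlines . map unlitBirdLine . lines
```

Keeping one output line per input line is what lets compiler error messages point back at the right place in the literate source.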
Hmm, one issue could arise with a huge codebase (like GHC's) that uses both traditional literate Haskell and Markdown. You can't set the unlit program globally then. I think this could be solved by adding
{-# OPTIONS_GHC -pgmL unlit-markdown #-}
to source files that use Markdown. Sadly this is not valid Markdown, so it is not really sane to add it to a Markdown file.
This would be a way to do it with the means available now. My proposal, however, is to replace the external unlit and cpp binaries. The former is shipped as source with the GHC code, built separately and then called as an external program from the compiler. The latter is assumed to be present on the host system and is also called as an external program. My proposition is to replace these external programs by code *inside* GHC.
There is indeed no formal standard for Markdown declaring one version of the language 'valid', but it's like the unlitting for LaTeX-style lhs: there is no standard for LaTeX, so there is no valid LaTeX, but you can still define the unlitting nicely by saying: you need to stick "\begin{code}" on a line, starting at column 1, not followed by anything. The same can be done for Markdown. Just choose a style. I suggested using GitHub-style Markdown. Then the unlitting is just defined as: stick three backticks followed by "hs" or "haskell" onto a line, starting at column 1, not followed by anything.
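The fence rule just described can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions, not code from the proposal: the names opensFence, closesFence and unlitFenced are invented here, and the closing rule (a line consisting of exactly three backticks) is an assumption, since the text above only defines how a block is opened.

```haskell
-- Hypothetical sketch of unlitting GitHub-style fenced Markdown.

-- A block opens on a line that is exactly three backticks followed
-- by "hs" or "haskell", starting at column 1, with nothing after it.
opensFence :: String -> Bool
opensFence l = l `elem` ["```hs", "```haskell"]

-- Assumption: a block closes on a line of exactly three backticks.
closesFence :: String -> Bool
closesFence l = l == "```"

-- Keep lines inside Haskell fences; blank out everything else,
-- including the fence lines themselves, so line numbers survive.
unlitFenced :: String -> String
unlitFenced = unlines . go False . lines
  where
    go _ [] = []
    go inCode (l : ls)
      | not inCode && opensFence l = "" : go True ls
      | inCode && closesFence l    = "" : go False ls
      | inCode                     = l : go True ls
      | otherwise                  = "" : go False ls
```

The same shape works for the LaTeX style by swapping the two predicates for tests against "\begin{code}" and "\end{code}".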
Would it work to adapt GHC's option sniffing, so that it recognizes options in HTML comments (which are valid Markdown):
<!-- OPTIONS_GHC -pgmL unlit-markdown -->
Possibly with the requirement that it has to be on the first line, and maybe guarded by a flag (e.g. -ext-options-sniffing)?
Again, this could well work, but it's only relevant if we want to have the unlitting as an external process. If we build the unlitter into GHC, we need not pollute the files with options to be passed to the compiler. See the proposal for suggested ways of disambiguating Markdown files. We could also add an option to GHC to say that a file is in Markdown mode, overruling all the detection mechanisms suggested in the proposal, but it wouldn't be a pgm-option, because it wouldn't be about calling something external.
Regards, Philip

On 13 Aug 2012, at 13:20, Simon Hengel wrote:
What is the benefit of doing so?
- Simpler build environment
- Easier to understand interaction and bugs resulting from them (viz. [1], [2]), because the interactions happen in the same domain
- (as mentioned in the proposal) Simplification of the API; not going through temp-files (which, by the way, I don't understand anyway; why not through pipes? Probably pipes are harder to do under Windows).
- (minor) May well help speed things up by not having to reread every file three times and write it twice before actually coming to GHC's parser
- Helps pull the code for unlit into the community; it seems nobody's looking at it at the moment. The README in utils/unlit/ still hails the merits of code-sharing between GHC and LML/HBC, but I don't believe this sharing exists (actively) anymore and it keeps people from improving it. Part of improving it would be to add a little more documentation / commenting. For example, can you explain why "myputc(c,stream)" is called when "c == '#'", regardless of whether we are inside or outside of a code block? I've thought about it for a bit and can't come up with an answer.
- If it is only an optional replacement unlitter in the GHC build system, there is no reward for GHC hackers to agree on a consistent style of making important notes. Currently, my workflow (when *using* the GHC API) involves the haddock-generated API documentation, the HsColour-generated html source and the actual source code, because many "[Note: Something or Other]" are formatted in a markdown style (as already mentioned in the proposal), which turn out unformatted (and thereby largely unreadable) in the generated html.
- Building on the benefit above: As haddock support is being built into the GHC API by keeping comments in the HsSyn, so also could any markdown commenting be built in for HsColour support. This would lead to much better public documentation of the API, *without* GHC hackers changing what they do now, i.e. writing block notes in the source.
They may not all be as valuable to everyone, but together they're enough to motivate me to have a go at it. Would there be a benefit to *not* doing this other than *me* not having to do the work?
Regards, Philip
[1] http://hackage.haskell.org/trac/ghc/ticket/6144
[2] http://hackage.haskell.org/trac/ghc/ticket/4836

Hi Philip,
On Mon, Aug 13, 2012 at 12:57:44PM +0000, Philip Holzenspies wrote:
What is the benefit of doing so?
- Simpler build environment
- Easier to understand interaction and bugs resulting from them (viz. [1], [2]), because the interactions happen in the same domain
- (as mentioned in the proposal) Simplification of the API; not going through temp-files (which, by the way, I don't understand anyway; why not through pipes? Probably pipes are harder to do under Windows).
- (minor) May well help speed things up by not having to reread every file three times and write it twice before actually coming to GHC's parser
- Helps pull the code for unlit into the community; it seems nobody's looking at it at the moment. The README in utils/unlit/ still hails the merits of code-sharing between GHC and LML/HBC, but I don't believe this sharing exists (actively) anymore and it keeps people from improving it. Part of improving it would be to add a little more documentation / commenting. For example, can you explain why "myputc(c,stream)" is called when "c == '#'", regardless of whether we are inside or outside of a code block? I've thought about it for a bit and can't come up with an answer.
- If it is only an optional replacement unlitter in the GHC build system, there is no reward for GHC-hackers to agree on a consistent style of making important notes. Currently, my workflow (when *using* the GHC API) involves the haddock-generated API-documentation, the HsColour generated html-source and the actual source code, because many "[Note: Something or Other]" are formatted in a markdown-style (as already mentioned in the proposal), which turn out unformatted (and thereby largely unreadable) in the generated html.
- Building on the benefit above: As haddock support is being built into the GHC API by keeping comments in the HsSyn, so also could any markdown commenting be built in for HsColour support. This would lead to much better public documentation of the API, *without* GHC hackers changing what they do now, i.e. writing block notes in the source.
They may not all be as valuable to everyone, but together they're enough to motivate me to have a go at it.
Thanks a lot for the clarification.
I see some value in your proposal to replace GHC's unlit, mainly in terms of setting a common standard. Personally, I'd still feel more comfortable if that proposed standard were developed as a Hackage package, so that it can prove itself useful first. I'm less convinced that it should be inlined into GHC (I do agree that this would be necessary if you want to include the markdown in the AST. But where is the user? HsSyn does not even use the GHC API, AFAIK.). Anyway, just my *personal* opinion ;)
Would there be a benefit to *not* doing this other than *me* not having to do the work?
Yes, currently you can replace the unlit phase. So you can use arbitrary markup in .lhs files. Which I think is quite useful.
Changing this would also have practical implications. Let's assume I want to use my favorite markup language *foo* in .lhs files, after a lot of bikeshedding the community agrees that *foo* is a good idea, and I have to modify GHC to add support for *foo*, then:
* The barrier to modify GHC is still much higher than writing a custom unlit.
* I have to wait for the next GHC release before I can use *foo*.
* When finally a new version of GHC with support for *foo* has been released, a program that uses *foo* will still only work with the latest version of GHC. If I e.g. want to support the latest three major versions of GHC, I need to wait another year before I can actually use *foo*.
Cheers, Simon

Dear Simon,
On 13 Aug 2012, at 15:18, Simon Hengel wrote:
Thanks a lot for the clarification.
I see some value in your proposal to replace GHC's unlit, mainly in terms of setting a common standard. Personally, I'd still feel more comfortable if that proposed standard were developed as a Hackage package, so that it can prove itself useful first. I'm less convinced that it should be inlined into GHC (I do agree that this would be necessary if you want to include the markdown in the AST. But where is the user? HsSyn does not even use the GHC API, AFAIK.).
HsSyn doesn't use the GHC API, but the GHC API uses HsSyn. If things aren't kept in the HsSyn, the API cannot produce them either. Comments that the parser discards, the API cannot produce.
Would there be a benefit to *not* doing this other than *me* not having to do the work?
Yes, currently you can replace the unlit phase. So you can use arbitrary markup in .lhs files. Which I think is quite useful.
Just so that there are no misunderstandings: The proposal **never** suggested throwing out any pluggability for custom unlitters. I can not be clear enough about this. The **only** thing I intend to change is the **default** case of unlitting (and maybe CPP). **Whatever** options exist now (command line or otherwise) for using alternatives to the default cases will **remain** as they are.
I don't think I ever suggested otherwise, so I'm surprised that this came up. Thanks for bringing it up, though, because if anyone else got that idea, I hope it is now thoroughly squashed!
Having put this behind us, do you still see reasons not to do this?
Regards, Philip

On Mon, Aug 13, 2012 at 03:20:53PM +0000, Philip Holzenspies wrote:
I see some value in your proposal to replace GHC's unlit, mainly in terms of setting a common standard. Personally, I'd still feel more comfortable if that proposed standard were developed as a Hackage package, so that it can prove itself useful first. I'm less convinced that it should be inlined into GHC (I do agree that this would be necessary if you want to include the markdown in the AST. But where is the user? HsSyn does not even use the GHC API, AFAIK.).
HsSyn doesn't use the GHC API, but the GHC API uses HsSyn. If things aren't kept in the HsSyn, the API cannot produce them either. Comments that the parser discards, the API cannot produce.
Oh, I meant HsColour instead of HsSyn. Sorry for that.
Would there be a benefit to *not* doing this other than *me* not having to do the work?
Yes, currently you can replace the unlit phase. So you can use arbitrary markup in .lhs files. Which I think is quite useful.
Just so that there are no misunderstandings: The proposal **never** suggested throwing out any pluggability for custom unlitters. I can not be clear enough about this. The **only** thing I intend to change is the **default** case of unlitting (and maybe CPP). **Whatever** options exist now (command line or otherwise) for using alternatives to the default cases will **remain** as they are.
I don't think I ever suggested otherwise, so I'm surprised that this came up. Thanks for bringing it up, though, because if anyone else got that idea, I hope it is now thoroughly squashed!
Having put this behind us, do you still see reasons not to do this?
Personally, I still do not see the big benefit for all that work, and I'm still somewhat worried that a mechanism that is not used by default (I'm talking about unlitting with an external command) may start to bit rot. But as long as you are committed to keeping `-pgmL` intact, I'm ok ;).
I think in the end it's best to go with the solution that works best for GHC-HQ.
Cheers, Simon

On 14 Aug 2012, at 07:48, Simon Hengel wrote:
Personally, I still do not see the big benefit for all that work, and I'm still somewhat worried that a mechanism that is not used by default (I'm talking about unlitting with an external command) may start to bit rot. But as long as you are committed to keeping `-pgmL` intact, I'm ok ;).
A biggy that I had left out has just reoccurred to me. The very first reason for me to look at how unlitting and preprocessing is done in GHC was that I was looking into what would be required for a refactoring engine (like haRe) to be based on the GHC API. Of course, at the moment, the API doesn't do anything with unlitting and preprocessing.
I think in the end it's best to go with the solution that works best for GHC-HQ.
Still hoping to hear from them ;)
Regards, Philip

Ultimately your best bet to actually get something integrated will be to find something that minimizes the amount of work on the part of GHC HQ.
I don't think *anybody* there is interested in picking up a lot of fiddly formatting logic and carving it into stone.
They might be slightly less inclined to shut the door in your face if the proposal only involved adding a few hooks in the AST for exposing alternative documentation formats, which would enable you to hook in via a custom unlit or do something like how haddock hooks in, but overall, if it involves folks at GHC HQ maintaining a full markdown parser I think they will (and should) just shrug and move on.
The resulting system would be slightly less work for you, but would only see any improvements delayed a year between GHC releases, and then the community can't adopt the improvements in earnest for another year after that. This is *not* an encouraging development cycle, and doesn't strike me as a recipe for a successful project.
As proposed, this would distract some pretty core resources from working on core functionality and I for one am heavily against it as I understand what has been proposed so far.
Haddock works with some fairly simple extensions to GHC's syntax tree. If your proposal was modified so that it just requires a few hooks or worked with the existing haddock hooks in the syntax tree, then while I would hardly be a huge proponent due to the fragmentation issues about how to deal with documentation, I would at least cease to be actively opposed.
-Edward
On Tue, Aug 21, 2012 at 7:45 AM, Philip Holzenspies wrote:
On 14 Aug 2012, at 07:48, Simon Hengel wrote:
Personally, I still do not see the big benefit for all that work, and I'm still somewhat worried that a mechanism that is not used by default (I'm talking about unlitting with an external command) may start to bit rot. But as long as you are committed to keeping `-pgmL` intact, I'm ok ;).
A biggy that I had left out has just reoccurred to me. The very first reason for me to look at how unlitting and preprocessing is done in GHC was that I was looking into what would be required for a refactoring engine (like haRe) to be based on the GHC API. Of course, at the moment, the API doesn't do anything with unlitting and preprocessing.
I think in the end it's best to go with the solution that works best for GHC-HQ.
Still hoping to hear from them ;)
Regards, Philip

On 21 Aug 2012, at 13:47, Edward Kmett wrote:
Ultimately your best bet to actually get something integrated will be to find something that minimizes the amount of work on the part of GHC HQ.
Check.
I don't think anybody there is interested in picking up a lot of fiddly formatting logic and carving it into stone. They might be slightly less inclined to shut the door in your face if the proposal only involved adding a few hooks in the AST for exposing alternative documentation formats, which would enable you to hook in via a custom unlit or do something like how haddock hooks in, but overall, if it involves folks at GHC HQ maintaining a full markdown parser I think they will (and should) just shrug and move on.
I'm cursed with a very suggestive writing style. I have no idea why Simon got the idea I wanted to remove the command line arguments. I have no idea why you think I want to build full markdown parsers. Help me out; where did you get that idea? Also, just for the record, I'm planning on dealing with markdown as extensively as the current unlitter deals with LaTeX.
The resulting system would be slightly less work for you, but would only see any improvements delayed a year between GHC releases, and then the community can't adopt the improvements in earnest for another year after that. This is not an encouraging development cycle, and doesn't strike me as a recipe for a successful project.
So, there are many things people read in the proposal that I didn't want to put in, but the things I very much do want to include get lost in translation also. I wanted to allow the GHC source itself to be written in markdown.
As proposed, this would distract some pretty core resources from working on core functionality and I for one am heavily against it as I understand what has been proposed so far. Haddock works with some fairly simple extensions to GHC's syntax tree. If your proposal was modified so that it just requires a few hooks or worked with the existing haddock hooks in the syntax tree, then while I would hardly be a huge proponent due to the fragmentation issues about how to deal with documentation, I would at least cease to be actively opposed.
I thought that GHC first runs unlit, then CPP and only then does it construct an AST. I don't know how to implement unlitting by hooks in the AST, if unlitting happens before building the AST.
Unfortunately, it seems the proposal is so poorly written that I've spent more time dealing with the misconceptions it creates than actually implementing the unlitter. I'll retract the proposal.
Ph.

On Wed, Aug 22, 2012 at 3:37 AM, Philip Holzenspies wrote:
Unfortunately, it seems the proposal is so poorly written that I've spent more time dealing with the misconceptions it creates than actually implementing the unlitter. I'll retract the proposal.
Maybe just try again in a separate thread? Perhaps under a pseudonym! :)

On Wed, Aug 22, 2012 at 10:02 AM, Nicolas Frisby wrote:
Maybe just try again in a separate thread? Perhaps under a pseudonym! :)
Whoa, just realized once again that email is tone-deaf. I meant that 'pseudonym' thing cheekily: just to help differentiate the proposal in a silly way. In no way was it supposed to be insulting! Sorry for the noise. Also, for the record, I would like for markdown to be an option — regardless of other practical considerations at the moment. It might also be attractive to web people as they check out Haskell.

On Wed, Aug 22, 2012 at 4:37 AM, Philip Holzenspies wrote:
So, there are many things people read in the proposal that I didn't want to put in, but the things I very much do want to include get lost in translation also. I wanted to allow the GHC source itself to be written in markdown.
If the existing source tree is using one form of markup, changes and additions should really be consistent with what's already there instead of introducing a new kind of markup. This could actually be *more* disruptive.
--
brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Wed, Aug 22, 2012 at 11:22 AM, Philip Holzenspies wrote:
On 22 Aug 2012, at 16:13, Brandon Allbery wrote:
On Wed, Aug 22, 2012 at 4:37 AM, Philip Holzenspies wrote:
So, there are many things people read in the proposal that I didn't want to put in, but the things I very much do want to include get lost in translation also. I wanted to allow the GHC source itself to be written in markdown.
If the existing source tree is using one form of markup, changes and additions should really be consistent with what's already there instead of introducing a new kind of markup. This could actually be *more* disruptive.
The point was that quite a bit of the GHC source has markdown-like things in it, using LaTeX-style code-fencing, but LaTeX-incompatible markup (like underlining sections with ~~~~~).
Even so. A concrete version of what I'm getting at is that ghc is self-bootstrapping, so older versions need to be able to build newer ones; GHC code using a new markdown literate preprocessor --- or, worse, one integrated with lexing or parsing --- will not be buildable with GHC versions predating its addition. So even given the addition of such, ghc won't itself be able to use it for at least several releases, to give OS distributions etc. time to upgrade their packages to versions that can build the result. (Asking them to re-bootstrap is usually asking too much; they'll likely just stop updating or possibly drop ghc entirely.)

On 22 Aug 2012, at 16:29, Brandon Allbery wrote:
Even so. A concrete version of what I'm getting at is that ghc is self-bootstrapping, so older versions need to be able to build newer ones; GHC code using a new markdown literate preprocessor --- or, worse, one integrated with lexing or parsing --- will not be buildable with GHC versions predating its addition. So even given the addition of such, ghc won't itself be able to use it for at least several releases, to give OS distributions etc. time to upgrade their packages to versions that can build the result. (Asking them to re-bootstrap is usually asking too much; they'll likely just stop updating or possibly drop ghc entirely.)
Of course, but this is where the (very old) -pgmL command line switch would have come in. The unlitter could simply also be released as a stand-alone tool and invoked from an old GHC by using -pgmL. This could be automated in the build system, including a check of the GHC version to see whether any external tooling is required at all.
Anyway, the point is a bit moot. It seems obvious that the proposal had very little support and has been withdrawn.
Ph.
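The version check mentioned above could look roughly like the following sketch. Everything specific in it is an assumption made for illustration: the cut-off version 7.8, the function name unlitFlags, and the stand-alone tool name unlit-markdown (borrowed from Simon's earlier example) do not come from any actual build system.

```haskell
import Data.Version (Version, makeVersion)

-- Hypothetical: the first GHC release that would ship the built-in
-- Markdown unlitter. The number is made up for this sketch.
firstBuiltin :: Version
firstBuiltin = makeVersion [7, 8]

-- Extra flags the build system passes to the bootstrapping GHC.
-- Older compilers get the stand-alone unlitter via -pgmL; newer
-- ones need no external tooling at all.
unlitFlags :: Version -> [String]
unlitFlags bootGhc
  | bootGhc >= firstBuiltin = []
  | otherwise               = ["-pgmL", "unlit-markdown"]
```

For example, unlitFlags (makeVersion [7, 4, 2]) yields the two -pgmL arguments, while a compiler at or above the cut-off gets an empty list.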

On Wed, Aug 22, 2012 at 11:42 AM, Philip Holzenspies wrote:
Anyway, the point is a bit moot. It seems obvious that the proposal had very little support and has been withdrawn.
This might be a poor time for it with 7.6.1 around the corner. That said, I would re-propose *with code* (i.e. a patch to ghc); it will be easier for people to see what you are getting at that way, given all the misunderstandings apparently running around loose. It would also provide concrete evidence of how ghc will be impacted.

On 22/08/12 16:22, Philip Holzenspies wrote:
On 22 Aug 2012, at 16:13, Brandon Allbery wrote:
On Wed, Aug 22, 2012 at 4:37 AM, Philip Holzenspies wrote:
So, there are many things people read in the proposal that I didn't want to put in, but the things I very much do want to include get lost in translation also. I wanted to allow the GHC source itself to be written in markdown.
If the existing source tree is using one form of markup, changes and additions should really be consistent with what's already there instead of introducing a new kind of markup. This could actually be *more* disruptive.
The point was that quite a bit of the GHC source has markdown-like things in it, using LaTeX-style code-fencing, but LaTeX-incompatible markup (like underlining sections with ~~~~~).
I tend to gently nudge the codebase towards illiterate source whenever I can. This is probably a personal preference, but I haven't been convinced that literate code is worth the effort. I want the code to look its most readable in a text editor, which is where I look at it most. Now, perhaps if I had an editor that rendered the markdown on the fly while syntax-highlighting the code, maybe that would tip the balance. (The editor must be emacs, though.)
I have nothing against adding the extension you propose to GHC, I'm just not sure that we'll actually want to use it in GHC.
Cheers, Simon

On Mon, Aug 13, 2012 at 08:45:51AM +0000, Philip Holzenspies wrote:
Absolutely true, but I came across this in the GHC source itself. I would like the GHC source to be literateable (not a word, but you know what I mean) in markdown.
FWIW, I'm not sure the work necessary to maintain correctly marked-up commentary is worth the gain. But that aside, why not use a doc generator that first unlits the \begin{code}...\end{code} style, and then treats the non-code sections as markdown format? You'd need to either not use blockquotes, or provide some way to escape the ">"s.
Thanks
Ian
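The first half of such a doc generator, separating code from commentary, can be sketched as follows. The Chunk type and the function splitLhs are invented for this illustration and are not part of any existing tool; the prose chunks would then be handed to whatever Markdown renderer one prefers.

```haskell
-- Hypothetical sketch: split a LaTeX-style .lhs file into prose and
-- code chunks, so the prose can later be rendered as Markdown.
data Chunk = Prose [String] | Code [String]
  deriving (Eq, Show)

splitLhs :: String -> [Chunk]
splitLhs = go . lines
  where
    go [] = []
    go ls =
      let (prose, rest) = break (== "\\begin{code}") ls
          proseChunk    = [Prose prose | not (null prose)]
      in case rest of
           []               -> proseChunk
           (_ : afterBegin) ->
             -- Everything up to \end{code} is code; an unterminated
             -- block simply runs to the end of the file.
             let (code, rest') = break (== "\\end{code}") afterBegin
             in proseChunk ++ [Code code] ++ go (drop 1 rest')
```

This splitter treats a prose line starting with ">" as ordinary text, which is fine for documentation, but the same file passed through the compiler's unlitter would see such a line as Bird-style code, hence the need to avoid or escape blockquotes.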
participants (7)
- Brandon Allbery
- Edward Kmett
- Ian Lynagh
- Nicolas Frisby
- Philip Holzenspies
- Simon Hengel
- Simon Marlow