Re: GHC's CPP and Cabal's unlit

It would be nice if we could find an unlit extension that is compatible with the H98 requirement and also allows useful behaviour for haddock.
I have completed changes to the Unlit module, and also a change to the Haddock module (to run unlit before cpp). darcs patches attached.

I'd love to know what the official approach to testing is. I have written a test module which exercises the Unlit module, but I'm not sure how to tie it in to any existing test infrastructure. At the moment I just have it sitting in tests/unlit.

Thanks,
Alistair

Hello, I've not had any feedback on this. Is there some additional work for me to do to make the patches more acceptable for the cabal codebase? Thanks, Alistair

On Mon, 2007-12-17 at 16:08 +0000, Alistair Bayley wrote:
Hello,
I've not had any feedback on this. Is there some additional work for me to do to make the patches more acceptable for the cabal codebase?
David, you looked at this, what was your conclusion? Duncan

On Mon, 2007-12-17 at 16:08 +0000, Alistair Bayley wrote:
Hello,
I've not had any feedback on this. Is there some additional work for me to do to make the patches more acceptable for the cabal codebase?
I finally got round to looking at this. I'm fairly happy with it code-wise. I've done a bit of refactoring to improve handling of error messages and to add back handling of line pragmas. I've kept the state machine the same. Thanks for the test suite, that was particularly handy as I was making some changes.

One bit I'm less sure about is the handling of paragraphs in comments. Currently the code transforms:

    blah blah

    blah blah

into

    -- blah blah

    -- blah blah

but it transforms

    blah blah
     
    blah blah

into

    -- blah blah
    --
    -- blah blah

Spot the difference? Yeah, just white space. The completely blank line separates the paragraphs into two comments which haddock will treat differently from a single comment. It comes from this code:

    -- special case here: a truly empty line will terminate
    -- a comment section (and send us into the "blank" state)
    comment (Blank "" :ls) = Blank "" : blank ls
    -- but a line containing whitespace will be treated as a
    -- comment (prefixed with "-- "), unless it is followed by
    -- a program line, in which case it is really blank.
    comment (Blank l:ls@(BirdTrack _:_)) = Blank l : blank ls
    comment (Blank l:ls) = Comment l : comment ls

Perhaps we should just always make the trailing space part of a comment. Obviously we'd have to handle space before some bird-track line without complaining that it's a comment next to code. That might need one more state.

I've attached my changes. I still think we need some larger scale test to make sure this is ok. Perhaps there's some way of comparing the output with ghc's unlit. We'd need to blank out "--" lines in both files and probably more to get them to compare equal. Then we could test that on all the .lhs files we can lay our hands on (ghc, hackage etc).

Duncan
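A minimal sketch of the comment-state behaviour described above. The three `comment` equations for `Blank` follow the quoted snippet; everything else (the `Line` type, `classify`, `render`, the remaining equations and the name `unlitSketch`) is a hypothetical simplification and not taken from the actual patch. In particular it silently accepts a comment next to code, where the real code would report an error.

```haskell
import Data.Char (isSpace)

-- Hypothetical line classification, loosely modelled on the quoted snippet.
data Line
  = BirdTrack String  -- a "> code" line, stored without the leading '>'
  | Blank String      -- an empty or whitespace-only line (original text kept)
  | Comment String    -- any other literate text
  deriving (Eq, Show)

classify :: String -> Line
classify ('>' : rest) = BirdTrack rest
classify l
  | all isSpace l = Blank l
  | otherwise     = Comment l

-- "blank" state: we are not inside a comment block.
blank :: [Line] -> [Line]
blank []               = []
blank (Comment l : ls) = Comment l : comment ls
blank (l : ls)         = l : blank ls

-- "comment" state: the Blank equations mirror the quoted code.
comment :: [Line] -> [Line]
comment []                                 = []
comment (Blank "" : ls)                    = Blank "" : blank ls
comment (Blank l : ls@(BirdTrack _ : _))   = Blank l : blank ls
comment (Blank l : ls)                     = Comment l : comment ls
comment (Comment l : ls)                   = Comment l : comment ls
comment (BirdTrack l : ls)                 = BirdTrack l : blank ls  -- assumption: no error reporting here

render :: Line -> String
render (BirdTrack l) = ' ' : l          -- '>' becomes a space
render (Blank _)     = ""
render (Comment l)
  | all isSpace l = "--"                -- whitespace-only line kept inside the comment
  | otherwise     = "-- " ++ l

unlitSketch :: [String] -> [String]
unlitSketch = map render . blank . map classify

main :: IO ()
main = do
  mapM_ putStrLn (unlitSketch ["blah blah", "", "blah blah"])
  putStrLn "----"
  mapM_ putStrLn (unlitSketch ["blah blah", " ", "blah blah"])
```

The two calls in `main` differ only in whether the middle line is empty or contains a single space, which is exactly the invisible distinction under discussion.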

On Sun, Dec 30, 2007 at 01:40:44AM +0000, Duncan Coutts wrote:
One bit I'm less sure about is the handling of paragraphs in comments. Currently the code transforms:
blah blah

blah blah
into
-- blah blah

-- blah blah
but it transforms
blah blah
 
blah blah
into
-- blah blah
--
-- blah blah
spot the difference? Yeah, just white space. The completely blank line separates the paragraphs into two comments which haddock will treat differently from a single comment.
I think both your examples should be treated the same, although I don't know whether they should be treated as one or two comments. Invisible whitespace is generally ignored elsewhere in Haskell, e.g. when determining if a literate comment is next to a bird track, or after a \ starting a string gap, the rationale being that if you can't see a difference then there shouldn't be one. Thanks Ian

In message <20080104170826.GA7601@matrix.chaos.earth.li> Duncan Coutts
spot the difference? Yeah, just white space. The completely blank line separates the paragraphs into two comments which haddock will treat differently from a single comment.
I think both your examples should be treated the same, although I don't know whether they should be treated as one or two comments.
Alistair, can you remind us why we need the ability to break paragraphs into multiple comments rather than a single comment? Or is it just that we need to be able to have a comment followed by a blank line and then code. eg

    blah blah

    some code

into

    -- blah blah

    some code

rather than:

    -- blah blah
    --
    some code

is that all? I think that so long as we can avoid generating spurious comment-next-to-code errors, the above distinction is irrelevant. We could avoid comment next to code errors by using an extra comment/blank state for blank lines trailing comments. They'd be transformed into comments but could transition to > bird track lines without a code-next-to-comment error. Perhaps I'll send a patch to implement that.
Invisible whitespace is generally ignored elsewhere in Haskell, e.g. when determining if a literate comment is next to a bird track, or after a \ starting a string gap, the rationale being that if you can't see a difference then there shouldn't be one.
I agree, whatever distinction might be necessary it should not be done on the basis of something that's invisible. Duncan

Duncan, Sorry, I haven't had time to look at your patch yet. Still on holiday.
Alistair, can you remind us why we need the ability to break paragraphs into multiple comments rather than a single comment?
One example: In Takusen's Database.Enumerator module we use named chunks. The chunks are all placed at the end of the file. A given chunk may contain many paragraphs, but it must consist of a single comment block. Separate chunks must be in separate comment blocks (I assume; I haven't actually tested this).

Another example: Let's say you want to produce this .hs output for Haddock:

    -- | A description of x
    --
    -- some more comments about x
    x = ...

Now imagine you want to produce this .hs:

    -- | A description of x

    -- some comments which are interesting but we don't want Haddock to use them
    x = ...

What do you write in your .lhs file to produce these outputs? I chose a fairly simple rule: a line containing some whitespace chars continued the current comment block, while a truly empty line ended it.
Invisible whitespace is generally ignored elsewhere in Haskell
I agree, whatever distinction might be necessary it should not be done on the basis of something that's invisible.
Yes, choosing to use non-empty whitespace (to indicate that a comment block is not finished) might well be a poor design decision. Another possibility is to use a line containing a single period as a continuation line e.g.

    line 1
    .
    line 2

becomes

    -- line 1
    --
    -- line 2

... while whitespace only just outputs an empty line (no comment). Do you think that is a better choice?

The extra state idea might be an improvement over the current design. Currently we have to look ahead in the comment state to decide to transition to the blank or comment states. An extra state would probably avoid the lookahead. I'll try it out to see how well it works.

Alistair
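The single-period proposal could be sketched roughly as follows. This is only an illustration for non-code lines; the function name `commentLine` is hypothetical, and matching the period exactly (with no surrounding whitespace) is an assumption rather than something settled in the thread.

```haskell
import Data.Char (isSpace)

-- Hypothetical translation of one literate (non-code) line.
commentLine :: String -> String
commentLine l
  | all isSpace l = ""            -- empty or whitespace-only: plain blank line, no comment
  | l == "."      = "--"          -- lone period: blank line that stays inside the comment
  | otherwise     = "-- " ++ l    -- ordinary text becomes a line comment

main :: IO ()
main = mapM_ (putStrLn . commentLine) ["line 1", ".", "line 2", " ", "line 3"]
```

With this rule no lookahead is needed: each line is translated on its own, and whitespace-only lines are never treated differently from empty ones.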

On Sat, Jan 05, 2008 at 11:14:44PM +0000, Alistair Bayley wrote:
Yes, choosing to use non-empty whitespace (to indicate that a comment block is not finished) might well be a poor design decision. Another possibility is to use a line containing a single period as a continuation line e.g.
line 1
.
line 2
I think we do this in Cabal descriptions (copying the design from Debian package's descriptions), so this would be consistent with that. It gets my vote! Thanks Ian

Yes, choosing to use non-empty whitespace (to indicate that a comment block is not finished) might well be a poor design decision. Another possibility is to use a line containing a single period as a continuation line e.g.
I think we do this in Cabal descriptions (copying the design from Debian package's descriptions), so this would be consistent with that. It gets my vote!
I have implemented this, and made a couple of other changes:
- use a single period on a line to indicate that we want to continue the comment block (like cabal does)
- don't indent code relative to comments, because Haddock doesn't like this
- reverse Left and Right cases in the Either returned by unlit, so that they're consistent with Either conventions (Left == failure, Right == success)
- reclassify is simpler because there's no need for lookahead for the blank-followed-by-code case

I've also updated the test module and the calling code in ppUnlit in PreProcess (because the Left and Right cases are reversed).

I wanted to send a separate patch for each source file, but my darcs-fu is poor, and darcs insisted on including a bunch of older patches which are already in my repo and the main cabal repo. So it's all in one fat patch, which seems to include a bunch of older patches it depends on. I don't know why this is; I expected that "darcs pull" would have updated my repo w.r.t. the main cabal repo, so that this wouldn't be necessary.

I'll try to get Haddock 2 soon and see how it fares with .lhs input.

Alistair
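To illustrate the Either convention mentioned above (Left for failure, Right for success), here is a small hypothetical checker. The names `unlitChecked`, `Kind`, `kindOf`, `clash` and `render` are invented for this sketch and are not the functions in the patch; the only error it detects is a program line directly next to a non-blank comment line, which is assumed here to be the comment-next-to-code case discussed in this thread.

```haskell
import Data.Char (isSpace)

data Kind = Code | Text | Empty deriving (Eq, Show)

kindOf :: String -> Kind
kindOf ('>' : _) = Code
kindOf l
  | all isSpace l = Empty
  | otherwise     = Text

-- A program line may not sit directly next to a non-blank comment line.
clash :: Maybe Kind -> Kind -> Bool
clash (Just Code) Text = True
clash (Just Text) Code = True
clash _           _    = False

render :: Kind -> String -> String
render Code ('>' : rest) = ' ' : rest   -- '>' becomes a space
render Code l            = l
render Text l            = "-- " ++ l
render Empty _           = ""

-- Left carries an error message, Right the converted source.
unlitChecked :: String -> Either String String
unlitChecked src = fmap unlines (go (1 :: Int) Nothing (lines src))
  where
    go _ _ [] = Right []
    go n prev (l : ls)
      | clash prev kind = Left ("line " ++ show n ++ ": program line next to comment line")
      | otherwise       = (render kind l :) <$> go (n + 1) (Just kind) ls
      where
        kind = kindOf l

main :: IO ()
main = do
  print (unlitChecked "text\n\n> x = 1\n")
  print (unlitChecked "text\n> x = 1\n")
```

A caller would then pattern match on the result in the usual way, treating `Left msg` as the failure branch.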

I'm going to try and get this integrated. I'm not happy yet with the issue about blank lines vs '.' lines etc etc. I think that needs a wider discussion but I don't want to hold up what we already have. So I'll integrate it without the '.' line handling for now and we can discuss it further as necessary.

Duncan

On Thu, 2008-01-10 at 14:25 +0000, Alistair Bayley wrote:
Yes, choosing to use non-empty whitespace (to indicate that a comment block is not finished) might well be a poor design decision. Another possibility is to use a line containing a single period as a continuation line e.g.
I think we do this in Cabal descriptions (copying the design from Debian package's descriptions), so this would be consistent with that. It gets my vote!
I have implemented this, and made a couple of other changes:
- use a single period on a line to indicate that we want to continue the comment block (like cabal does)
- don't indent code relative to comments, because Haddock doesn't like this
- reverse Left and Right cases in the Either returned by unlit, so that they're consistent with Either conventions (Left == failure, Right == success)
- reclassify is simpler because there's no need for lookahead for the blank-followed-by-code case
I've also updated the test module and the calling code in ppUnlit in PreProcess (because the Left and Right cases are reversed).
I wanted to send a separate patch for each source file, but my darcs-fu is poor, and darcs insisted on including a bunch of older patches which are already in my repo and the main cabal repo. So it's all in one fat patch, which seems to include a bunch of older patches it depends on. I don't know why this is; I expected that "darcs pull" would have updated my repo w.r.t. the main cabal repo, so that this wouldn't be necessary.
I'll try to get Haddock 2 soon and see how it fares with .lhs input.

On 31/01/2008, Duncan Coutts
I'm going to try and get this integrated. I'm not happy yet with the issue about blank lines vs '.' lines etc etc. I think that needs a wider discussion but I don't want to hold up what we already have.
OK. Ian voted for '.' as empty line, so I went with that as it was the only comment, and was a positive one. What exactly are you not happy with? Is it the unsightliness of the periods in comments, or something else? As for a wider discussion, I'm all for it, but I believe the impact of this change on existing code should be negligible (pending further testing, of course), so I'm not sure if we're going to get much interest. I'm trying to solve the problem in a way that's useful for me now, and, I hope, in a way that's useful for others. I get the impression that I'm a pretty small minority in trying to generate Haddock docs from .lhs source. Alistair

On Thu, 2008-01-31 at 14:37 +0000, Alistair Bayley wrote:
On 31/01/2008, Duncan Coutts
wrote: I'm going to try and get this integrated. I'm not happy yet with the issue about blank lines vs '.' lines etc etc. I think that needs a wider discussion but I don't want to hold up what we already have.
OK. Ian voted for '.' as empty line, so I went with that as it was the only comment, and was a positive one.
What exactly are you not happy with? Is it the unsightliness of the periods in comments, or something else?
Partly. I've never liked that convention. It seems quite unnecessary in .cabal files.

My main complaint is that it does not correspond to any existing practice in haddock docs. There are some people who use haddock style markup in .lhs files and we should aim to make it straightforward for them to convert.

Also, as I've said it's not clear to me that it is even needed. It's only for the case where you want to have a haddock doc for something and then an intervening non-haddock comment before the actual definition. The obvious solution there is just to move that non-haddock comment somewhere else, like before the haddock doc, or inside or after the definition.
As for a wider discussion, I'm all for it, but I believe the impact of this change on existing code should be negligible (pending further testing, of course), so I'm not sure if we're going to get much interest. I'm trying to solve the problem in a way that's useful for me now, and, I hope, in a way that's useful for others. I get the impression that I'm a pretty small minority in trying to generate Haddock docs from .lhs source.
You are, but that's only because it doesn't currently work :-). In particular I'd like to know how well it works for Jon Fairbairn who has .lhs code that uses haddock markup and he uses a little pre-processor to convert it. Duncan

Duncan Coutts
On Thu, 2008-01-31 at 14:37 +0000, Alistair Bayley wrote:
As for a wider discussion, I'm all for it, but I believe the impact of this change on existing code should be negligible (pending further testing, of course), so I'm not sure if we're going to get much interest. I'm trying to solve the problem in a way that's useful for me now, and, I hope, in a way that's useful for others. I get the impression that I'm a pretty small minority in trying to generate Haddock docs from .lhs source.
You are, but that's only because it doesn't currently work :-).
I would certainly have written my pedantic html library using literate style if Haddock had worked for it without pain.
In particular I'd like to know how well it works for Jon Fairbairn who has .lhs code that uses haddock markup and he uses a little pre-processor to convert it.
(I didn't want to have to include that preprocessor with the library, so used illiterate Haskell instead). While I have a fair bit of literate Haskell, hardly any of it uses Haddock, so I don't think I can supply a useful amount of data here as it would take me so little time to convert it to whatever form you end up with. Thanks for asking, though. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

On Thu, Jan 31, 2008 at 03:41:59PM +0000, Duncan Coutts wrote:
On Thu, 2008-01-31 at 14:37 +0000, Alistair Bayley wrote:
On 31/01/2008, Duncan Coutts
wrote: OK. Ian voted for '.' as empty line, so I went with that as it was the only comment, and was a positive one. What exactly are you not happy with? Is it the unsightliness of the periods in comments, or something else?
Partly. I've never liked that convention. It seems quite unnecessary in .cabal files.
With the old Cabal syntax, blank lines separated stanzas, so this was the only way to get blank lines in fields, notably Description. I'm not sure whether it's still needed with the new syntax.

On Thu, 2008-01-31 at 17:41 +0000, Ross Paterson wrote:
On Thu, Jan 31, 2008 at 03:41:59PM +0000, Duncan Coutts wrote:
On Thu, 2008-01-31 at 14:37 +0000, Alistair Bayley wrote:
On 31/01/2008, Duncan Coutts
wrote: OK. Ian voted for '.' as empty line, so I went with that as it was the only comment, and was a positive one. What exactly are you not happy with? Is it the unsightliness of the periods in comments, or something else?
Partly. I've never liked that convention. It seems quite unnecessary in .cabal files.
With the old Cabal syntax, blank lines separated stanzas, so this was the only way to get blank lines in fields, notably Description. I'm not sure whether it's still needed with the new syntax.
It's not essential anymore though the parser would have to be modified slightly to make them optional. We'd have to continue treating '.' lines as blank for the sake of old packages. Duncan

On Thu, 2008-01-10 at 14:25 +0000, Alistair Bayley wrote:
the comment block (like cabal does) - don't indent code relative to comments, because Haddock doesn't like this
Hmm, we will have to find another solution to this because the H98 unlit spec clearly states that '>' is to be replaced with a ' ', not just deleted. So if haddock barfs on code like:

    -- a comment
      some code

then either we should fix haddock, or perhaps indent the comments too, eg:

      -- the comment
      the code

Since we expect to use this unlit code more generally in future we do need to make sure it is a compatible extension of the H98 unlit spec.

Duncan

Duncan Coutts
On Thu, 2008-01-10 at 14:25 +0000, Alistair Bayley wrote:
the comment block (like cabal does) - don't indent code relative to comments, because Haddock doesn't like this
Hmm, we will have to find another solution to this because the H98 unlit spec clearly states that '>' is to be replaced with a ' ', not just deleted. So if haddock barfs on code like:
-- a comment
  some code
then either we should fix haddock, or perhaps indent the comments too, eg:
  -- the comment
  the code
Since we expect to use this unlit code more generally in future we do need to make sure it is a compatible extension of the H98 unlit spec.
I haven't looked at this in a while, but I reckon that Haddock does the wrong thing wrt layout -- in my hacked-up unlit I replace leading ">" with " " and assume that all code lines have at least one space after the ">", so use " {-" on the blank line separating code from comment and just "-}" on the line between comment and code. This was all to stop haddock giving parse errors in what looked like reasonable layouts. Even using illiterate Haskell I find haddock (at least up to 0.8) to be unreasonably picky about indentation of comments, so I think that's where the change should be made.

-- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

On Thu, 2008-01-31 at 14:43 +0000, Duncan Coutts wrote:
On Thu, 2008-01-10 at 14:25 +0000, Alistair Bayley wrote:
the comment block (like cabal does) - don't indent code relative to comments, because Haddock doesn't like this
Hmm, we will have to find another solution to this because the H98 unlit spec clearly states that '>' is to be replaced with a ' ', not just deleted. So if haddock barfs on code like:
-- a comment
  some code
So I downloaded takusen and tried this. Haddock does indeed barf. The problem is worse though because we do not know how much to indent by to fix things.

    a comment

    >some code

    a comment

    > some code

These are both valid literate Haskell. If we make comments indent by one or two spaces then we break one or the other of these examples. The latter style with two spaces is more common so we should probably go with that until haddock can be fixed.

I'm cc'ing David for his opinion on the matter.

Duncan

On Sat, 2008-02-02 at 02:47 +0000, Duncan Coutts wrote:
So I downloaded takusen and tried this. Haddock does indeed barf. The problem is worse though because we do not know how much to indent by to fix things.
I'm cc'ing David for his opinion on the matter.
So the problem is actually quite tricky. If we follow the unlit spec then we must replace '>' by ' '. It is tempting to replace it by "" but that is not enough, we would have to replace "> " by "" too and that is definitely wrong because it would break code like this:
    >topLevel = ...
    > where ...

We'd end up breaking this code by unliting it to:

    topLevel = ...
    where ...

So I think we really have to follow the unlit spec. We are free to play with the indenting of comments. We can assume that most bird track style code uses "> " and most \begin{code} style uses no indenting at all. So that makes things tricky, we would have to indent comments by two spaces before/after/near bird track code, and by none before/after/near latex style code. That seems possible if a bit unpleasant. Any better ideas?

Duncan
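A small sketch of why the column-preserving rule matters. `unlitSpec` applies the H98 rule (a leading '>' is replaced by a space); `unlitStrip` is the tempting alternative that drops "> " or ">" entirely. Both names and the example input are hypothetical and chosen only to illustrate the layout problem described above.

```haskell
-- H98 rule: a leading '>' is replaced by a space, so columns are preserved.
unlitSpec :: String -> String
unlitSpec ('>' : rest) = ' ' : rest
unlitSpec l            = l

-- Tempting alternative: drop "> " (or a bare ">") entirely.
unlitStrip :: String -> String
unlitStrip ('>' : ' ' : rest) = rest
unlitStrip ('>' : rest)       = rest
unlitStrip l                  = l

-- Hypothetical input: the definition starts right after '>',
-- and the where clause is indented by a single space.
example :: [String]
example = [">topLevel = helper", "> where helper = 1"]

main :: IO ()
main = do
  putStrLn "replace '>' with ' ':"
  mapM_ putStrLn (map unlitSpec example)
  putStrLn "strip \"> \" or \">\":"
  mapM_ putStrLn (map unlitStrip example)
```

With `unlitSpec` the `where` stays one column to the right of `topLevel`; with `unlitStrip` both lines start in column zero, so the `where` is no longer part of the definition under the layout rule.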

On Sat, 2008-01-05 at 23:14 +0000, Alistair Bayley wrote:
Another example:
Let's say you want to produce this .hs output for Haddock:
-- | A description of x
--
-- some more comments about x
x = ...
Now imagine you want to produce this .hs:
-- | A description of x
-- some comments which are interesting but we don't want Haddock to use them
x = ...
What do you write in your .lhs file to produce these outputs?
So the interim patch I just applied would produce the first output. If you wanted the second, the nearest you could get would be:

    | A description of x


    some comments which are interesting but we don't want Haddock to use them
    x = ...

and this would generate:

    -- | A description of x


    -- some comments which are interesting but we don't want Haddock to use them
    x = ...

That is, you'd put an extra blank line to separate the two comments. I'm not claiming it's necessarily a brilliant rule but it's also fairly simple. Single line breaks separate paragraphs in a comment. Double or more denote separate comments.

To be honest, it's not clear to me that we need any distinction at all since people can just swap the order of the comments:

    -- some more comments about x
    --
    -- | A description of x
    x = ...

Duncan
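The interim rule (a single blank line is a paragraph break inside one comment, two or more blank lines separate comments) might be sketched like this. The function names are hypothetical, the sketch does no comment-next-to-code checking, and it is not the code that was applied to Cabal.

```haskell
import Data.Char (isSpace)

isBlank, isCode, isText :: String -> Bool
isBlank = all isSpace
isCode l = take 1 l == ">"
isText l = not (isBlank l) && not (isCode l)

-- Outside a comment, or after a blank line or code line.
unlitParas :: [String] -> [String]
unlitParas [] = []
unlitParas (l : ls)
  | isCode l  = (' ' : drop 1 l) : unlitParas ls   -- '>' becomes a space
  | isBlank l = "" : unlitParas ls
  | otherwise = ("-- " ++ l) : afterText ls

-- Directly after a comment line: exactly one blank line followed by more
-- text is a paragraph break within the same comment.
afterText :: [String] -> [String]
afterText (b : l : ls)
  | isBlank b && isText l = "--" : unlitParas (l : ls)
afterText ls = unlitParas ls

main :: IO ()
main = do
  mapM_ putStrLn (unlitParas ["| A description of x", "", "more about x", "", "> x = 1"])
  putStrLn "----"
  mapM_ putStrLn (unlitParas ["| A description of x", "", "", "private note", "", "> x = 1"])
```

In the first call the two paragraphs end up in one comment block joined by a `--` line; in the second the double blank line leaves two separate comment blocks.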

To be honest, it's not clear to me that we need any distinction at all since people can just swap the order of the comments:
-- some more comments about x
--
-- | A description of x
x = ...
Yes, this is not a good motivating example, just an example. However, a good motivating example is named chunks, which we use in Takusen's Database.Enumerator module. The chunks are all placed at the end of the file. A given chunk may contain many paragraphs, but it must consist of a single comment block. Separate chunks must be in separate comment blocks (I assume; I haven't actually tested this).

Alistair

On Thu, 2008-01-31 at 15:46 +0000, Alistair Bayley wrote:
To be honest, it's not clear to me that we need any distinction at all since people can just swap the order of the comments:
-- some more comments about x
--
-- | A description of x
x = ...
Yes, this is not a good motivating example, just an example. However, a good motivating example is named chunks, which we use in Takusen's Database.Enumerator module. The chunks are all placed at the end of the file. A given chunk may contain many paragraphs, but it must consist of a single comment block. Separate chunks must be in separate comment blocks (I assume; I haven't actually tested this).
Ok, yes. So how about what I just implemented, that you put two blank lines between the named chunks? Duncan

Ok, yes. So how about what I just implemented, that you put two blank lines between the named chunks?
Yes, that would work. You just need some way of separating/continuing comment blocks, whatever it is.
In particular I'd like to know how well it works for Jon Fairbairn who has .lhs code that uses haddock markup and he uses a little pre-processor to convert it.
I have a feeling his preprocessor doesn't cover this case, from a brief inspection. He uses {- -} to delimit comment sections, rather than -- for comment lines. Alistair

"Alistair Bayley"
I have a feeling his preprocessor doesn't cover this case, from a brief inspection. He uses {- -} to delimit comment sections, rather than -- for comment lines.
Yes, that's right, it doesn't permit a non-haddock comment to follow a haddock comment. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
participants (5)
- Alistair Bayley
- Duncan Coutts
- Ian Lynagh
- Jon Fairbairn
- Ross Paterson