
I'm doing some testing with GHC 6.6.1 and Cabal 1.3, and I'm trying to figure out what happens with CPP and Cabal's unlit. I start with file Test.lhs:

> {-# OPTIONS -fglasgow-exts #-}
> module Test where
> main = putStrLn "hello CPP"

and run command:

  ghc -E -x hs -cpp Test.lhs -o Test2.lhs

which gives me Test2.lhs:

{-# LINE 1 "Test.lhs" #-}
# 1 "Test.lhs"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "Test.lhs"
> {-# OPTIONS -fglasgow-exts #-}
> module Test where
> main = putStrLn "hello CPP"

So I'm wondering: where does the {-# LINE #-} comment come from, and also the # 1 lines? AFAICT the # 1 lines are ignored by GHC; I can compile Test2.lhs without errors. Is there anything in GHC's docs about this?

More puzzling is that the files that Cabal runs through ghc's CPP don't get the # n lines, so we end up with something like this:

{-# LINE 1 "Test.lhs" #-}
> {-# OPTIONS -fglasgow-exts #-}
> module Test where
> main = putStrLn "hello CPP"

which is not a valid .lhs file, because we have a code line next to a comment.

I also note in Cabal the haddock command runs CPP before unlit. GHC does it the other way around i.e. run unlit first then CPP, and I'm wondering if Cabal shouldn't do the same thing?

Thanks,
Alistair
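
The "code line next to a comment" failure can be illustrated with a minimal sketch of an adjacency check. This is not Cabal's or GHC's unlit code; the names LineKind, classify and adjacencyErrors are hypothetical. Treating every line that starts with # as a directive (neither comment nor code) is an assumption, chosen because it would be consistent with Test2.lhs compiling without errors.

```haskell
import Data.Char (isSpace)

-- | How one line of a bird-track literate file is classified.
-- 'Directive' is an assumption of this sketch: lines starting with '#'.
data LineKind = Program | Blank | Directive | Comment
  deriving (Eq, Show)

-- | '>' in column one marks program text, '#' in column one is taken
-- to be a preprocessor line, whitespace-only lines are blank, and
-- everything else is comment text.
classify :: String -> LineKind
classify ('>' : _) = Program
classify ('#' : _) = Directive
classify l
  | all isSpace l = Blank
  | otherwise     = Comment

-- | 1-based line numbers of the second line of every directly
-- adjacent program/comment pair.
adjacencyErrors :: String -> [Int]
adjacencyErrors src =
  [ n
  | (n, a, b) <- zip3 [2 ..] kinds (drop 1 kinds)
  , (a, b) `elem` [(Program, Comment), (Comment, Program)]
  ]
  where
    kinds = map classify (lines src)
```

Under these assumptions, the output that still contains the # 1 lines yields no adjacency errors, while the output without them reports line 2, where the first program line directly follows the {-# LINE #-} line.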

More puzzling is that the files that Cabal runs through ghc's CPP don't get the # n lines, so we end up with something like this:
(Answering my own message)

Having done some more testing with ghc-6.8.1 and ghc-6.6.1 and cabal's 1.1.6.2 and 1.3, I've realised that the cpp optP-P option in Cabal-1.3 is suppressing the # n lines, so that means the {-# LINE 1 "Test.lhs" #-} comment does indeed end up immediately preceding the first real line of the program (thus causing unlit to spit the dummy).

I've also noticed that the options passed from cabal-1.1.6.2 to the ghc cpp phase do NOT include -x hs, so ghc unlits the file before cabal then tries to unlit it. Surely this cannot work, and indeed it does not, because the resulting .hs file contains no code.

I'm of a mind to fix two things in cabal:
 - the haddock command runs unlit first, THEN cpp
 - the unlit module preserves comments, for the benefit of haddock

I already have these done in my local Cabal-1.3, so creating patches ought to be straightforward. I've only tested with ghc on Windows though.

Comments?

Thanks,
Alistair
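
The effect of suppressing the # n lines can be mimicked with a small filter. This is only a sketch of what cpp's -P option (inhibit linemarkers) does to the output, not how cpp or Cabal implement it; isLinemarker and stripLinemarkers are hypothetical names.

```haskell
import Data.Char (isDigit, isSpace)

-- | True for cpp linemarkers such as:  # 1 "Test.lhs"
-- (a '#' in column one, optional spaces, then a line number).
isLinemarker :: String -> Bool
isLinemarker ('#' : rest) =
  case dropWhile isSpace rest of
    (c : _) -> isDigit c
    []      -> False
isLinemarker _ = False

-- | Drop every linemarker line, leaving all other lines untouched.
stripLinemarkers :: String -> String
stripLinemarkers = unlines . filter (not . isLinemarker) . lines
```

Applied to the Test2.lhs output from the first message, this leaves the {-# LINE 1 "Test.lhs" #-} line directly above the first bird-track line, which is the layout unlit rejects.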

On Wed, 2007-11-28 at 22:37 +0000, Alistair Bayley wrote:
More puzzling is that the files that Cabal runs through ghc's CPP don't get the # n lines, so we end up with something like this:
(Answering my own message)
Having done some more testing with ghc-6.8.1 and ghc-6.6.1 and cabal's 1.1.6.2 and 1.3, I've realised that the cpp optP-P option in Cabal-1.3 is suppressing the # n lines, so that means the {-# LINE 1 "Test.lhs" #-} comment does indeed end up immediately preceding the first real line of the program (thus causing unlit to spit the dummy).
I've also noticed that the options passed from cabal-1.1.6.2 to the ghc cpp phase do NOT include -x hs, so ghc unlits the file before cabal then tries to unlit it. Surely this cannot work, and indeed it does not, because the resulting .hs file contains no code.
I'm of a mind to fix two things in cabal:
 - the haddock command runs unlit first, THEN cpp
 - the unlit module preserves comments, for the benefit of haddock
I already have these done in my local Cabal-1.3, so creating patches ought to be straightforward. I've only tested with ghc on Windows though.
Comments?
Sounds great. Send your patches to cabal-devel@haskell.org. You may also like to subscribe to that mailing list. Duncan

I'm of a mind to fix two things in cabal:
 - the haddock command runs unlit first, THEN cpp
 - the unlit module preserves comments, for the benefit of haddock
Sounds great. Send your patches to cabal-devel@haskell.org. You may also like to subscribe to that mailing list.
Have subscribed. Also, have a question about Cabal's unlit. I intend to change its behaviour slightly, so that lines that contain only whitespace but are not completely empty (e.g. a single space) are treated as comment lines, rather than blanks. This makes it possible to write Haddock sections that contain paragraphs e.g.

Bullet list:
<space>
* bullet 1
<space>
* bullet 2

becomes

-- Bullet list:
--
-- * bullet 1
--
-- * bullet 2

which is what Haddock requires to render a bullet list.

This change might break programs that have otherwise blank lines containing spaces next to program lines. However, given that (AFAICT) unlit is only used in generating Haddock docs, and this is currently broken anyway for .lhs files, this doesn't seem like much of a concern.

Another possibility is to also relax (well, disable) the "program-line-next-to-comment-line" test in unlit. Again, we're only generating Haddock docs, so no major loss there. And this would permit programs that have blank-lines-with-spaces next to program lines.

Alistair
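
The proposed behaviour can be sketched as follows. This is an illustration only, not the patched Cabal Unlit module; unlitKeepComments is a hypothetical name, and replacing the leading > with a space (so that code keeps its columns) is an assumption about the desired output.

```haskell
import Data.Char (isSpace)

-- | Convert bird-track literate source to plain Haskell, keeping the
-- prose as line comments so that Haddock can still see it.
unlitKeepComments :: String -> String
unlitKeepComments = unlines . map convert . lines
  where
    -- Program line: swap the bird track for a space to keep layout.
    convert ('>' : code) = ' ' : code
    -- Completely empty line: stays a blank line.
    convert "" = ""
    convert l
      -- Whitespace only (e.g. a single space): empty comment line.
      | all isSpace l = "--"
      -- Anything else is prose and becomes a line comment.
      | otherwise     = "-- " ++ l
```

With this rule the bullet list above comes out as one contiguous block of -- lines, while a truly empty line still separates prose from code.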

On Thu, 2007-11-29 at 11:18 +0000, Alistair Bayley wrote:
I'm of a mind to fix two things in cabal:
 - the haddock command runs unlit first, THEN cpp
 - the unlit module preserves comments, for the benefit of haddock
Sounds great. Send your patches to cabal-devel@haskell.org. You may also like to subscribe to that mailing list.
Have subscribed. Also, have a question about Cabal's unlit. I intend to change its behaviour slightly, so that lines that are not completely empty (e.g. a single space) are treated as comment lines, rather than blanks. This makes it possible to write Haddock sections that contain paragraphs e.g.
You may also like to consider a previous discussion on this issue: http://www.haskell.org/pipermail/cabal-devel/2007-August/thread.html#725 Jón Fairbairn sent in some code he uses for turning .lhs files into something that haddock can understand. It may or may not be helpful to you.
Bullet list:
<space>
* bullet 1
<space>
* bullet 2

becomes

-- Bullet list:
--
-- * bullet 1
--
-- * bullet 2

which is what Haddock requires to render a bullet list.
Sounds reasonable.
This change might break programs that have otherwise blank lines containing spaces next to program lines. However, given that (AFAICT) unlit is only used in generating Haddock docs, and this is currently broken anyway for .lhs files, this doesn't seem like much of a concern.
True, though in future we may want to use it more widely.
Another possibility is to also relax (well, disable) the "program-line-next-to-comment-line" test in unlit. Again, we're only generating Haddock docs, so no major loss there. And this would permit programs that have blank-lines-with-spaces next to program lines.
I'm not sure I fully understand all this. What is required in H98? It requires blank lines next to program text at least with > bird track style. Is the question simply whether a line with only spaces counts as a blank line? Seems to me we should follow H98 whatever it says, and/or the current interpretation of implementations like ghc & hugs. Duncan

(Posting from a different account because your reply hasn't reached Google Mail yet)
You may also like to consider a previous discussion on this issue: http://www.haskell.org/pipermail/cabal-devel/2007-August/thread.html#725 Jón Fairbairn sent in some code he uses for turning .lhs files into something that haddock can understand.
I also have a program which I use to unlit Takusen's source: http://darcs.haskell.org/takusen/Bird2Hs.hs But it is more lenient than the H98 standard for .lhs files. It's pretty easy to modify the existing Unlit module in Cabal, so I'll do that.
Another possibility is to also relax (well, disable) the "program-line-next-to-comment-line" test in unlit.
I'm not sure I fully understand all this. What is required in H98? It requires blank lines next to program text at least with > bird track style. Is the question simply whether a line with only spaces counts as a blank line? Seems to me we should follow H98 whatever it says, and/or the current interpretation of implementations like ghc & hugs.
I think that's the real question: should Cabal's unlit be faithful to the H98 standard, or should it be treated simply as a Haddock preprocessor? (and therefore can be a little lax)

H98 does require that code and comments are properly separated by blank lines, where a blank line consists only of whitespace:
http://haskell.org/onlinereport/syntax-iso.html#sect9.4
So what I've proposed would be a deviation from the standard.

I do need a way of ensuring that I can write paragraphs in contiguous comment blocks. e.g. so this

|
para 1

para 2

becomes

-- |
-- para 1
--
-- para 2

rather than

-- |
-- para 1

-- para 2

(Haddock will ignore the second paragraph).

Perhaps using a single period on a line as a convention for a blank comment line? e.g.

|
para 1
.
para 2

Or maybe I can enhance the unlit algorithm so that it can detect whether or not a blank line precedes a program line and behave accordingly.

Alistair
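
One possible reading of the last idea (look at what surrounds a blank line and behave accordingly) is sketched below. It is an assumption-laden illustration, not the eventual patch: Kind, classify and unlitParagraphs are hypothetical names, and the rule "a single blank line directly between two comment lines becomes an empty comment" is just one choice that keeps blank lines next to program lines intact.

```haskell
import Data.Char (isSpace)

-- | Classification of a line in a bird-track literate file.
data Kind = Program | Blank | Comment
  deriving (Eq, Show)

classify :: String -> Kind
classify ('>' : _) = Program
classify l
  | all isSpace l = Blank
  | otherwise     = Comment

-- | Unlit that keeps prose as comments. A blank line whose direct
-- neighbours are both comment lines becomes "--"; every other blank
-- line (e.g. one next to program text) stays blank.
unlitParagraphs :: String -> String
unlitParagraphs src = unlines (zipWith3 convert prevs ls nexts)
  where
    ls    = lines src
    kinds = map classify ls
    prevs = Blank : kinds            -- kind of the line above
    nexts = drop 1 kinds ++ [Blank]  -- kind of the line below
    convert p l n = case classify l of
      Program -> ' ' : drop 1 l      -- replace '>' with a space
      Comment -> "-- " ++ l
      Blank
        | p == Comment && n == Comment -> "--"
        | otherwise                    -> ""
```

In this sketch the two paragraphs above end up in one contiguous comment block, and two consecutive blank lines are both left blank, so they would still end the block.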

On Thu, 2007-11-29 at 13:14 +0000, Bayley, Alistair wrote:
I think that's the real question: should Cabal's unlit be faithful to the H98 standard, or should it be treated simply as a Haddock preprocessor? (and therefore can be a little lax)
H98 does require that code and comments are properly separated by blank lines, where a blank line consists only of whitespace: http://haskell.org/onlinereport/syntax-iso.html#sect9.4 So what I've proposed would be a deviation from the standard.
I do need a way of ensuring that I can write paragraphs in contiguous comment blocks. e.g. so this
|
para 1

para 2

becomes

-- |
-- para 1

-- para 2

(Haddock will ignore the second paragraph).
Yes, I see. How about using {- -} brackets instead? Would that work better?
Or maybe I can enhance the unlit algorithm so that it can detect whether or not a blank line precedes a program line and behave accordingly.
It would be nice if we can find an unlit extension that is compatible with the H98 requirement and also allows useful behaviour for haddock.

Duncan

-- |
-- para 1

-- para 2

(Haddock will ignore the second paragraph).
Yes, I see. How about using {- -} brackets instead? Would that work better?
No, because you still need a way to delimit the Haddock comment. Sometimes you might want to write more comments, but not have them be part of the Haddock comment. Two blank lines, maybe?
It would be nice if we can find an unlit extension that is compatible with the H98 requirement and also allows useful behaviour for haddock.
I'll see what I can do. Alistair

It would be nice if we can find an unlit extension that is compatible with the H98 requirement and also allows useful behaviour for haddock.
I have completed changes to the Unlit module, and also a change to the Haddock module (to run unlit before cpp). Darcs patches attached.

I'd love to know what the official approach to testing is. I have written a test module which exercises the Unlit module, but I'm not sure how to tie it in to any existing test infrastructure. At the moment I just have it sitting in tests/unlit.

Thanks,
Alistair

participants (3)
- Alistair Bayley
- Bayley, Alistair
- Duncan Coutts