Literate haskell format unclear (implementation and specification inconsistencies)

Trying to implement literate haskell[*], I realized several ways in which the correct behavior for unliterating (especially with regard to errors) was unclear. I have several cases which ghc, hugs and Haskell 98 have differing opinions on! The Report as it stands is far from a clear and complete specification (and I didn't find anything in the Haskell' wiki/trac about literate haskell).

[*](particularly, to make DrIFT able to deal with TeX-style lhs - there's unfinished work in darcs repo http://isaac.cedarswampstudios.org/2007/DrIFT/ )

testing with:
ghc: 6.4.2, 6.6(some)
hugs: Hugs Version 20050308
nhc98: recent darcs (1.19)
report: Haskell 98 (The Revised Report: December 2002), section 9.4, http://www.haskell.org/onlinereport/syntax-iso.html#sect9.4

A full set of .lhs test files for all the issues:
darcs get http://isaac.cedarswampstudios.org/2007/LiterateHaskellTests
or download http://isaac.cedarswampstudios.org/2007/LiterateHaskellTests-1.tar.gz
or you can try just prefixing all examples with
\begin{code}
module Main where
main = print str
\end{code}
or
module Main where
main = print str
as appropriate... (please don't get mangled by mail programs, initial '>'s... ):

1.[UnmatchedBegin] If a \begin{code} starts a section of code, is \end{code} _required_ before the end of the file?
report: unclear
ghc: required
hugs, nhc98: not required
The report says "entirely enclosed between", but goes on to say "More precisely:" and give a description that is not at all precise in the matter of this question.

2.[AfterBeginOrEnd/{BeginWhite,EndWhite,BeginPrint,EndPrint}] Can a line beginning \begin{code} or \end{code} have additional stuff on the end, where the directive is understood and the additional stuff is ignored?
report:[yes]
hugs:[yesIffAdditionalStuffIsInvisible]
ghc:[case beginningOfLine of
       "\end{code}" -> yes
       "\begin{code}" -> yesIffAdditionalStuffIsInvisible]
nhc98:[UNLIT_IGNORED]
where yesIffAdditionalStuffIsInvisible = if (all isSpace additionalStuff) then yes else UNLIT_IGNORED
UNLIT_IGNORED means that if it was inside a code block then the line is treated as program text (so it's probably a syntax error) and if it was in a literate comment section it is treated as a non-empty literate comment line.
Note that it takes a careful reading of the report: for begin, program code only begins on the _following_ line. Most seem to agree that it shouldn't mess up your program to have trailing whitespace on such a line (but at least nhc98 doesn't currently implement this). Is there any reason to allow NON-whitespace in that location?

3.[IgnoringStringLiterals/{A,B}] what does "(ignoring string literals, of course)" mean? that the following(A) makes str = "string gap:end{code}" and an unended code block(A), or that it makes an ended code block(B)?
(A)---------
\begin{code}
str = "string gap:\
\end{code}"
---------
report:unclear, hugs:A, ghc:B, nhc98:A
This works for ghc, the result being "string gap:string gap ends":
(B)---------
\begin{code}
str = "string gap:\
\end{code}"
\begin{code}
\string gap ends"
\end{code}
-----------
Note that behavior A requires a detailed knowledge of Haskell's syntax in order to unliterate a file, for a dubious benefit (if a string literal with string gaps is used like that, the programmer could just indent the second line!)

4.[ExtraBeginEnd/{ExtraBegin,ExtraEnd}] What happens if \begin{code} appears after another \begin{code} before an \end{code}; and what happens if an \end{code} appears without a code block previously having been started by a \begin{code}?
stray end:
ghc, nhc98:[UNLIT_IGNORED (-> probable successful compile)]
hugs:[error "\end{code} encountered outside code block"]
stray begin:
ghc, nhc98:[UNLIT_IGNORED (-> probable syntax error)]
hugs:[error "\begin{code} encountered inside code block"]

5.[LexicalUnitAcrossLiterateComment/{StringGap,BlockComment}] Can lexical units jump across literate comment gaps?
report, ghc, hugs, nhc98: yes...
Note that the Report specifies it by removing all non-program lines, rather than converting them to blank lines, but an additional blank line in the middle of a Haskell program NEVER makes a difference (except for line numbering, of course).
----------
str = "string gap:\
This might be a literate comment.
\ends here"
ghc, hugs, nhc98: "string gap:ends here"
or
--------
str = "string" {- a comment
This might be a literate comment -} with weird character sequences.
ends here -}
ghc, hugs, nhc98: think it's a fine comment

I mention this because allowing these makes it complicated to preserve literate comments in a translation to .hs, because, other than cases like these, prefixing literate comment lines with "--  " works fine.[*] However, banning these could make processing that wants to report errors end up more complicated. Maybe the report could/should say that it is "not advisable", as it does for mixing '>' and {code} styles? (Also it's confusing to the programmer - I wondered "can I (and should I) really do that?!" sometimes..)

[*]Haddock style is a nuisance too, which is why there are two spaces added -- Haddock seems not to recognize such comments then, as desired. Or would it be better to take the other approach and say those should count as haddock comments?

6.[TeXBirdtrack/] I understand that "It is not advisable to mix these two styles in the same file." and the report doesn't even talk about how they mix, but now that I've gotten started on the implementation inconsistencies... Actually, despite the Report's advice against it, there seems to be a consensus on what the meaning of mixing the two styles is, which I'll describe below:

Sensibly, ghc, hugs and nhc98 treat begin/end{code} lines as blank for the purposes of '>'-style comment checking (which is that a code and a non-blank literate comment line can't be adjacent); this works:
[TeXBirdtrack/NoLayout]------------
module Main where {main = print str
\begin{code}
;str = "string"}
\end{code}
ok

Note I didn't rely on the layout rule. This should work:
[TeXBirdtrack/AlignedLayout]------------
module Main where
main = print str
\begin{code}
str = "string"
\end{code}
ok

It does in hugs and nhc98, and according to http://hackage.haskell.org/trac/ghc/ticket/210 it does in GHC HEAD now (6.7) as well.

As another example, this doesn't work, for the same reason that you can't start a line with '>' in a .hs file:
[TeXBirdtrack/Wrong]------------
module Main where
main = print str
\begin{code}
str = "string"
\end{code}
ok

Hoping to start some discussion,
Isaac
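
The translation to .hs mentioned under item 5 can be sketched as follows for a file that uses only the '>' style. This is a minimal illustration under stated assumptions, not DrIFT's code: the name `toHs` is hypothetical, TeX-style blocks are not handled, and the two-space prefix follows the Haddock footnote above. As item 5 points out, the result is only valid Haskell when no string gap or block comment crosses a literate comment line.

```haskell
import Data.Char (isSpace)

-- | Convert a bird-track literate file to plain Haskell source,
-- keeping the literate comments as line comments.
-- Hypothetical helper for illustration only.
toHs :: String -> String
toHs = unlines . map convert . lines
  where
    -- Program line: the leading '>' becomes a space, as in the Report.
    convert ('>' : rest) = ' ' : rest
    convert l
      -- Blank lines stay blank.
      | all isSpace l = ""
      -- Literate comment line: dash-dash plus two spaces, so that a
      -- line starting with '|' or '^' does not look like a Haddock comment.
      | otherwise = "--  " ++ l
```

For example, the three lines "Intro text", an empty line and "> main = print 1" become "--  Intro text", an empty line and "  main = print 1".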

Hi Isaac
Trying to implement literate haskell[*], I realized several ways in which the correct behavior for unliterating (especially with regard to errors) was unclear. I have several cases which ghc, hugs and Haskell 98 have differing opinions on! The Report as it stands is far from a clear and complete specification (and I didn't find anything in the Haskell' wiki/trac about literate haskell).
If you look at Yhc and nhc, they both took the unlit code straight out of the Haskell 1.something report, so I guess that could be treated as the official spec of the last version that had a concrete one.
Thanks
Neil
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Feb 28, 2007 at 05:48:09PM -0500, Isaac Dupree wrote:
Trying to implement literate haskell[*], I realized several ways in which the correct behavior for unliterating (especially with regard to errors) was unclear. I have several cases which ghc, hugs and Haskell 98 have differing opinions on! The Report as it stands is far from a clear and complete specification (and I didn't find anything in the Haskell' wiki/trac about literate haskell).
Hmm, some of this came up around the time the revised report was being written: http://www.haskell.org/pipermail/haskell/2001-December/008549.html http://www.haskell.org/pipermail/haskell/2001-December/008550.html but oddly doesn't seem to have been clarified in the report. We should definitely make sure that Haskell' does so!
1.[UnmatchedBegin] If a \begin{code} starts a section of code, is \end{code} _required_ before the end of the file?
I would say yes.
2.[AfterBeginOrEnd/{BeginWhite,EndWhite,BeginPrint,EndPrint}] Can a line beginning \begin{code} or \end{code} have additional stuff on the end, where the directive is understood and the additional stuff is ignored?
I would say yesIffAdditionalStuffIsInvisible (although I wouldn't object to "no"; trailing white space makes me sad). And nothing may precede "\begin{code}" or "\end{code}".
3.[IgnoringStringLiterals/{A,B}] what does "(ignoring string literals, of course)" mean? that the following(A) makes str = "string gap:end{code}" and an unended code block(A), or that it makes an ended code block(B)? (A)--------- \begin{code} str = "string gap:\ \end{code}"
I didn't follow your question, but I think that in order to allow things to be nicely compositional
\begin{code}
str = "string gap:\
\end{code}"
\end{code}
should be rejected by the unlitter for having trailing characters following "\end{code}". Did that answer it?
4.[ExtraBeginEnd/{ExtraBegin,ExtraEnd}] What happens if \begin{code} appears after another \begin{code} before an \end{code}; and what happens if an \end{code} appears without a code block previously having been started by a \begin{code}? stray end: ghc, nhc98:[UNLIT_IGNORED (-> probable successful compile)] hugs:[error "\end{code} encountered outside code block"] stray begin: ghc, nhc98:[UNLIT_IGNORED (-> probable syntax error)] hugs:[error "\begin{code} encountered inside code block"]
I agree with hugs.
5.[LexicalUnitAcrossLiterateComment/{StringGap,BlockComment}] Can lexical units jump across literate comment gaps? report, ghc, hugs, nhc98: yes...
I agree.
ghc, hugs, nhc98: think it's a fine comment
I agree.
I mention this because allowing these makes it complicated to preserve literate comments in a translation to .hs,
I don't have a problem with that; I unlit, not convertlit :-) Allowing them makes it easier to write an implementation in a compositional style.
because, other than cases like these, prefixing literate comment lines with "-- " works fine.[*] However, banning these could make processing that wants to report errors end up more complicated. Maybe the report could/should say that it is "not advisable", as it does for mixing '>' and {code} styles?
I don't object to saying it is inadvisable.
6.[TeXBirdtrack/] I understand that "It is not advisable to mix these two styles in the same file." and the report doesn't even talk about how they mix, but now that I've gotten started on the implementation inconsistencies... Actually, despite the Report's advice against it, there seems to be a consensus on what the meaning of mixing the two styles is, which I'll describe below:
Sensibly, ghc, hugs and nhc98 treat begin/end{code} lines as blank for the purposes of '>'-style comment checking (which is that a code and a non-blank literate comment line can't be adjacent); this works: [TeXBirdtrack/NoLayout]------------
module Main where {main = print str \begin{code} ;str = "string"} \end{code}
I don't have an opinion on whether or not this should be allowed as I don't think you should do it anyway, but you are right that it should be clearly defined.
Note I didn't rely on the layout rule. This should work: [TeXBirdtrack/AlignedLayout]------------
module Main where main = print str \begin{code} str = "string" \end{code}
Again no opinion, but should be the same answer as the previous one.
As another example, this doesn't work, for the same reason that you can't start a line with '>' in a .hs file: [TeXBirdtrack/Wrong]------------
module Main where main = print str \begin{code} str = "string" \end{code}
Right, this should not be allowed.
Thanks
Ian
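
Taken together, the answers in this reply (a required \end{code}, only whitespace allowed after a directive, and hugs-style errors for stray directives) describe a small two-state scanner. The following is a rough sketch of that reading for the TeX style only, not any implementation's actual unlit: the names `Line`, `classify` and `unlitTeX` and the error wording are made up for illustration, and non-program lines are blanked rather than removed so that line numbers are preserved.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- | What a single input line looks like to the unlitter.
data Line = Begin | End | BadDirective | Other
  deriving (Eq, Show)

-- | A directive must start the line and may only be followed by whitespace.
classify :: String -> Line
classify l
  | Just rest <- stripPrefix "\\begin{code}" l = directive Begin rest
  | Just rest <- stripPrefix "\\end{code}" l = directive End rest
  | otherwise = Other
  where
    directive d rest
      | all isSpace rest = d
      | otherwise = BadDirective

-- | Unlit a TeX-style file. Left carries a message with a 1-based line number.
unlitTeX :: String -> Either String String
unlitTeX = fmap unlines . outside 1 . lines
  where
    -- Outside a code block every line is a literate comment and is blanked.
    outside :: Int -> [String] -> Either String [String]
    outside _ [] = Right []
    outside n (l : ls) = case classify l of
      Begin        -> ("" :) <$> inside n (n + 1) ls
      End          -> Left (at n "\\end{code} encountered outside code block")
      BadDirective -> Left (at n "trailing characters after directive")
      Other        -> ("" :) <$> outside (n + 1) ls

    -- Inside a code block lines are kept verbatim until \end{code}.
    inside :: Int -> Int -> [String] -> Either String [String]
    inside start _ [] = Left (at start "\\begin{code} without matching \\end{code}")
    inside start n (l : ls) = case classify l of
      End          -> ("" :) <$> outside (n + 1) ls
      Begin        -> Left (at n "\\begin{code} encountered inside code block")
      BadDirective -> Left (at n "trailing characters after directive")
      Other        -> (l :) <$> inside start (n + 1) ls

    at n msg = "line " ++ show n ++ ": " ++ msg
```

Because `classify` never looks at Haskell syntax, the string-gap example from item 3 is rejected at the line that starts with \end{code} and ends with a quote, which is the compositional behaviour argued for above.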

Nice, I pretty much agree with you on everything :)
Ian Lynagh wrote:
On Wed, Feb 28, 2007 at 05:48:09PM -0500, Isaac Dupree wrote:
Trying to implement literate haskell[*], I realized several ways in which the correct behavior for unliterating (especially with regard to errors) was unclear. I have several cases which ghc, hugs and Haskell 98 have differing opinions on! The Report as it stands is far from a clear and complete specification (and I didn't find anything in the Haskell' wiki/trac about literate haskell).
Hmm, some of this came up around the time the revised report was being written:
http://www.haskell.org/pipermail/haskell/2001-December/008549.html http://www.haskell.org/pipermail/haskell/2001-December/008550.html
but oddly doesn't seem to have been clarified in the report. We should definitely make sure that Haskell' does so!
1.[UnmatchedBegin] If a \begin{code} starts a section of code, is \end{code} _required_ before the end of the file?
I would say yes.
2.[AfterBeginOrEnd/{BeginWhite,EndWhite,BeginPrint,EndPrint}] Can a line beginning \begin{code} or \end{code} have additional stuff on the end, where the directive is understood and the additional stuff is ignored?
I would say yesIffAdditionalStuffIsInvisible (although I wouldn't object to "no"; trailing white space makes me sad).
"No" would not be that bad if compiler error messages actually clearly told you "You have trailing whitespace characters on line X - you may not be able to see them, but they're there, and you should delete them!" or something equally specific.
And nothing may precede "\begin{code}" or "\end{code}".
Yes, this is already universally true in the Report as well as implementations, I believe.
3.[IgnoringStringLiterals/{A,B}] what does "(ignoring string literals, of course)" mean? that the following(A) makes str = "string gap:end{code}" and an unended code block(A), or that it makes an ended code block(B)? (A)--------- \begin{code} str = "string gap:\ \end{code}"
I didn't follow your question, but I think that in order to allow things to be nicely compositional
\begin{code}
str = "string gap:\
\end{code}"
\end{code}
should be rejected by the unlitter for having trailing characters following "\end{code}". Did that answer it?
Yes, your answer is at least as clear as my question... it says that, in order to be nicely compositional, the unlitter should not have to know about Haskell string syntax -- which is as in case (B) except that my example should really probably be an error anyway (as per our preferred answer on 2.[AfterBeginOrEnd]).
4.[ExtraBeginEnd/{ExtraBegin,ExtraEnd}] What happens if \begin{code} appears after another \begin{code} before an \end{code}; and what happens if an \end{code} appears without a code block previously having been started by a \begin{code}? stray end: ghc, nhc98:[UNLIT_IGNORED (-> probable successful compile)] hugs:[error "\end{code} encountered outside code block"] stray begin: ghc, nhc98:[UNLIT_IGNORED (-> probable syntax error)] hugs:[error "\begin{code} encountered inside code block"]
I agree with hugs.
Yes. It would be nice if there was nothing that required unliteration to be, semantically, top-to-bottom, and I think this answer, along with disallowing 1.[UnmatchedBegin] and the answer on 3.[IgnoringStringLiterals], defines what is allowed clearly and symmetrically (although the location of compiler error messages can vary, and we don't have standards-quality wording yet :).
5.[LexicalUnitAcrossLiterateComment/{StringGap,BlockComment}] Can lexical units jump across literate comment gaps? report, ghc, hugs, nhc98: yes...
I agree.
ghc, hugs, nhc98: think it's a fine comment
I agree.
I mention this because allowing these makes it complicated to preserve literate comments in a translation to .hs,
I don't have a problem with that; I unlit, not convertlit :-)
Allowing them makes it easier to write an implementation in a compositional style.
because, other than cases like these, prefixing literate comment lines with "-- " works fine.[*] However, banning these could make processing that wants to report errors end up more complicated. Maybe the report could/should say that it is "not advisable", as it does for mixing '>' and {code} styles?
I don't object to saying it is inadvisable.
6.[TeXBirdtrack/] I understand that "It is not advisable to mix these two styles in the same file." and the report doesn't even talk about how they mix, but now that I've gotten started on the implementation inconsistencies... Actually, despite the Report's advice against it, there seems to be a consensus on what the meaning of mixing the two styles is, which I'll describe below:
Sensibly, ghc, hugs and nhc98 treat begin/end{code} lines as blank for the purposes of '>'-style comment checking (which is that a code and a non-blank literate comment line can't be adjacent); this works: [TeXBirdtrack/NoLayout]------------
module Main where {main = print str \begin{code} ;str = "string"} \end{code}
I don't have an opinion on whether or not this should be allowed as I don't think you should do it anyway, but you are right that it should be clearly defined.
I didn't mean to suggest that it /should be/ clearly defined, I was just clearly defining it ;). But, it probably should be defined as long as it is allowed, just so there is a single reference if nothing else.
Note I didn't rely on the layout rule. This should work: [TeXBirdtrack/AlignedLayout]------------
module Main where main = print str \begin{code} str = "string" \end{code}
Again no opinion, but should be the same answer as the previous one.
As another example, this doesn't work, for the same reason that you can't start a line with '>' in a .hs file: [TeXBirdtrack/Wrong]------------
module Main where main = print str \begin{code} str = "string" \end{code}
Right, this should not be allowed.
Thanks Ian
Thanks
Isaac

Isaac Dupree wrote:
we don't have standards-quality wording yet
Okay, here's a first attempt at formalizing it. It's still really messy, and doesn't yet incorporate narrative material from the Haskell 98 literate comments section. Feedback so far?

Literate programs are interpreted as a series of lines that are parsed, and each line is transformed to another line that is part of the program text.

Here are some (String -> Bool) to test lines during parsing.

BIRDPROG = begins with ">"
BEGIN = "\begin{code}" ++ (all isSpace)
END = "\end{code}" ++ (all isSpace)
TEXPROG = doesn't begin with "\begin{code}" or "\end{code}"
BLANKLINE = all isSpace
COMMENTLINE = doesn't begin with ">", "\begin{code}" or "\end{code}" and not all isSpace
BORINGLINE = BLANKLINE | COMMENTLINE
           = doesn't begin with ">", "\begin{code}" or "\end{code}"

If we want to incorporate the "no comment lines adjacent to program lines" into the syntax here, we have

comment = one or more COMMENTLINE
birdprog = one or more BIRDPROG
birdOrComment = birdprog | comment
texprog = BEGIN (zero or more TEXPROG) END
notBirdOrComment = BLANKLINE | texprog
file = (zero or more notBirdOrComment)
       birdOrComment
       (zero or more ((one or more notBirdOrComment) birdOrComment))
       (zero or more notBirdOrComment)

which is a little ugly compared to

file = zero or more (BORINGLINE | BIRDPROG | texprog)
texprog = BEGIN (zero or more TEXPROG) END

(Neither one expresses the requirement (not explicitly present in Haskell 98) that there must be at least one actual program line in the result. Of course, if there isn't such a requirement, the file would mean "module Main(main) where {}" which is in error anyway.)

Lines judged to be BIRDPROG have the initial ">" replaced with a " ". Lines judged to be TEXPROG are retained intact. All other lines are reduced to emptiness.

It is not advisable to have a single lexical unit ("lexeme" (which only includes "gap" for these purposes) or "ncomment" --- references from Haskell98 section 9.2 Lexical Syntax) that crosses a line that was a COMMENTLINE.
(or, inadvisable to cross ANY line other than BIRDPROG and TEXPROG, because doing so is just weird anyway.)

Isaac
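
As a reading aid, the line tests and the simpler of the two grammars (the one without the adjacency rule) can be written down directly. This is only a sketch of the definitions above: the function names are invented here, and returning Nothing for any file that does not match the grammar is an assumption, since the proposal does not say how errors are reported.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf, stripPrefix)

-- | True for lines that begin with either directive, whatever follows.
beginsWithDirective :: String -> Bool
beginsWithDirective l = any (`isPrefixOf` l) ["\\begin{code}", "\\end{code}"]

-- | A directive followed only by whitespace (the BEGIN and END tests).
directive :: String -> String -> Bool
directive d l = maybe False (all isSpace) (stripPrefix d l)

isBirdProg, isBegin, isEnd, isTexProg, isBlankLine, isCommentLine, isBoringLine :: String -> Bool
isBirdProg = (">" `isPrefixOf`)
isBegin = directive "\\begin{code}"
isEnd = directive "\\end{code}"
isTexProg = not . beginsWithDirective
isBlankLine = all isSpace
isBoringLine l = not (isBirdProg l) && not (beginsWithDirective l)
isCommentLine l = isBoringLine l && not (isBlankLine l)

-- | file = zero or more (BORINGLINE | BIRDPROG | texprog)
-- Each input line maps to exactly one output line; Nothing means the
-- input does not match the grammar.
unlitLines :: [String] -> Maybe [String]
unlitLines [] = Just []
unlitLines (l : ls)
  | isBegin l =
      -- texprog = BEGIN (zero or more TEXPROG) END
      case break beginsWithDirective ls of
        (body, e : rest) | isEnd e -> (\out -> "" : body ++ "" : out) <$> unlitLines rest
        _ -> Nothing
  | isBirdProg l = ((' ' : drop 1 l) :) <$> unlitLines ls
  | isBoringLine l = ("" :) <$> unlitLines ls
  | otherwise = Nothing
```

A stray \end{code}, a \begin{code} inside a block, a directive with trailing non-whitespace, and an unterminated block all fall through to Nothing, which matches the answers agreed on earlier in the thread.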

On Sat, Mar 03, 2007 at 12:18:44PM -0500, Isaac Dupree wrote:
Here are some (String -> Bool) to test lines during parsing.
I haven't looked at your definitions in detail, but I think they might be easier to follow (and, ultimately, include in the report) if they were written in a BNF style like the one used in the report. I also think it would be good to have a Haskell spec (a testsuite would also be good, but not in the report itself). I've had a quick go at hacking one up (attached) - entirely untested, but ghci -Wall is happy. If I'm lucky it might even match my answers earlier in the thread. It should be easy to alter if we decide to use different answers instead. Thanks Ian

On Fri, Mar 02, 2007 at 01:46:38AM +0000, Ian Lynagh wrote:
On Wed, Feb 28, 2007 at 05:48:09PM -0500, Isaac Dupree wrote:
Trying to implement literate haskell[*], I realized several ways in which the correct behavior for unliterating (especially with regard to errors) was unclear. I have several cases which ghc, hugs and Haskell 98 have differing opinions on! The Report as it stands is far from a clear and complete specification (and I didn't find anything in the Haskell' wiki/trac about literate haskell).
Hmm, some of this came up around the time the revised report was being written:
http://www.haskell.org/pipermail/haskell/2001-December/008549.html http://www.haskell.org/pipermail/haskell/2001-December/008550.html
but oddly doesn't seem to have been clarified in the report. We should definitely make sure that Haskell' does so!
Or perhaps we should get rid of \begin{code} and \end{code}, before someone proposes <code> and </code>.

On 2007 Mar 3, at 7:43 AM, Ross Paterson indited:
but oddly doesn't seem to have been clarified in the report. We should definitely make sure that Haskell' does so!
Or perhaps we should get rid of \begin{code} and \end{code}, before someone proposes <code> and </code>.
UGH. Since the "text" that is not inside of the \begin{code} and \end{code} is relatively unconstrained, would it be cool, or egregious, to have a comment which would permit a particular file to designate its own literacy boundaries? Bird beaks allow for simple markup, and the TeX commands allow for trivial integration with (La)TeX, so would it really be all that demeaning to allow for other alternatives even if you wouldn't choose them yourself? Metaprogramming to specify this would be overkill, but constant strings would get you 95% of the way to utter generality. Anyways, thought I'd toss out a third alternative to "no change or remove TeX".
--Doug

Douglas Philips wrote:
On 2007 Mar 3, at 7:43 AM, Ross Paterson indited:
but oddly doesn't seem to have been clarified in the report. We should definitely make sure that Haskell' does so!
Or perhaps we should get rid of \begin{code} and \end{code}, before someone proposes <code> and </code>.
UGH.
Since the "text" that is not inside of the \begin{code} and \end{code} is relatively unconstrained, would it be cool, or egregious, to have a comment which would permit a particular file to designate its own literacy boundaries?
Here's an idea: A literate haskell file in TeX style has extension ".hs.tex". That way tools that recognize the .tex extension can process it as that directly, and tools that want to "decode" it remove the .tex part of the extension (analogous to ".gz" (compressed files), perhaps?) So if someone really wanted, they could define a .hs.xml format and decoder, or something (though, considering the issues of escaping characters, '<' and '&' here, it would probably be a mess, use [...]
participants (5)
- Douglas Philips
- Ian Lynagh
- Isaac Dupree
- Neil Mitchell
- Ross Paterson