
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Trying to implement literate haskell[*], I realized several ways in which the correct behavior for unliterating (especially with regard to errors) was unclear. I have several cases which ghc, hugs and Haskell 98 have differing opinions on! The Report as it stands is far from a clear and complete specification (and I didn't find anything in the Haskell' wiki/trac about literate haskell).

[*](particularly, to make DrIFT able to deal with TeX-style lhs - there's unfinished work in darcs repo http://isaac.cedarswampstudios.org/2007/DrIFT/ )

testing with:
ghc: 6.4.2, 6.6(some)
hugs: Hugs Version 20050308
nhc98: recent darcs (1.19)
report: Haskell 98 (The Revised Report: December 2002), section 9.4, http://www.haskell.org/onlinereport/syntax-iso.html#sect9.4

A full set of .lhs test files for all the issues:
darcs get http://isaac.cedarswampstudios.org/2007/LiterateHaskellTests
or download http://isaac.cedarswampstudios.org/2007/LiterateHaskellTests-1.tar.gz
or you can try just prefixing all examples with
\begin{code}
module Main where
main = print str
\end{code}
or
> module Main where
> main = print str
as appropriate... (please don't get mangled by mail programs, initial '>'s... ):

1.[UnmatchedBegin]
If a \begin{code} starts a section of code, is \end{code} _required_ before the end of the file?
report: unclear
ghc: required
hugs, nhc98: not required
The report says "entirely enclosed between", but goes on to say "More precisely:" and give a description that is not at all precise in the matter of this question.

2.[AfterBeginOrEnd/{BeginWhite,EndWhite,BeginPrint,EndPrint}]
Can a line beginning \begin{code} or \end{code} have additional stuff on the end, where the directive is understood and the additional stuff is ignored?
report:[yes]
hugs:[yesIffAdditionalStuffIsInvisible]
ghc:[case beginningOfLine of
       "\end{code}" -> yes
       "\begin{code}" -> yesIffAdditionalStuffIsInvisible]
nhc98:[UNLIT_IGNORED]
  where yesIffAdditionalStuffIsInvisible =
          if (all isSpace additionalStuff) then yes else UNLIT_IGNORED
UNLIT_IGNORED means that if it was inside a code block then the line is treated as program text (so it's probably a syntax error) and if it was in a literate comment section it is treated as a non-empty literate comment line.
Note that it takes a careful reading of the report: for begin, program code only begins on the _following_ line. Most seem to agree that it shouldn't mess up your program to have trailing whitespace on such a line (but at least nhc98 doesn't currently implement this). Is there any reason to allow NON-whitespace in that location?

3.[IgnoringStringLiterals/{A,B}]
What does "(ignoring string literals, of course)" mean? That the following makes str = "string gap:end{code}" and an unended code block (A), or that it makes an ended code block (B)?
(A)---------
\begin{code}
str = "string gap:\
\end{code}"
- ---------
report: unclear, hugs: A, ghc: B, nhc98: A

This works for ghc, the result being "string gap:string gap ends":
(B)---------
\begin{code}
str = "string gap:\
\end{code}"
\begin{code}
\string gap ends"
\end{code}
- -----------
Note that behavior A requires a detailed knowledge of Haskell's syntax in order to unliterate a file, for a dubious benefit (if a string literal with string gaps is used like that, the programmer could just indent the second line!)

4.[ExtraBeginEnd/{ExtraBegin,ExtraEnd}]
What happens if \begin{code} appears after another \begin{code} before an \end{code}; and what happens if an \end{code} appears without a code block previously having been started by a \begin{code}?
stray end:
  ghc, nhc98:[UNLIT_IGNORED (-> probable successful compile)]
  hugs:[error "\end{code} encountered outside code block"]
stray begin:
  ghc, nhc98:[UNLIT_IGNORED (-> probable syntax error)]
  hugs:[error "\begin{code} encountered inside code block"]

5.[LexicalUnitAcrossLiterateComment/{StringGap,BlockComment}]
Can lexical units jump across literate comment gaps?
report, ghc, hugs, nhc98: yes...
Note that the Report specifies it by removing all non-program lines, rather than converting them to blank lines, but an additional blank line in the middle of a Haskell program NEVER makes a difference (except for line numbering, of course).
- ----------
> str = "string gap:\

This might be a literate comment.

> \ends here"

ghc, hugs, nhc98: "string gap:ends here"
or
- --------
> str = "string" {- a comment

This might be a literate comment -} with weird character sequences.

> ends here -}

ghc, hugs, nhc98: think it's a fine comment

I mention this because allowing these makes it complicated to preserve literate comments in a translation to .hs, because, other than cases like these, prefixing literate comment lines with "--  " works fine.[*] However, banning these could make processing that wants to report errors end up more complicated. Maybe the report could/should say that it is "not advisable", as it does for mixing '>' and {code} styles? (Also it's confusing to the programmer - I wondered "can I (and should I) really do that?!" sometimes..)

[*]Haddock style is a nuisance too, which is why there are two spaces added -- Haddock seems not to recognize such comments then, as desired. Or would it be better to take the other approach and say those should count as haddock comments?

6.[TeXBirdtrack/]
I understand that "It is not advisable to mix these two styles in the same file." and the report doesn't even talk about how they mix, but now that I've gotten started on the implementation inconsistencies... Actually, despite the Report's advice against it, there seems to be a consensus on what the meaning of mixing the two styles is, which I'll describe below:

Sensibly, ghc, hugs and nhc98 treat begin/end{code} lines as blank for the purposes of '>'-style comment checking (which is that a code line and a non-blank literate comment line can't be adjacent); this works:
[TeXBirdtrack/NoLayout]------------
> module Main where {main = print str
\begin{code}
;str = "string"}
\end{code}
ok

Note I didn't rely on the layout rule. This should work:
[TeXBirdtrack/AlignedLayout]------------
> module Main where
> main = print str
\begin{code}
  str = "string"
\end{code}
ok

It does in hugs and nhc98, and according to http://hackage.haskell.org/trac/ghc/ticket/210 it does in GHC HEAD now (6.7) as well.

As another example, this doesn't work, for the same reason that you can't start a line with '>' in a .hs file:
[TeXBirdtrack/Wrong]------------
> module Main where
> main = print str
\begin{code}
> str = "string"
\end{code}
ok

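To make the TeX-style questions (items 1, 2 and 4) concrete, here is a minimal sketch of one possible set of answers. It is not what ghc, hugs or nhc98 actually do internally, and the names classify and unlitTeX are hypothetical. It assumes the stricter choices: a directive counts only if nothing but whitespace follows it, stray or nested directives are errors (with hugs' wording), and a missing \end{code} at end of file is an error. Bird-track lines and string literals are deliberately not handled.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Hypothetical classification of one input line (TeX style only).
data Directive = Begin | End | Other

-- A directive is recognised only when nothing but whitespace follows it
-- (the "yesIffAdditionalStuffIsInvisible" reading of item 2); anything
-- else is an ordinary line, i.e. UNLIT_IGNORED.
classify :: String -> Directive
classify l
  | l `isDirective` "\\begin{code}" = Begin
  | l `isDirective` "\\end{code}"   = End
  | otherwise                       = Other
  where
    isDirective line d = maybe False (all isSpace) (stripPrefix d line)

-- Unliterate TeX-style input. Non-program lines become blank lines, so
-- line numbers are preserved. Assumed policy: nested or stray directives
-- (item 4) and an unterminated code block (item 1) are errors.
unlitTeX :: String -> Either String String
unlitTeX = fmap unlines . go False . zip [1 :: Int ..] . lines
  where
    go inCode []
      | inCode    = Left "unexpected end of file inside code block"
      | otherwise = Right []
    go inCode ((n, l) : rest) = case (classify l, inCode) of
      (Begin, False) -> ("" :) <$> go True rest
      (Begin, True)  -> Left (at n "\\begin{code} encountered inside code block")
      (End,   True)  -> ("" :) <$> go False rest
      (End,   False) -> Left (at n "\\end{code} encountered outside code block")
      (Other, True)  -> (l :) <$> go True rest
      (Other, False) -> ("" :) <$> go False rest
    at n msg = "line " ++ show n ++ ": " ++ msg
```

Switching to the lenient behaviors (for instance treating a stray \end{code} as an ordinary comment line, as ghc and nhc98 do) would only change the corresponding case alternative, which suggests the Report could settle these questions in a sentence or two each.
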
Hoping to start some discussion,
Isaac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF5gaoHgcxvIWYTTURAoF4AJwIjQ3hJ9jpwUgHiYgTB7IhN2so4QCdGCKU
96q4YIeakWtlBKOdAiFM+vU=
=qzCQ
-----END PGP SIGNATURE-----