Literate Haskell source files. How do I turn them into something I can read?

I'm trying to wrap my mind around the darcs source code as a preliminary to looking into GHC's guts. All of darcs is written as .lhs files which have bizarre mark-up in them which distracts me from the actual Haskell source I'm trying to figure out and get used to. Apparently the GHC compiler can take .lhs files, strip them with "unlit" (a utility which I finally found buried deep in the GHC installation -- off-path) and then compile them normally. The problem I have is that unlit leaves behind instead these huge gaping (and highly distracting) stretches of whitespace while it takes out the markup.

Are there any tools which I can use to render .lhs files readable? I'm fine with having them converted into documented source (i.e. source code embedded in documentation) or as pure Haskell source (but without the huge whitespace gaps) -- but I can't figure out how to get either.

-- 
Michael T. Richter
Email: ttmrichter@gmail.com, mtr1966@hotpop.com
MSN: ttmrichter@hotmail.com, mtr1966@hotmail.com
YIM: michael_richter_1966; AIM: YanJiahua1966; ICQ: 241960658
Jabber: mtr1966@jabber.cn
"I think it is very beautiful for the poor to accept their lot [...]. I think the world is being much helped by the suffering of the poor people." --Mother Theresa

On 30/12/06, Michael T. Richter wrote:
> I'm trying to wrap my mind around the darcs source code as a preliminary to looking into GHC's guts. All of darcs is written as .lhs files which have bizarre mark-up in them which distracts me from the actual Haskell source I'm trying to figure out and get used to. Apparently the GHC compiler can take .lhs files, strip them with "unlit" (a utility which I finally found buried deep in the GHC installation -- off-path) and then compile them normally. The problem I have is that unlit leaves behind instead these huge gaping (and highly distracting) stretches of whitespace while it takes out the markup.
>
> Are there any tools which I can use to render .lhs files readable? I'm fine with having them converted into documented source (i.e. source code embedded in documentation) or as pure Haskell source (but without the huge whitespace gaps) -- but I can't figure out how to get either.

Assuming that it's LaTeX-based literate source, you usually run pdflatex on it to get a pdf of the code, but I'm not familiar with the darcs code in particular, and whether anything special needs to be done, or whether they have a specialised build for that.

- Cale

On Sat, 2006-30-12 at 02:57 -0500, Cale Gibbard wrote:
> Assuming that it's LaTeX-based literate source, you usually run pdflatex on it to get a pdf of the code, but I'm not familiar with the darcs code in particular, and whether anything special needs to be done, or whether they have a specialised build for that.

It appears to be the same markup used in the GHC compiler source code
(which does not bode well for my future reading of the GHC source
either). Running it on the darcs source code generates several dozen
pages (I'm not exaggerating!) of error messages and no dvi, ps or pdf
files. Playing around with various command line options doesn't help.
Running it on the GHC source code generates simpler error messages, but
error messages nonetheless. Then it dumps me in some kind of
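interactive mode. Here's some sample output:

=====8<=====
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
(./CgCon.lhs
LaTeX2e <2003/12/01>
Babel and hyphenation patterns for american, french, german, ngerman,
bahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
esperanto, estonian, finnish, greek, icelandic, irish, italian, latin,
magyar, norsk, polish, portuges, romanian, russian, serbian, slovak,
slovene, spanish, swedish, turkish, ukrainian, nohyphenation, loaded.
! Undefined control sequence.
l.4 \section
            [CgCon]{Code generation for constructors}
?
=====8<=====

I don't know LaTeX (if that's what this is) at all and I don't know
Haskell sufficiently comfortably to actually distinguish reliably
between LaTeX code and Haskell, so the direct .lhs source code is
basically useless to me. What's the trick people use to read it?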

Hi,

I also dislike Haskell code that contains LaTeX macros as it makes reading the comments more difficult (and I know both Haskell and LaTeX). Also, converting the Haskell code to PDF is probably not a good option because you cannot use all the usual tools in your editor: I don't read code in the same way as I read a book.

Anyway, I would leave the source alone, but here is how you can determine which parts are code and which are comments (I also find it useful to use the highlighting of my editor, which highlights comments and code differently). Literate Haskell scripts usually have the extension .lhs, and in them the convention of what is code and what is comment is reversed: everything is a comment by default and code is marked specially. There are two ways to mark code. With the first markup, code lines start with >. For example:

This is a comment but the lines below contain code:

> main = print "This is code" -- a normal comment within a code block
> pi = 3.14

This is a comment again.

The second way to mark code is to place it
between \begin{code} and \end{code}. For example:
\begin{code}
main = print "This is code"
pi = 3.14
\end{code}
And here we have comments again. Personally, I prefer the first form
of markup but, I guess, other people like the second form, so Haskell
provides both, which may be confusing.
Hope this helps and Happy New Year to everyone!
-Iavor
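
The two conventions described above can be written down mechanically. The following is only a minimal sketch: the names `LineKind`, `classify` and `codeOnly` are invented for this illustration and are not part of GHC's unlit or of any library. It assumes that `\begin{code}` and `\end{code}` sit at the start of their own lines, and it follows the Haskell report in replacing a leading `>` with a space so that layout is preserved.

```haskell
import Data.List (isPrefixOf)

-- | What a single line of a literate script is (hypothetical type).
data LineKind = Code | Prose
  deriving (Eq, Show)

-- | Tag every line of a .lhs file. Handles both conventions:
-- bird tracks (lines starting with '>') and \begin{code} ... \end{code}.
-- The delimiter lines themselves are tagged as Prose.
classify :: [String] -> [(LineKind, String)]
classify = go False
  where
    go _ [] = []
    go inBlock (l : ls)
      | "\\begin{code}" `isPrefixOf` l = (Prose, l) : go True ls
      | "\\end{code}" `isPrefixOf` l   = (Prose, l) : go False ls
      | inBlock                        = (Code, l) : go True ls
      | ">" `isPrefixOf` l             = (Code, ' ' : drop 1 l) : go False ls
      | otherwise                      = (Prose, l) : go False ls

-- | Keep only the code lines, dropping all prose.
codeOnly :: [String] -> [String]
codeOnly ls = [l | (Code, l) <- classify ls]
```

In GHCi, something like `mapM_ putStrLn . codeOnly . lines =<< readFile "Foo.lhs"` (with `Foo.lhs` standing in for a real file) would print just the code.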
On 12/31/06, Michael T. Richter wrote:
> On Sat, 2006-30-12 at 02:57 -0500, Cale Gibbard wrote:
>> Assuming that it's LaTeX-based literate source, you usually run pdflatex on it to get a pdf of the code, but I'm not familiar with the darcs code in particular, and whether anything special needs to be done, or whether they have a specialised build for that.
>
> It appears to be the same markup used in the GHC compiler source code (which does not bode well for my future reading of the GHC source either). Running it on the darcs source code generates several dozen pages (I'm not exaggerating!) of error messages and no dvi, ps or pdf files. Playing around with various command line options doesn't help. Running it on the GHC source code generates simpler error messages, but error messages nonetheless. Then it dumps me in some kind of interactive mode. Here's some sample output:
>
> =====8<=====
> This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
> entering extended mode
> (./CgCon.lhs
> LaTeX2e <2003/12/01>
> Babel and hyphenation patterns for american, french, german, ngerman,
> bahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
> esperanto, estonian, finnish, greek, icelandic, irish, italian, latin,
> magyar, norsk, polish, portuges, romanian, russian, serbian, slovak,
> slovene, spanish, swedish, turkish, ukrainian, nohyphenation, loaded.
> ! Undefined control sequence.
> l.4 \section
>             [CgCon]{Code generation for constructors}
> ?
> =====8<=====
>
> I don't know LaTeX (if that's what this is) at all and I don't know Haskell sufficiently comfortably to actually distinguish reliably between LaTeX code and Haskell, so the direct .lhs source code is basically useless to me. What's the trick people use to read it?

On Sun, 2006-31-12 at 16:52 -0800, Iavor Diatchki wrote:
> I also dislike Haskell code that contains LaTeX macros as it makes reading the comments more difficult (and I know both Haskell and LaTeX). Also converting the Haskell code to pdf is probably not a good option because you cannot use all the usual tools in your editor: I don't read code in the same way as I read a book.

The PDF isn't ideal, but it is better than what I've got now -- a PDF file would be readable, you see. Part of the problem with trying to read the .lhs files (the ones not marked with the simpler markup '>') is that the comment sections -- the part that's supposed to document the code to make it understandable -- seem to be threaded with macro substitution calls (or whatever they are called in LaTeX). So I'll see something like this:

    In favour of omitting \tr{!B!}, \tr{!C!}:
    - {\em May} save a heap overflow test, if ...A... allocates anything.
    The other advantage of this is that we can use relative addressing
    from a single Hp to get at all the closures so allocated.

Looking at this I'm seeing what appears to be some kind of LaTeX variable name being expanded with what looks like a macro called "\tr". So what is this mysterious "B"? I have no idea. The code blocks both before and after this comment don't seem to show anything that this B-expansion would be turned into. At least if I could get the PDF output the macro expansion would be replaced with whatever name is in B's stead.

> There are two ways to mark code. Using the one markup, code lines start with >.

This format I'm familiar with and can read readily. (I don't see the point of it -- what does this buy me that {- -} blocks don't? -- but I can read it without too much difficulty.)

> This is a comment again. The second way to mark code is to place it between \begin{code} and \end{code}. For example:
>
> \begin{code}
> main = print "This is code"
> pi = 3.14
> \end{code}
>
> And here we have comments again.

This is the stuff that's hurting. The problem is that to my unschooled-in-Haskell eyes, the markup and the Haskell source blur together into a soup of executable line noise. The comments prove unhelpful because of the macro expansions I can't decode (like that "B" thing above) and the actual Haskell source can easily get lost in the mix. In the darcs source code, for example, there will be literally pages of comments (the user manual is part of the source code) with a little five-line block of code here, a ten-line block there. It's very easy to overlook the code in the sea of macro-expanded commentary.

Being able to take the source and run it through something that formats the comments and the code differently and clearly (and expanding macros along the way) would render the whole thing far more readable (read: readable) even if I do lose the ability to easily navigate through the source to modify it for experiments, etc.

The ideal world would be something that expands the latex code -- macro expansion in particular -- and strips the formatting code so that the code and the commentary are clearly separated but both are readable in a plain text editor. A good second place would be a way to actually take these .lhs files and make them PDFs (or DVIs or PSs or even HTMLs) so that at least the macros get expanded and the comments aren't interrupted by formatting code. A distant third place would be to strip the comments away and leave just the raw code behind -- but without the long, distracting gaps that unlit leaves.

Now the distant third I can do thanks to the (excessively snarky IMO) comment Tony Finch left behind. I would like to know, however, if there is any way for me to get my second-place or even first-place options filled. Like a working command line for pdflatex? Or something better?

And me? I'm going to use XML for literate Haskell. ;)

-- 
Michael T. Richter
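
One rough approximation of the plain-text option asked for above is to keep code lines unchanged and turn every prose line into an ordinary `--` comment, so the result is a normal .hs file that any editor highlights correctly. This is only a sketch under stated assumptions: `proseToComments` is a made-up name, it handles only the `\begin{code}` style, and it does not expand LaTeX macros such as `\tr{...}`; those stay in the comments as written.

```haskell
import Data.List (isPrefixOf)

-- | Turn \begin{code}-style literate source into plain Haskell:
-- code lines pass through untouched, prose lines become "-- " comments,
-- empty prose lines stay empty, and the delimiter lines are dropped.
proseToComments :: [String] -> [String]
proseToComments = go False
  where
    go _ [] = []
    go inCode (l : ls)
      | "\\begin{code}" `isPrefixOf` l = go True ls
      | "\\end{code}" `isPrefixOf` l   = go False ls
      | inCode                         = l : go True ls
      | null l                         = "" : go False ls
      | otherwise                      = ("-- " ++ l) : go False ls
```

Wrapped as `interact (unlines . proseToComments . lines)`, it can be used as a filter from standard input to standard output.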

<snip>
> interactive mode. Here's some sample output:
>
> =====8<=====
> This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
> entering extended mode
> (./CgCon.lhs
> LaTeX2e <2003/12/01>
> Babel and hyphenation patterns for american, french, german, ngerman,
> bahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
> esperanto, estonian, finnish, greek, icelandic, irish, italian, latin,
> magyar, norsk, polish, portuges, romanian, russian, serbian, slovak,
> slovene, spanish, swedish, turkish, ukrainian, nohyphenation, loaded.
> ! Undefined control sequence.
> l.4 \section
>             [CgCon]{Code generation for constructors}
> ?

Hi,

latex is annoyed by the \section (in latex's error message, the line breaks right after the problem). And, seeing l.4, which means line number 4, I think you tried to run latex on one particular file, not the main one. The main one begins (after some comments, I guess) with \documentclass.

Cheers,
mt
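
To find which file in a source tree is the main LaTeX document, one could check each file's contents for a `\documentclass` line. The function below is a hedged sketch of that check only: `isMainLatexFile` is a hypothetical name, it assumes the declaration starts a line (after optional whitespace), and it would not recognise older documents that begin with `\documentstyle` instead.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | True if the text contains a line that starts (after optional
-- leading whitespace) with \documentclass, i.e. it looks like a
-- standalone LaTeX document rather than a fragment meant to be included.
isMainLatexFile :: String -> Bool
isMainLatexFile = any declares . lines
  where
    declares l = "\\documentclass" `isPrefixOf` dropWhile isSpace l
```

Applied to the contents of a file like CgCon.lhs, which starts with a `\section`, this returns `False`, which matches the diagnosis above.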

Hi Michael,
cpphs is the answer: http://www.cs.york.ac.uk/fp/cpphs/
It has a --unlit mode, and if one of the other flags doesn't remove the
various forms of line dropping, I'm sure if you posted what you need doing,
contacted the author or submitted a patch that someone would be able to do
it. It's something that should exist.
Thanks
Neil

On 12/29/06, Michael T. Richter wrote:
> I'm trying to wrap my mind around the darcs source code as a preliminary to looking into GHC's guts. All of darcs is written as .lhs files which have bizarre mark-up in them which distracts me from the actual Haskell source I'm trying to figure out and get used to. Apparently the GHC compiler can take .lhs files, strip them with "unlit" (a utility which I finally found buried deep in the GHC installation -- off-path) and then compile them normally. The problem I have is that unlit leaves behind instead these huge gaping (and highly distracting) stretches of whitespace while it takes out the markup.

Speaking of bizarre markup, I gently suggest using plaintext rather than HTML, and wrapping your lines, when you post to this list.

In any case, I'm surprised that you find the Literate Haskell aspect of it to be the *most* bizarre thing about darcs's sources, but anyway, I really do suggest turning the code into PDFs as Cale suggested rather than trying to strip out the literate markup. Sometimes, documentation really does help one understand code, and the entire point of literate programming is to make code more readable. A little effort spent learning now could save you a whole lot of effort later.

Cheers,
Kirsten

-- 
Kirsten Chevalier * chevalier@alum.wellesley.edu * Often in error, never in doubt
"Dare to be naive." --R. Buckminster Fuller

On 12/29/06, Michael T. Richter wrote:
> I'm trying to wrap my mind around the darcs source code as a preliminary to looking into GHC's guts. All of darcs is written as .lhs files which have bizarre mark-up in them which distracts me from the actual Haskell source I'm trying to figure out and get used to. Apparently the GHC compiler can take .lhs files, strip them with "unlit" (a utility which I finally found buried deep in the GHC installation -- off-path) and then compile them normally. The problem I have is that unlit leaves behind instead these huge gaping (and highly distracting) stretches of whitespace while it takes out the markup.

Doesn't something like

    egrep -v "^[^>]"

solve it?

-- 
Mikael Johansson                 | To see the world in a grain of sand
mikael@johanssons.org            | And heaven in a wild flower
http://www.mikael.johanssons.org | To hold infinity in the palm of your hand
                                 | And eternity for an hour
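
For comparison, here is what that regular expression does, written out as a small Haskell sketch (the names `keepLine` and `birdFilter` are invented for this example). The pattern drops every line whose first character is something other than `>`, so empty lines and bird-track lines survive. Code inside `\begin{code}` blocks is dropped along with the prose, so this only helps for files written in the `>` style.

```haskell
-- | Mirror of `egrep -v "^[^>]"`: a line is kept if it is empty
-- or if its first character is '>'.
keepLine :: String -> Bool
keepLine []      = True
keepLine (c : _) = c == '>'

-- | Apply the filter to a whole file's contents.
birdFilter :: String -> String
birdFilter = unlines . filter keepLine . lines
```

Note that the kept lines still carry their leading `>`, and that a definition such as `y = 2` inside a `\begin{code}` block would be lost.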

On Sat, 30 Dec 2006, Michael T. Richter wrote:
> Apparently the GHC compiler can take .lhs files, strip them with "unlit" (a utility which I finally found buried deep in the GHC installation -- off-path) and then compile them normally. The problem I have is that unlit leaves behind instead these huge gaping (and highly distracting) stretches of whitespace while it takes out the markup.

uniq will solve this part of your problem, but you're probably taking the
wrong approach to the wider problem.
Tony.
--
f.a.n.finch
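
The whitespace part of the problem can also be handled directly in Haskell. A minimal sketch, with `squeezeBlank` as a made-up name: it collapses each run of blank (or whitespace-only) lines into a single empty line and leaves everything else alone. Unlike uniq, it never merges two identical adjacent code lines, and it treats lines that differ only in whitespace as blank.

```haskell
import Data.Char (isSpace)
import Data.List (groupBy)

-- | Collapse every run of blank or whitespace-only lines into one
-- empty line; non-blank lines are passed through unchanged.
squeezeBlank :: [String] -> [String]
squeezeBlank = concatMap squeeze . groupBy bothBlank
  where
    isBlank = all isSpace
    -- Only consecutive blank lines end up in the same group.
    bothBlank a b = isBlank a && isBlank b
    squeeze g@(l : _)
      | isBlank l = [""]
      | otherwise = g
    squeeze [] = []
```

Run over the output of unlit, for instance via `interact (unlines . squeezeBlank . lines)`, this would leave at most one blank line between code blocks.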

On Sat, 2006-30-12 at 17:13 +0000, Tony Finch wrote:
>> Apparently the GHC compiler can take .lhs files, strip them with "unlit" (a utility which I finally found buried deep in the GHC installation -- off-path) and then compile them normally. The problem I have is that unlit leaves behind instead these huge gaping (and highly distracting) stretches of whitespace while it takes out the markup.
>
> uniq will solve this part of your problem, but you're probably taking the wrong approach to the wider problem.

So what is the right approach (and, for that matter, what is the wider problem)?

-- 
Michael T. Richter

On Sun, 31 Dec 2006, Michael T. Richter wrote:
> So what is the right approach (and, for that matter, what is the wider problem)?

I thought that was clear from the other replies.
Tony.
--
f.a.n.finch

participants (8)
- Cale Gibbard
- Iavor Diatchki
- Kirsten Chevalier
- Michael T. Richter
- Mikael Johansson
- minh thu
- Neil Mitchell
- Tony Finch