Re: [Haskell-cafe] how to get started: a text application

24 Jun 2004

      Graham Klyne wrote:
...
I think the first choice is whether to go for a separately identifiable 
lexing phase, rather than working directly from the raw text.  Either 
might work, I think.
The first option (tokenization) is more appealing to me.
...
The HaXml XML parser has a separate lexer, but it 
turns out that it's not always easy to get the tokenization right 
without having contextual information (e.g. from the syntax analyzer).  
(XML is rather messy in that way.)
Well, yes. In Markdown, as in most other "rich-text" formats, symbols
are heavily overloaded. After all, it has to constrain itself to "plain text".

I'm going to try a "two-stage tokenization" (not sure how to name this 
correctly). Basically, first I'd split the raw text into "symbols" (like 
space, char, digit, left-bracket) and then turn these symbols into 
tokens (like paragraph, reference, start bold text, end bold text, etc.)
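
A minimal sketch of what such a two-stage tokenizer could look like. All names here (Symbol, Token, symbolize, tokenizeSymbols) are hypothetical, and only "**" for bold and blank lines for paragraph breaks are handled; real Markdown needs far more context than this.

```haskell
import Data.Char (isDigit, isSpace)

-- Stage 1: classify every character of the raw text.
data Symbol = SSpace | SNewline | SStar | SDigit Char | SChar Char
  deriving (Eq, Show)

symbolize :: String -> [Symbol]
symbolize = map classify
  where
    classify '\n' = SNewline
    classify '*'  = SStar
    classify c
      | isSpace c = SSpace
      | isDigit c = SDigit c
      | otherwise = SChar c

-- Stage 2: group symbols into tokens.
data Token = Text String | BoldStart | BoldEnd | ParaBreak
  deriving (Eq, Show)

tokenizeSymbols :: [Symbol] -> [Token]
tokenizeSymbols = go False
  where
    -- The Bool remembers whether we are currently inside bold text.
    go _ [] = []
    go inBold (SStar : SStar : rest) =
      (if inBold then BoldEnd else BoldStart) : go (not inBold) rest
    go inBold (SNewline : SNewline : rest) =
      ParaBreak : go inBold (dropWhile (== SNewline) rest)
    go inBold syms =
      let (run, rest) = textRun syms
      in Text run : go inBold rest

-- Collect plain symbols until a marker ("**" or a blank line) begins.
textRun :: [Symbol] -> (String, [Symbol])
textRun [] = ("", [])
textRun syms@(SStar : SStar : _)       = ("", syms)
textRun syms@(SNewline : SNewline : _) = ("", syms)
textRun (s : rest) =
  let (cs, rest') = textRun rest
  in (toChar s : cs, rest')

-- A single newline inside a paragraph is treated as a space.
toChar :: Symbol -> Char
toChar SSpace     = ' '
toChar SNewline   = ' '
toChar SStar      = '*'
toChar (SDigit c) = c
toChar (SChar c)  = c
```

The first stage is a plain map, so all the context-dependent decisions (is this "**" opening or closing?) are confined to the second stage.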
...
In Haskell, it's often reasonably efficient to construct a program as a 
composition of "filters", rather like a Unix command pipeline;  lazy 
evaluation often means that data can "stream" through the filters and 
never exist in its entirety in an intermediate form.  This immediately 
allows the program structure to be resolved into a number of smaller, 
independent pieces; e.g.
   tokenize    :: String -> [Token]
   parse       :: [Token] -> DocModel
   createXHTML :: DocModel -> Document  -- (cf. HaXml)
Yes, I have seen this pattern in tutorial materials and am inclined to 
use it.
...
Then HaXml provides a function that can generate textual XML.  Thus the 
overall conversion function might look like:
   markdownToXHTML :: String -> String
   markdownToXHTML = show . document . createXHTML . parse . tokenize
That would be a good start, thanks!
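
To make the shape of such a pipeline concrete, here is a toy version that compiles on its own. It does not use HaXml: render is a hand-rolled stand-in for createXHTML plus pretty-printing, and the Token and DocModel types are assumptions made up for this sketch (words and paragraphs only).

```haskell
data Token = Word String | Break
  deriving (Eq, Show)

-- A document is a list of paragraphs, each a list of words.
newtype DocModel = DocModel [[String]]
  deriving (Eq, Show)

-- Blank lines become Break tokens; every other line is split into words.
tokenize :: String -> [Token]
tokenize = concatMap lineTokens . lines
  where
    lineTokens l
      | all (== ' ') l = [Break]
      | otherwise      = map Word (words l)

-- Group words into paragraphs separated by Break tokens.
parse :: [Token] -> DocModel
parse = DocModel . filter (not . null) . paragraphs
  where
    paragraphs [] = []
    paragraphs ts =
      let (para, rest) = break (== Break) ts
      in [w | Word w <- para] : paragraphs (drop 1 rest)

-- Stand-in for the HaXml stages; note that it does no escaping.
render :: DocModel -> String
render (DocModel ps) = concatMap (\p -> "<p>" ++ unwords p ++ "</p>\n") ps

markdownToXHTML :: String -> String
markdownToXHTML = render . parse . tokenize
```

Each stage is an ordinary lazy function, so for example take 2 (tokenize (cycle "a b\n")) returns two tokens without trying to consume the infinite input.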
...
(where "document" is from the HaXml module Text.XML.HaXml.Pretty).
For parsing of any complexity, I recommend Parsec:  it has the advantage 
of being very well documented, and it helps to show how monads can be 
used to handle state information.
OK.
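
For a first taste of Parsec, a small sketch that parses plain text with "**bold**" spans. It assumes the parsec package with its current module names (Text.Parsec, Text.Parsec.String); the Inline type and the parser names are invented for this example, and nested or unbalanced markup is simply rejected.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Inline = Plain String | Bold [Inline]
  deriving (Eq, Show)

-- One or more characters that cannot start a marker.
plain :: Parser Inline
plain = Plain <$> many1 (noneOf "*")

-- "**", some plain text, then the closing "**".
-- 'try' lets the opening marker fail without consuming a lone '*'.
bold :: Parser Inline
bold = Bold <$> between (try (string "**")) (string "**") (many1 plain)

inline :: Parser Inline
inline = bold <|> plain

-- A whole line: any number of inlines, then end of input.
inlines :: Parser [Inline]
inlines = many inline <* eof

parseInlines :: String -> Either ParseError [Inline]
parseInlines = parse inlines "<input>"
```

A parse failure comes back as a Left value carrying the position and the expected input, which is often enough for a useful error message.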
...
The outline sketched above has at least one weakness: it doesn't provide 
any way to handle errors.  This could be overcome by using Either as an 
error monad (see Control.Monad and Control.Monad.Error in the standard 
hierarchical libraries), and then using >>= in place of function 
composition (noting the reversal of component order):
Uhm. It looks like error handling is very different from that in imperative 
languages. I think I'll try to get the basic version working without it first. 
On a related note, how can I debug my program along the way? I suspect I 
can't even use a print inside a function.
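
One option for this kind of "print debugging" is trace from the standard Debug.Trace module, which writes a message to stderr when the wrapped value is evaluated. The sketch below uses a hypothetical countWords function purely for illustration; because of lazy evaluation, the order in which trace messages appear may be surprising.

```haskell
import Debug.Trace (trace)

-- Hypothetical pure function, instrumented with a debug message.
countWords :: String -> Int
countWords s =
  let ws = words s
  in trace ("countWords saw: " ++ show ws) (length ws)
```

The function keeps its pure type, so the trace call can be removed later without touching any callers.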

[ error handling cut ]
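
This is not the example that was cut, only a small sketch of the idea described above: each stage returns Either String, and the stages are chained with >>= instead of being composed with (.). The stage names, the toy types and the error conditions are all made up for illustration, and only the Either instance from base is used.

```haskell
data Token = Word String
  deriving (Eq, Show)

newtype DocModel = DocModel [String]
  deriving (Eq, Show)

-- Left carries an error message, Right carries a successful result.
type Result = Either String

tokenizeE :: String -> Result [Token]
tokenizeE s
  | null (words s) = Left "tokenize: empty input"
  | otherwise      = Right (map Word (words s))

parseE :: [Token] -> Result DocModel
parseE ts
  | any tooLong ts = Left "parse: word longer than 20 characters"
  | otherwise      = Right (DocModel [w | Word w <- ts])
  where
    tooLong (Word w) = length w > 20

renderE :: DocModel -> Result String
renderE (DocModel ws) = Right ("<p>" ++ unwords ws ++ "</p>")

-- Stages run left to right; the first Left stops the chain.
convert :: String -> Result String
convert s = tokenizeE s >>= parseE >>= renderE
```

Note the order: with plain composition this would read render . parse . tokenize, while with >>= the stages appear in the order they run.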
...
Just to check out my use of >>= and do-notation, I constructed a trivial 
complete program using both.
Phew.
Not sure how much time it will take to comprehend this "triviality".
I can't yet grasp monads and their applications.

[ example skipped ]
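
The skipped example is not reproduced here. As a rough aid for readers, the following hypothetical pair of functions shows the same computation written once with >>= and once with do-notation, using Maybe and readMaybe from Text.Read; the two definitions should behave identically.

```haskell
import Text.Read (readMaybe)

-- Explicit >>= : each lambda receives the result of the previous step.
addBind :: String -> String -> Maybe Int
addBind a b =
  readMaybe a >>= \x ->
  readMaybe b >>= \y ->
  return (x + y)

-- The same thing in do-notation, which the compiler rewrites into the form above.
addDo :: String -> String -> Maybe Int
addDo a b = do
  x <- readMaybe a
  y <- readMaybe b
  return (x + y)
```

If either string fails to parse as a number, the whole result is Nothing and the remaining steps are skipped.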

Thank you for your time and explanations. They were certainly very helpful, 
esp. considering the fact that I had exactly one reply to my post. ;-)

Max Ischenko