
Graham Klyne wrote:
On reflection, I think there's a strong case for doing it this way (i.e. with a separate tokenizer) in Haskell, even if the tokenization is very simple, because it helps to separate some of the character-level issues from the remaining program logic. Any spurious detail that can be separated from the core logic makes the core easier to understand.
Yep.
Divide and rule! And, if I understand correctly, Haskell's lazy evaluation should mean that there's little or no penalty for doing this, even though it looks as if you're generating substantial intermediate data.
Fine. Though I'm not concerned with performance, at least not until I get a first working version. ;)
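
As a rough illustration of the separation discussed above, here is a minimal sketch of a standalone tokenizer feeding token-level logic. The `Token` type and the names `tokenize` and `sumNumbers` are hypothetical, invented for this example; they are not taken from the project under discussion.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Hypothetical token type, for illustration only.
data Token = TWord String | TNum Int | TSym Char
  deriving (Eq, Show)

-- All character-level concerns (whitespace, digit runs, letter runs)
-- are handled here and nowhere else.
tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c:cs)
  | isSpace c = tokenize cs
  | isDigit c = let (ds, rest) = span isDigit s
                in TNum (read ds) : tokenize rest
  | isAlpha c = let (ws, rest) = span isAlpha s
                in TWord ws : tokenize rest
  | otherwise = TSym c : tokenize cs

-- The core logic sees only tokens, never raw characters.
sumNumbers :: [Token] -> Int
sumNumbers ts = sum [n | TNum n <- ts]
```

Because the token list is produced lazily, a consumer pulls tokens one at a time and the full list need not exist in memory at once. For instance, `take 2 (tokenize (cycle "ab 1 "))` terminates even though the input is infinite.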
BTW, for a project like this, be very aware of the cost of using ++ to append to a sequence. Look at the ShowS type in the standard Prelude (PreludeText). I also made some notes about this [1].
[1] http://www.ninebynine.org/Software/Learning-Haskell-Notes.html#UsingShowS
I skimmed through, and the whole page looks very instructive, thanks!
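
For reference, a minimal sketch of the `++` versus `ShowS` point, assuming a hypothetical `Tree` type made up for this example. `ShowS`, `shows`, and `showChar` are standard Prelude names; `renderSlow`, `renderS`, and `render` are illustrative only.

```haskell
-- Hypothetical data type, for illustration only.
data Tree = Leaf Int | Node Tree Tree

-- Naive version: (++) is linear in its left operand, so the text of a
-- left subtree is copied again at every enclosing level.
renderSlow :: Tree -> String
renderSlow (Leaf n)   = show n
renderSlow (Node l r) = "(" ++ renderSlow l ++ " " ++ renderSlow r ++ ")"

-- ShowS version: each piece is a function String -> String that
-- prepends its text to whatever follows, so pieces combine by function
-- composition instead of repeated copying.
renderS :: Tree -> ShowS
renderS (Leaf n)   = shows n
renderS (Node l r) =
  showChar '(' . renderS l . showChar ' ' . renderS r . showChar ')'

-- Supply the empty string once, at the very end.
render :: Tree -> String
render t = renderS t ""
```

Both functions produce the same string; the difference should only become noticeable on large, deeply left-nested structures.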