[Haskell-cafe] Re: Lazy Parsing

31 May 2009

      Dear Malcom,

thanks for helping.

I had actually come to Haskell originally because of a parsing problem. 
I had been using Smalltalk until I started a project which required 
parsing files. Until then I had not done any RW parsing.

Well the route was more a Parsec -> Haskell, wtf is Haskell? Anyway 
eventually I dropped Smalltalk and got addicted to Haskell. And managed 
  familiarize myself with Haskell and Parsec, the latter as it turned 
out I didn't even need to solve my original problem.

Anyway polyparse certainly is an option, but there are a few things that 
despite my "list of failures" to use it give uu-parsinglib a special 
appeal, the breadth-first approach with choice, I find that terrible 
elegant. Due to some kicks in my behind it seems that I might be able to 
use Doaitse's combinators now, some more details on that are in another 
post.

Günther

Malcolm Wallace schrieb:
...
...
It is my pleasure to announce that after 5 days of experimenting with 
uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.
I do not even manage to write a parser for even a mere digit or a 
simple character.
I don't know whether you will be willing to change over to polyparse 
library, but here are some hints about how you might use it.
Given that you want the input to be a simple character stream, rather 
than use a more elaborate lexer, the first thing to do is to specialise 
the parser type for your purposes:
...
type TextParser a = Parser Char a
Now, to recognise a "mere digit",
...
digit :: TextParser Char
digit = satisfy Char.isDigit
and for a sequence of digits forming an unsigned integer:
...
integer :: TextParser Integer
integer = do ds <- many1 digit
             return (foldl1 (\n d-> n*10+d)
                            (map (fromIntegral.digitToInt) ds))
          `adjustErr` (++("expected one or more digits"))
...
I mean I'd like to be able to turn "12.05.2009" into something like 
(12, 5, 2009) and got no clue what the code would have to look like. I 
do know almost every variation what the code must not look like :).
...
date = do a <- integer
          satisfy (=='.')
          b <- integer
          satisfy (=='.')
          c <- integer
          return (a,b,c)
Of course, that is just the standard (strict) monadic interface used by 
many combinator libraries.  Your original desire was for lazy parsing, 
and to achieve that, you must move over to the applicative interface.  
The key difference is that you cannot name intermediate values, but must 
construct larger values directly from smaller ones by something like 
function application.
...
lazydate = return (,,) `apply` integer `discard` dot
                       `apply` integer `discard` dot
                       `apply` integer
   where dot = satisfy (=='.')
The (,,) is the constructor function for triples.  The `discard` 
combinator ensures that its second argument parses OK, but throws away 
its result, keeping only the result of its first argument.
Apart from lazy space behaviour, the main observable difference between 
"date" and "lazydate" is when errors are reported on incorrect input.  
For instance:
...
fst $ runParser date "12.05..2009"
  *** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits
...
fst $ runParser lazydate "12.05..2009"
  (12,5,*** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits
Notice how the lazy parser managed to build the first two elements of 
the triple, whilst the strict parser gave no value at all.
I know that the error messages shown here are not entirely satisfactory, 
but they can be improved significantly just by making greater use of the 
`adjustErr` combinator in lots more places (it is rather like Parsec's 
).  Errors containing positional information about the input can be 
constructed by introducing a separate lexical tokenizer, which is also 
not difficult.
Regards,
    Malcolm

[Haskell-cafe] Re: Lazy Parsing

Gü?nther Schmidt