
I had been using Parsec to parse VCD files, but needed to lazily parse
streaming data. After stumbling on this thread below, I switch to
polyparse.
What a great library! I was able to migrate from a strict to a
semi-lazy parser and many of my parse reductions didn't even need to
change. Thanks Malcolm!
In addition to lazy VCD parsing, this version of vcd [1] also includes
step', which forces a step regardless if variables have changed or not
-- helpful for realtime simulation.
(BTW, parsec is a great library too.)
-Tom
[1] http://hackage.haskell.org/package/vcd-0.1.4
On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it.
Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes:
type TextParser a = Parser Char a
Now, to recognise a "mere digit",
digit :: TextParser Char digit = satisfy Char.isDigit
and for a sequence of digits forming an unsigned integer:
integer :: TextParser Integer integer = do ds <- many1 digit return (foldl1 (\n d-> n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++("expected one or more digits"))
I mean I'd like to be able to turn "12.05.2009" into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :).
date = do a <- integer satisfy (=='.') b <- integer satisfy (=='.') c <- integer return (a,b,c)
Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application.
lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.')
The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument.
Apart from lazy space behaviour, the main observable difference between "date" and "lazydate" is when errors are reported on incorrect input. For instance:
> fst $ runParser date "12.05..2009" *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits
> fst $ runParser lazydate "12.05..2009" (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits
Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all.
I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's >). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult.
Regards, Malcolm
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe