
Hi all, is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to. The files it has to parse is anywhere from 500 MB to 5 GB. Günther

Günther Schmidt schrieb:
Hi all,
is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to.
The files it has to parse is anywhere from 500 MB to 5 GB.
I don't think that it is in general possible to use the same parser for lazy and strict parsing, just because of the handling of parser failure. If parser failure is denoted by a Left constructor in (Either Reason Result) then the whole parsing process must be finished, before the parser knows whether the answer is Left or Right. I also used polyparse for lazy parsing, but I found it unintuitive how to make a parser lazy. I tried to do better in tagchup, where I make explicit in the type, whether a parser can fail or not. In the first case in cannot be lazy, in the second case it can. I also did lazy parsing in 'midi' package and in 'spreadsheet'. I liked to factor out a lazy parser library from them, but I failed to unify all these applications. At least I have factored out handling of lazy failure (aka asnychronous exceptions) in explicit-exception package. Btw. a good place to discuss such issues is our local Haskell meeting that takes place on 2009-06-12: http://iba-cg.de/hal4.html

Henning Thielemann
I don't think that it is in general possible to use the same parser for lazy and strict parsing, just because of the handling of parser failure.
Polyparse demonstrates that you can mix-and-match lazy parsers with strict parsers in the different parts of a grammar (by choosing whether to use applicative or monadic style). You can also switch between lazy or strict interpretations of the applicative parts of your grammar (by changing the import that decides which version of the parser primitives is in scope).
I also used polyparse for lazy parsing, but I found it unintuitive how to make a parser lazy.
It can certainly be tricky, and requires a certain amount of experimentation. I think the difficulties are mainly due to the mix of lazy (applicative) and strict (monadic) styles in different non-terminals. A parser that you intend to be lazy, may turn out to be stricter than you hope, because of the strictness of another parser that it depends upon. Regards, Malcolm

In the uu-parsinglib we actually have two versions of parsers: lazy ones and strict ones, which have different types. So by giving a type annotation you can select the one you want. Notice that in the left- hand side of a monadic construct it does not make sense to use a lazy parser, since its result will be used as a parameter to the right-hand side operator, so in case of a monad our library system automagically selects the strict version for the left hand side. For the right hand side it depends on the type of the overall expression. Unfortunately in Haskell both the left and right hand side of a bind need the to be elements of the same monad, whereas in the case of a lazy oevrall parser this is not the case. We solve this problem by tupling the two parsers (NOT the parsing results), so still the do-notation can be used. The use of the library is free of any trickery! Doaitse Swierstra On 28 mei 2009, at 11:41, Malcolm Wallace wrote:
Henning Thielemann
wrote: I don't think that it is in general possible to use the same parser for lazy and strict parsing, just because of the handling of parser failure.
Polyparse demonstrates that you can mix-and-match lazy parsers with strict parsers in the different parts of a grammar (by choosing whether to use applicative or monadic style). You can also switch between lazy or strict interpretations of the applicative parts of your grammar (by changing the import that decides which version of the parser primitives is in scope).
I also used polyparse for lazy parsing, but I found it unintuitive how to make a parser lazy.
It can certainly be tricky, and requires a certain amount of experimentation. I think the difficulties are mainly due to the mix of lazy (applicative) and strict (monadic) styles in different non-terminals. A parser that you intend to be lazy, may turn out to be stricter than you hope, because of the strictness of another parser that it depends upon.
Regards, Malcolm _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Dear Doaitse, It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it. Period. I do not even manage to write a parser for even a mere digit or a simple character. I have read the tutorial from a to a to z and from z to a and there were a few words I recognized. I mean I'd like to be able to turn "12.05.2009" into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). I am guessing here that when one does define a parsing function, since all the parser combinators aren't function but methods, one *must* also provide a type signature so that the compiler knows the actual *instance* method? Günther

Hi Günther The code below should work for your simple example, provided it hasn't lost formatting when I pasted it in to the email. I was a bit surprised that there is no pSatisfy in this library, but there are parsers for digits, lower case, upper case letters etc. in the Examples module that would otherwise be achieved with pSatisfy. Best wishes Stephen {-# LANGUAGE FlexibleContexts #-} module Demo1 where import Text.ParserCombinators.UU.Examples import Text.ParserCombinators.UU.Parsing -- here's a simple character '@' parser pAtSym :: Symbol p Char Char => p Char pAtSym = pSym '@' test_simple_char = test pAtSym "@" test_simple_char2 = test pAtSym "@@@@@" -- pDigit is supplied in Text.ParserCombinators.UU.Examples test_any_digit = test pDigit "6" -- pNatural is supplied in Text.ParserCombinators.UU.Examples -- It looks like the most likely candidate to parse a -- sequence of digits... test_natural = test pNatural "1234" -- ... and it is! -- parse a date "12.05.2009" as a triple (Int,Int,Int) pDateTriple :: (Symbol p (Char,Char) Char, Applicative p, ExtApplicative p st, Provides st Char Char) => p (Int,Int,Int) pDateTriple = (,,) <$> pNatural <* pDot <*> pNatural <* pDot <*> pNatural pDot :: (Symbol p Char Char, Applicative p) => p [Char] pDot = lift <$> pSym '.' test_date = test pDateTriple "12.05.2009"

It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.
I do not even manage to write a parser for even a mere digit or a simple character.
I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it. Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes:
type TextParser a = Parser Char a
Now, to recognise a "mere digit",
digit :: TextParser Char digit = satisfy Char.isDigit
and for a sequence of digits forming an unsigned integer:
integer :: TextParser Integer integer = do ds <- many1 digit return (foldl1 (\n d-> n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++("expected one or more digits"))
I mean I'd like to be able to turn "12.05.2009" into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :).
date = do a <- integer satisfy (=='.') b <- integer satisfy (=='.') c <- integer return (a,b,c)
Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application.
lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.')
The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument. Apart from lazy space behaviour, the main observable difference between "date" and "lazydate" is when errors are reported on incorrect input. For instance:
fst $ runParser date "12.05..2009" *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits
fst $ runParser lazydate "12.05..2009" (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits
Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all. I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's >). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult. Regards, Malcolm

Dear Malcom, thanks for helping. I had actually come to Haskell originally because of a parsing problem. I had been using Smalltalk until I started a project which required parsing files. Until then I had not done any RW parsing. Well the route was more a Parsec -> Haskell, wtf is Haskell? Anyway eventually I dropped Smalltalk and got addicted to Haskell. And managed familiarize myself with Haskell and Parsec, the latter as it turned out I didn't even need to solve my original problem. Anyway polyparse certainly is an option, but there are a few things that despite my "list of failures" to use it give uu-parsinglib a special appeal, the breadth-first approach with choice, I find that terrible elegant. Due to some kicks in my behind it seems that I might be able to use Doaitse's combinators now, some more details on that are in another post. Günther Malcolm Wallace schrieb:
It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.
I do not even manage to write a parser for even a mere digit or a simple character.
I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it.
Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes:
type TextParser a = Parser Char a
Now, to recognise a "mere digit",
digit :: TextParser Char digit = satisfy Char.isDigit
and for a sequence of digits forming an unsigned integer:
integer :: TextParser Integer integer = do ds <- many1 digit return (foldl1 (\n d-> n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++("expected one or more digits"))
I mean I'd like to be able to turn "12.05.2009" into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :).
date = do a <- integer satisfy (=='.') b <- integer satisfy (=='.') c <- integer return (a,b,c)
Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application.
lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.')
The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument.
Apart from lazy space behaviour, the main observable difference between "date" and "lazydate" is when errors are reported on incorrect input. For instance:
fst $ runParser date "12.05..2009" *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits
fst $ runParser lazydate "12.05..2009" (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits
Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all.
I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's >). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult.
Regards, Malcolm

Dear Gunther, I am providing my solution, on which one can of course specialise in making sure that a valid date is parsed, which would be a bit more cumbersome; how should e.g. error correction be done. I prefer to test afterwards in such situations. Best, Doaitse module Guenther where import Text.ParserCombinators.UU.Parsing import Text.ParserCombinators.UU.BasicInstances import Text.ParserCombinators.UU.Examples hiding (main) import Control.Applicative hiding ((<*), (*>), (<$)) {- The first decision we have to make is what kind of input we are providing. The simplest case is just to assume simple characters, hence for our input type we will use the standard provided stream of Characters: Str Char, so we use the type of our parsers to be the type used in the Examples module; since we do not know whether we wil be using the parsers in a monadic mode too we stay on the safe side ans use the type P_m -} type GP a = P_m (Str Char) a -- GP stands for GuenterParser {- Once we know that our input contains characters, but that in our output we what to have integer values, we start out by building a parser for a single integer , for which we use the function pNatural form the examples-} pDate = (,,) <$> pNatural <* pDot <*> pNatural <* pDot <*> (pNatural ::GP Int) pDot = pSym '.' {- main = do print (test pDate "3.4.1900") print (test pDate "3 4 1900") print (test pDate "..1900")-} -- end of Module Guenther By playing with insertion and deletion costs (e.g. by building a more picky pNatural) one can control the error recovery. Another option to get better error recovery would be to define a specialised instance of Provides which removes spaces. You might even temporarily pSwitch to the use of this state
Period.
I do not even manage to write a parser for even a mere digit or a simple character. I have read the tutorial from a to a to z and from z to a and there were a few words I recognized.
I mean I'd like to be able to turn "12.05.2009" into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :).
I am guessing here that when one does define a parsing function, since all the parser combinators aren't function but methods, one *must* also provide a type signature so that the compiler knows the actual *instance* method?
Günther

On Wed, 27 May 2009, Gü?nther Schmidt wrote:
is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to.
Also see http://www.haskell.org/haskellwiki/Maintaining_laziness

Lazy parsing has been the default for the last ten years in uulib, and is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib ). The whole design of the latter in described in a technical report to which references are given on the web page. It provides also error correction, the ability to use several different kinds of input tokens, and (with some help) ambiguities. If speed is an issue you can insert extra hints which locally change the breadth-first parsing process locally into a somewhat more depth-first form. When compared with Parsec the good news is that usually you do not have to put annotations to get nice results. The older uulib version also performs an abstract interpretation which basically changes the search for which alternative to take from a linear to a logarithmic complexity, but does not provide a monadic structure, in which you use results recognised thus far to construct new parsers. Both the old uulib version and the new version have always had an applicative interface. In the near future elements of the abstract interpretation of the old uulib version will migrate into the new version. It is the advent of GADT's which made this new version feasable. An example of the error correction at work at the following example code: pa, pb, paz :: P_m (Str Char) [Char] pa = lift <$> pSym 'a' pb = lift <$> pSym 'b' p <++> q = (++) <$> p <*> q pa2 = pa <++> pa pa3 = pa <++> pa2 pCount p = (\ a b -> b+1) <$> p <*> pCount p <<|> pReturn 0 pExact 0 p = pReturn [] pExact n p = (:) <$> p <*> pExact (n-1) p paz = pMany (pSym ('a', 'z')) paz' = pSym (\t -> 'a' <= t && t <= 'z', "a .. z", 'k') main :: IO () main = do print (test pa "a") print (test pa "b") print (test pa2 "bbab") print (test pa "ba") print (test pa "aa") print (test (do l <- pCount pa pExact l pb) "aaacabbb") print (test (amb ( (++) <$> pa2 <*> pa3 <|> (++) <$> pa3 <*> pa2)) "aaabaa") print (test paz "ab1z7") print (test paz' "m") print (test paz' "") is loeki:~ doaitse$ ghci -package uu-parsinglib GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer ... linking ... done. Loading package base ... linking ... done. Loading package syb ... linking ... done. Loading package array-0.2.0.0 ... linking ... done. Loading package filepath-1.1.0.1 ... linking ... done. Loading package old-locale-1.0.0.1 ... linking ... done. Loading package old-time-1.0.0.1 ... linking ... done. Loading package unix-2.3.1.0 ... linking ... done. Loading package directory-1.0.0.2 ... linking ... done. Loading package process-1.0.1.1 ... linking ... done. Loading package random-1.0.0.1 ... linking ... done. Loading package haskell98 ... linking ... done. Loading package uu-parsinglib-2.0.0 ... linking ... done. Prelude> :m Text.ParserCombinators.UU.Examples Prelude Text.ParserCombinators.UU.Examples> main ("a",[]) ("a",[ Deleted 'b' at position 0 expecting one of ["'a'"], Inserted 'a' at position 1 expecting one of ["'a'"]]) ("aa",[ Deleted 'b' at position 0 expecting one of ["'a'"], Deleted 'b' at position 1 expecting one of ["'a'"], Deleted 'b' at position 3 expecting one of ["'a'"], Inserted 'a' at position 4 expecting one of ["'a'"]]) ("a",[ Deleted 'b' at position 0 expecting one of ["'a'"]]) ("a",[ The token 'a'was not consumed by the parsing process.]) (["b","b","b","b"],[ Deleted 'c' at position 3 expecting one of ["'a'","'b'"], Inserted 'b' at position 8 expecting one of ["'b'"]]) (["aaaaa"],[ Deleted 'b' at position 3 expecting one of ["'a'","'a'"]]) ("abz",[ Deleted '1' at position 2 expecting one of ["'a'..'z'"], The token '7'was not consumed by the parsing process.]) ('m',[]) ('k',[ Inserted 'k' at position 0 expecting one of ["a .. z"]]) Prelude Text.ParserCombinators.UU.Examples> Doaitse Swierstra On 27 mei 2009, at 01:52, Günther Schmidt wrote:
Hi all,
is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to.
The files it has to parse is anywhere from 500 MB to 5 GB.
Günther
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Dear Doaitse, In the days since my original post I had already come to favor the uu-parsing package. I have printed the report and read it every day to figure out how to use it. I cannot follow everything yet, and also hope that won't be necessary in order to use it. :-) My progress is a bit slow, but I'm not giving up. What I do like most, over the other combinatory packages, is the approach of using "breadth-first" when it comes to choice, the idea is certainly enlightening. The packages capability to do "online- / partial parsing" is essential for me. I am a bit surprised about it's "raw" state. The basic combinators and primitives are there but combinators like pChain, pDigit etc. are not predefined and merely present in the examples package. I had gotten quite comfortable with parsec and need to find the right way to "translate" my parsec code to your package. Anyway let me thank you for your work, I really appreciate it very much. Günther S. Doaitse Swierstra schrieb:
Lazy parsing has been the default for the last ten years in uulib, and is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib). The whole design of the latter in described in a technical report to which references are given on the web page. It provides also error correction, the ability to use several different kinds of input tokens, and (with some help) ambiguities. If speed is an issue you can insert extra hints which locally change the breadth-first parsing process locally into a somewhat more depth-first form. When compared with Parsec the good news is that usually you do not have to put annotations to get nice results.
The older uulib version also performs an abstract interpretation which basically changes the search for which alternative to take from a linear to a logarithmic complexity, but does not provide a monadic structure, in which you use results recognised thus far to construct new parsers.
Both the old uulib version and the new version have always had an applicative interface.
In the near future elements of the abstract interpretation of the old uulib version will migrate into the new version. It is the advent of GADT's which made this new version feasable.
An example of the error correction at work at the following example code:
pa, pb, paz :: P_m (Str Char) [Char] pa = lift <$> pSym 'a' pb = lift <$> pSym 'b' p <++> q = (++) <$> p <*> q pa2 = pa <++> pa pa3 = pa <++> pa2
pCount p = (\ a b -> b+1) <$> p <*> pCount p <<|> pReturn 0 pExact 0 p = pReturn [] pExact n p = (:) <$> p <*> pExact (n-1) p
paz = pMany (pSym ('a', 'z'))
paz' = pSym (\t -> 'a' <= t && t <= 'z', "a .. z", 'k')
main :: IO () main = do print (test pa "a") print (test pa "b") print (test pa2 "bbab") print (test pa "ba") print (test pa "aa") print (test (do l <- pCount pa pExact l pb) "aaacabbb") print (test (amb ( (++) <$> pa2 <*> pa3 <|> (++) <$> pa3 <*> pa2)) "aaabaa") print (test paz "ab1z7") print (test paz' "m") print (test paz' "")
is
loeki:~ doaitse$ ghci -package uu-parsinglib GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer ... linking ... done. Loading package base ... linking ... done. Loading package syb ... linking ... done. Loading package array-0.2.0.0 ... linking ... done. Loading package filepath-1.1.0.1 ... linking ... done. Loading package old-locale-1.0.0.1 ... linking ... done. Loading package old-time-1.0.0.1 ... linking ... done. Loading package unix-2.3.1.0 ... linking ... done. Loading package directory-1.0.0.2 ... linking ... done. Loading package process-1.0.1.1 ... linking ... done. Loading package random-1.0.0.1 ... linking ... done. Loading package haskell98 ... linking ... done. Loading package uu-parsinglib-2.0.0 ... linking ... done. Prelude> :m Text.ParserCombinators.UU.Examples Prelude Text.ParserCombinators.UU.Examples> main ("a",[]) ("a",[ Deleted 'b' at position 0 expecting one of ["'a'"], Inserted 'a' at position 1 expecting one of ["'a'"]]) ("aa",[ Deleted 'b' at position 0 expecting one of ["'a'"], Deleted 'b' at position 1 expecting one of ["'a'"], Deleted 'b' at position 3 expecting one of ["'a'"], Inserted 'a' at position 4 expecting one of ["'a'"]]) ("a",[ Deleted 'b' at position 0 expecting one of ["'a'"]]) ("a",[ The token 'a'was not consumed by the parsing process.]) (["b","b","b","b"],[ Deleted 'c' at position 3 expecting one of ["'a'","'b'"], Inserted 'b' at position 8 expecting one of ["'b'"]]) (["aaaaa"],[ Deleted 'b' at position 3 expecting one of ["'a'","'a'"]]) ("abz",[ Deleted '1' at position 2 expecting one of ["'a'..'z'"], The token '7'was not consumed by the parsing process.]) ('m',[]) ('k',[ Inserted 'k' at position 0 expecting one of ["a .. z"]]) Prelude Text.ParserCombinators.UU.Examples>
Doaitse Swierstra
On 27 mei 2009, at 01:52, GüŸnther Schmidt wrote:
Hi all,
is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to.
The files it has to parse is anywhere from 500 MB to 5 GB.
Günther
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (10)
-
Guenther Schmidt
-
Gü?nther Schmidt
-
Günther Schmidt
-
Henning Thielemann
-
Henning Thielemann
-
Malcolm Wallace
-
Malcolm Wallace
-
S. Doaitse Swierstra
-
S.Doaitse Swierstra
-
Stephen Tetley