Parsers (Parsec and Iteratee-based Parsers)

Hi, are there any examples of how to build parsers using the library in Oleg's iteratee package? I've been using parsec for almost all my parsing needs; in fact, it was parsec that got me started with Haskell. I'd like to try building a few parsers based on the left-fold-enumerator thingy and compare this approach to parsec. Any good tips on how to get started? Günther

2010/1/11 Günther Schmidt
Hi,
are there any examples of how to build parsers using the library in Oleg's iteratee package?
I've been using parsec for almost all my parsing needs; in fact, it was parsec that got me started with Haskell.
I think you should be in contact with John Lato. The last time we corresponded he mentioned a hybrid between iteratees and parsec. Jason

2010/1/11 Jason Dagit
2010/1/11 Günther Schmidt
Hi,
are there any examples of how to build parsers using the library in Oleg's iteratee package?
I've been using parsec for almost all my parsing needs; in fact, it was parsec that got me started with Haskell.
I think you should be in contact with John Lato. The last time we corresponded he mentioned a hybrid between iteratees and parsec.
I don't know if I'd call it a hybrid; however, there is a way to embed Parsec parsers (v.3 only) in iteratee. The necessary code is available at: http://inmachina.net/~jwlato/haskell/ParsecIteratee.hs It's intended to be used for embedding relatively small parsers in iteratees. It does concatenate chunks, so if it looks too far into the stream it can be inefficient. Chunks of up to 2000 characters shouldn't require more than one concatenation (depending on what other functions you're using). This code is available as public domain. Thanks to Erik de Castro Lopo for suggesting integrating iteratee and parsec in some manner. It's still pretty new, so suggestions are welcome. If you want to build parsers directly with iteratee, there currently aren't any published tutorials, although I can provide some working code if you like. The interface is much lower-level than Parsec. John
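The ParsecIteratee.hs code itself is not reproduced in this thread, but the chunk-concatenation idea can be sketched in a few self-contained lines. Parsec is not an incremental parser, so an embedding has to buffer the chunks it has seen and re-run the parser over the accumulated input; the parseChunked function below, and its naive retry-on-failure strategy, are illustrative assumptions only, not the API of John's module.

import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy illustration: buffer the chunks seen so far and re-run the whole
-- Parsec parser whenever a new chunk arrives.  A real embedding would also
-- need to tell "needs more input" apart from a genuine parse error; this
-- sketch simply retries with a longer buffer until the chunks run out.
parseChunked :: Parser a -> [String] -> Either ParseError a
parseChunked p = go ""
  where
    go buf cs = case parse p "<chunked>" buf of
      Right x  -> Right x
      Left err -> case cs of
        []      -> Left err           -- nothing left to concatenate
        (c:cs') -> go (buf ++ c) cs'  -- concatenate the next chunk, retry

For example, parseChunked (string "hello") ["he", "llo"] fails on the empty buffer and on "he", and only succeeds once the second chunk has been concatenated; that repeated buffering is the cost John mentions when a parser looks far into the stream.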

Hi John, thanks for responding. As I said, I've been using Parsec quite a lot, but I wonder if a different approach to parsing is possible/feasible. Parsec (2.x) isn't an "online" parser, i.e., it doesn't produce a result before the whole parse is completed. There is AFAIK one alternative, the uulib, but at first glance it seemed very elaborate, so I wonder if Oleg's Iteratee offers something simpler. I am not particularly looking for some sort of parsec-iteratee hybrid; I'd be quite happy with something entirely based on Iteratee. In the Iteratee package there are two sample parsers, one for TIFF and one for WAVE files. I wish I could say that the accompanying documentation is sufficient for me to get the idea; alas, it's not. Günther

2010/1/12 Günther Schmidt
Hi John,
thanks for responding. As I said, I've been using Parsec quite a lot, but I wonder if a different approach to parsing is possible/feasible. Parsec (2.x) isn't an "online" parser, i.e., it doesn't produce a result before the whole parse is completed.
This appears to be a common source of confusion. You can produce a result by not trying to parse the entire input in one go; instead, parse as much as you want to consume, then return the rest of the input along with the tokens that you have collected. For example, in Parsec 2 a regular approach I use is something like:

get1 = do
  token     <- parse1token
  remaining <- getInput
  return (remaining, token)

Jeff
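To make that concrete, here is a self-contained sketch written against Parsec 3's Text.Parsec and a recent base; parse1token below is a hypothetical digit-run token, not Jeff's actual parser. Driving get1 once per token means each result is available as soon as its token has been parsed, without waiting for the rest of the input.

import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for parse1token: skip spaces, read a run of digits.
parse1token :: Parser Int
parse1token = spaces *> (read <$> many1 digit)

-- Jeff's trick: parse one token, then hand back the unconsumed input.
get1 :: Parser (String, Int)
get1 = do
  token     <- parse1token
  remaining <- getInput
  return (remaining, token)

-- Re-invoke get1 on the leftover input; the result list is produced lazily,
-- one element per successful parse.
tokensFrom :: String -> [Int]
tokensFrom s = case parse get1 "" s of
  Left _              -> []
  Right (rest, token) -> token : tokensFrom rest

-- tokensFrom "10 20 30"  ==>  [10,20,30]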

As I said, I've been using Parsec quite a lot, but I wonder if a different approach to parsing is possible/feasible. Parsec (2.x) isn't an "online" parser, i.e., it doesn't produce a result before the whole parse is completed.
There is AFAIK one alternative, the uulib,
In addition, the polyparse library provides "online", or lazy, parsing. Its interface is somewhat similar to parsec, perhaps even simpler. (Actually, you can freely mix lazy and strict parsing - laziness is provided by applicative combinators, strictness by monadic combinators.) I have not yet looked into whether it would be possible to link polyparse to iteratees. Regards, Malcolm

The frisby parser (http://repetae.net/computer/frisby/), which unfortunately is not well known as it has never been uploaded to Hackage, also supports lazy parsing.
titto
2010/1/12 Malcolm Wallace
As I said, I've been using Parsec quite a lot, but I wonder if a different approach to parsing is possible/feasible. Parsec (2.x) isn't an "online" parser, i.e., it doesn't produce a result before the whole parse is completed.
There is AFAIK one alternative, the uulib,
In addition, the polyparse library provides "online", or lazy, parsing. Its interface is somewhat similar to parsec, perhaps even simpler. (Actually, you can freely mix lazy and strict parsing - laziness is provided by applicative combinators, strictness by monadic combinators.) I have not yet looked into whether it would be possible to link polyparse to iteratees.
Regards, Malcolm
-- Pasqualino "Titto" Assini, Ph.D. http://quicquid.org/

2010/1/12 Pasqualino "Titto" Assini
The frisby parser (http://repetae.net/computer/frisby/), which unfortunately is not well known as it has never been uploaded to Hackage, also supports lazy parsing.
Doaitse Swierstra's new version of UU supports online parsing too: http://hackage.haskell.org/package/uu-parsinglib Later versions of the original UU.Parsing look like they incorporate a version of the Steps data type from the "Polish Parsers, Step by Step" paper, so they will also support online parsing. It's worth reiterating that the type of the parse result has to support 'laziness' for the technique to be useful... Best wishes Stephen
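Stephen's point about laziness of the result type can be shown without any parsing library at all. The toy numbers function below builds its result with the lazy (:) constructor, so a consumer can inspect early elements before the rest of the (possibly unbounded) input has been examined; a strict result type, such as a running Int sum, could not deliver anything until the whole input had been consumed. This is a sketch for illustration only, not code from any of the libraries mentioned above.

-- Hand-rolled "parser" for a space-separated list of numbers.  The result
-- list is built lazily, so early elements are available immediately.
numbers :: String -> [Int]
numbers s = case span (/= ' ') (dropWhile (== ' ') s) of
  ("",  _)    -> []
  (tok, rest) -> read tok : numbers rest   -- (:) is lazy in its tail

-- take 3 (numbers (unwords (map show [1 ..])))  ==>  [1,2,3]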

On 12 jan 2010, at 00:09, Günther Schmidt wrote:
Hi John,
thanks for responding. As I said, I've been using Parsec quite a lot, but I wonder if a different approach to parsing is possible/feasible. Parsec (2.x) isn't an "online" parser, i.e., it doesn't produce a result before the whole parse is completed.
There is AFAIK one alternative, the uulib, but at first glance it seemed very elaborate, so I wonder if Oleg's Iteratee offers something simpler.
There is the new uu-parsinglib package, which is simpler and documented (and being worked upon). Doaitse
I am not particularly looking for some sort of parsec-iteratee hybrid; I'd be quite happy with something entirely based on Iteratee. In the Iteratee package there are two sample parsers, one for TIFF and one for WAVE files. I wish I could say that the accompanying documentation is sufficient for me to get the idea; alas, it's not.
Günther

John Lato
I don't know if I'd call it a hybrid; however, there is a way to embed Parsec parsers (v.3 only) in iteratee. The necessary code is available at:
This post inspired me to write an iteratee wrapper for attoparsec. The attoparsec library has an incremental parser; this means that we don't have to use John's lookahead buffer trick, restoring the iteratee constant-space guarantee. The downside: I don't think the attoparsec incremental parser is capable of maintaining the source position for error reporting.
------------------------------------------------------------------------------
{-# LANGUAGE OverloadedStrings #-}
module Data.Attoparsec.Iteratee (parserToIteratee) where
------------------------------------------------------------------------------
import qualified Data.Attoparsec.Incremental as Atto
import Data.Attoparsec.Incremental hiding (Result(..))
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Iteratee
import Data.Iteratee.WrappedByteString
import Data.Monoid
import Data.Word (Word8)
import Prelude hiding (takeWhile)
-- for the examples at the bottom only
import Control.Monad.Identity
import Data.Char
import Data.ByteString.Internal (w2c)
-- The principle is general enough to work for any 'StreamChunk' type (with
-- appropriate wrapping/unwrapping inserted), but I'm working with
-- "WrappedByteString Word8", sorry
type Stream = StreamG WrappedByteString Word8
type Iteratee m a = IterateeG WrappedByteString Word8 m a
type Enumerator m a = Iteratee m a -> m (Iteratee m a)
-- | Convert an attoparsec 'Parser' into an 'Iteratee'.
parserToIteratee :: (Monad m) =>
Parser a a
-> Iteratee m a
parserToIteratee p =
IterateeG $ \s ->
let r = case s of
(EOF Nothing) -> Atto.parse p ""
(EOF (Just (Err e))) -> Atto.Failed e
(EOF (Just _)) -> Atto.Failed "seek not permitted"
(Chunk s') -> Atto.parse p $ fromWrap s'
in return $ Cont (runChunk r) Nothing
where
runChunk (Atto.Failed m) = throwErr $ Err m
runChunk (Atto.Done rest r) =
IterateeG $ \s -> return $ Done r (addToChunk rest s)
runChunk oldr@(Atto.Partial f) =
IterateeG $ \s ->
case s of
(EOF Nothing) -> enforceDone f
(EOF (Just e)) -> return $ Cont (throwErr e) (Just e)
(Chunk s') -> let x = fromWrap s'
k = if L.null x then oldr else f x
in return $ Cont (runChunk k) Nothing
-- you end an incremental parser by passing it the empty string
enforceDone f =
return $ case f "" of
(Atto.Failed m ) -> Cont (throwErr $ Err m) (Just $ Err m)
(Atto.Done rest r) -> Done r $ Chunk (toWrap rest)
(Atto.Partial _ ) -> Cont (throwErr eoi) (Just eoi)
where
eoi = Err "premature end of input"
-- | lazy bytestring -> wrapped bytestring
toWrap :: L.ByteString -> WrappedByteString Word8
toWrap = WrapBS . S.concat . L.toChunks
-- | wrapped bytestring -> lazy bytestring
fromWrap :: WrappedByteString Word8 -> L.ByteString
fromWrap = L.fromChunks . (:[]) . unWrap
-- | tack a lazy bytestring onto the front of an iteratee 'Stream'
addToChunk :: L.ByteString -> Stream -> Stream
addToChunk s (EOF Nothing) = Chunk $ toWrap s
addToChunk _ x@(EOF _) = x
addToChunk s (Chunk w) = Chunk $ toWrap s `mappend` w
------------------------------------------------------------------------------
-- And a quick example
sp :: Parser r ()
sp = () <$ takeWhile (isSpace . w2c)
digits :: Parser r String
digits = many1 (w2c <$> satisfy (isDigit . w2c))
number :: Parser r Int
number = read <$> digits
numberList :: Parser r [Int]
numberList = liftA2 (:) number (many (sp *> number))
ensureEOF :: Parser r ()
ensureEOF = endOfInput <|> reportError
where
reportError = do
ch <- anyWord8
let msg = concat [ "unexpected character '"
, [w2c ch]
, "'" ]
fail msg
numberListIter :: (Monad m) => Iteratee m [Int]
numberListIter = parserToIteratee $ numberList <* ensureEOF
-- | Turn a strict bytestring into an enumerator
enumBS :: (Monad m) => S.ByteString -> Enumerator m a
enumBS bs = enumPure1Chunk $ WrapBS bs
_example :: [Int]
_example = runIdentity (enumerate numberListIter >>= run)
where
-- example, the source could be any enumerator
enumerate = enumBS "1000 2000 3000 4000 5000 6000 7000"
_exampleWithError :: Either ErrMsg [Int]
_exampleWithError = runIdentity (enumerate numberListIter >>= run . checkErr)
where
enumerate = enumBS "1000 2000 3000 4000 5000 6000 7000q"
-- > *Data.Attoparsec.Iteratee> _example
-- > [1000,2000,3000,4000,5000,6000,7000]
-- > *Data.Attoparsec.Iteratee> _exampleWithError
-- > Left (Err "unexpected character 'q'")
------------------------------------------------------------------------------
G.
--
Gregory Collins

Gregory Collins
John Lato writes:
I don't know if I'd call it a hybrid; however, there is a way to embed Parsec parsers (v.3 only) in iteratee. The necessary code is available at:
This post inspired me to write an iteratee wrapper for attoparsec. The attoparsec library has an incremental parser; this means that we don't have to use John's lookahead buffer trick, restoring the iteratee constant-space guarantee. The downside: I don't think the attoparsec incremental parser is capable of maintaining the source position for error reporting.
....and here's a version that might actually work (mea culpa)
------------------------------------------------------------------------------
{-# LANGUAGE OverloadedStrings #-}
module Data.Attoparsec.Iteratee (parserToIteratee) where
------------------------------------------------------------------------------
import qualified Data.Attoparsec.Incremental as Atto
import Data.Attoparsec.Incremental hiding (Result(..))
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Iteratee
import Data.Iteratee.WrappedByteString
import Data.Word (Word8)
import Prelude hiding (takeWhile)
-- for the examples at the bottom only
import Control.Monad.Identity
import Data.Char
import Data.ByteString.Internal (w2c)
-- The principle is general enough to work for any 'StreamChunk' type (with
-- appropriate wrapping/unwrapping inserted), but I'm working with
-- "WrappedByteString Word8", sorry
type Stream = StreamG WrappedByteString Word8
type Iteratee m = IterateeG WrappedByteString Word8 m
type IterV m = IterGV WrappedByteString Word8 m
type Enumerator m a = Iteratee m a -> m (Iteratee m a)
parserToIteratee :: (Monad m) =>
Parser a a
-> Iteratee m a
parserToIteratee p = IterateeG $ f (\s -> parse p s)
where
f :: (Monad m) =>
(L.ByteString -> Atto.Result a)
-> Stream
-> m (IterV m a)
f k (EOF Nothing) = finalChunk $ k ""
f _ (EOF (Just e)) = reportError e
f k (Chunk s) = chunk (fromWrap s) k
finalChunk :: (Monad m) => Atto.Result a -> m (IterV m a)
finalChunk (Atto.Failed m) =
return $ Cont (error $ show m)
(Just $ Err m)
finalChunk (Atto.Done rest r) = return $ Done r (Chunk $ toWrap rest)
finalChunk (Atto.Partial _) =
return $ Cont (error "parser did not consume all input")
(Just $ Err "parser did not consume all input")
reportError e = return $ Cont (error $ show e) (Just e)
chunk :: (Monad m) =>
L.ByteString
-> (L.ByteString -> Atto.Result a)
-> m (IterV m a)
chunk s k = do
let r = k s
case r of
(Atto.Failed m) -> return $
Cont (throwErr (Err m)) (Just $ Err m)
(Atto.Done rest x) -> return $ Done x (Chunk $ toWrap rest)
(Atto.Partial z) -> return $
Cont (IterateeG $ f z) Nothing
-- | lazy bytestring -> wrapped bytestring
toWrap :: L.ByteString -> WrappedByteString Word8
toWrap = WrapBS . S.concat . L.toChunks
-- | wrapped bytestring -> lazy bytestring
fromWrap :: WrappedByteString Word8 -> L.ByteString
fromWrap = L.fromChunks . (:[]) . unWrap
------------------------------------------------------------------------------
-- And a quick example
sp :: Parser r ()
sp = () <$ takeWhile (isSpace . w2c)
digits :: Parser r String
digits = many1 (w2c <$> satisfy (isDigit . w2c))
number :: Parser r Int
number = read <$> digits
numberList :: Parser r [Int]
numberList = liftA2 (:) number (many (sp *> number))
ensureEOF :: Parser r ()
ensureEOF = endOfInput <|> reportError
where
reportError = do
ch <- anyWord8
let msg = concat [ "unexpected character '"
, [w2c ch]
, "'" ]
fail msg
numberListIter :: (Monad m) => Iteratee m [Int]
numberListIter = parserToIteratee $ numberList <* ensureEOF
-- | Turn a strict bytestring into an enumerator
enumBS :: (Monad m) => S.ByteString -> Enumerator m a
enumBS bs = enumPure1Chunk $ WrapBS bs
_example :: [Int]
_example = runIdentity (enumerate numberListIter >>= run)
where
-- example, the source could be any enumerator
enumerate = enumBS "1000 2000 3000 4000 5000 6000 7000"
_exampleWithError :: Either ErrMsg [Int]
_exampleWithError = runIdentity (enumerate numberListIter >>= run . checkErr)
where
enumerate = enumBS "1000 2000 3000 4000 5000 6000 7000q"
-- > *Data.Attoparsec.Iteratee> _example
-- > [1000,2000,3000,4000,5000,6000,7000]
-- > *Data.Attoparsec.Iteratee> _exampleWithError
-- > Left (Err "unexpected character 'q'")
------------------------------------------------------------------------------
G.
--
Gregory Collins
participants (9)
- Gregory Collins
- Günther Schmidt
- Jason Dagit
- Jeff Zaroyko
- John Lato
- Malcolm Wallace
- Pasqualino "Titto" Assini
- S. Doaitse Swierstra
- Stephen Tetley