
2010/8/6 Dean Herington
At 11:42 PM +0200 8/5/10, C Gosch wrote:
Hi, does anyone here know their way around in Parsec? I'm trying to parse a file which contains some binary parts too. I have been using Parsec 3.0.1 to parse the first ASCII part, but all the parsers are returning Char type tokens, so it would not work with the binary parts .. is there any way to do this? I am using Text.Parsec.ByteString.Lazy.
Note that I'm new to Haskell and Parsec, and doing this within a small project that I basically do to get a grip on Haskell (and Parsec, because it appears to be really nifty and I want to learn more about it).
Thanks for any hints, Christian
Parsing binary parts of a file is not inherently a problem. A parser can return any type; different parsers in your application will likely return different types.
If you include some code we'll be able to help you more specifically.
Dean
You're right, I probably used the wrong words .. I meant that apparently the tokens Parsec uses are of type Char, and I would actually at some point like to continue parsing, but using different tokens. Sorry if I still got it wrong, I'm new :) I can post some code later, as I don't have it here right now. Cheers, Christian

On Fri, Aug 6, 2010 at 11:03 AM, C Gosch wrote:
You're right, I probably used the wrong words .. I meant that apparently the tokens Parsec uses are of type Char, and I would actually at some point like to continue parsing, but using different tokens. Sorry if I still got it wrong, I'm new :) I can post some code later, as I don't have it here right now.
Parsec (at least version 3) uses any type of token you want. Quick example off the top of my head, I didn't check if it compiles:
-- you have to make lists of your token type an instance of Stream:
instance Stream [MyTokenType] Identity MyTokenType where
  uncons []     = return Nothing
  uncons (x:xs) = return $ Just (x, xs)
-- your parser type is going to look like this:
type MyParser a = ParsecT [MyTokenType] () Identity a
-- assuming your token type looks like this
data MyTokenType = A Char | B Word8 deriving (Show)
-- you need a basic parser from which you can make more complicated ones
satisChar :: (Char -> Bool) -> MyParser Char
satisChar f = tokenPrim prt pos match
  where
    prt = show
    pos p _l _cs = incSourceLine p 1
    match (A c) = if f c then Just c else Nothing
    match _ = Nothing
satisBin :: (Word8 -> Bool) -> MyParser Word8
satisBin f = tokenPrim prt pos match
  where
    prt = show
    pos p _l _cs = incSourceLine p 1
    match (B w) = if f w then Just w else Nothing
    match _ = Nothing
-- You can define basic parsers like this
-- parse any letter
letter = satisChar (const True)
-- parse a specific char, returning it
char c = satisChar (==c)
-- parse any binary word
binary = satisBin (const True)
-- parse a specific binary
word w = satisBin (==w)
-- now you can combine this to make more complicated parsers.
...
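For anyone who wants to try the idea above, here is a minimal sketch that should load in GHCi with a recent parsec. Some assumptions on my part: as far as I know, Parsec 3 already ships a generic Stream instance for lists of any token type, so the sketch leaves the hand-written instance out; it advances the column rather than the line per token; and the names example and demo are made up for illustration.

```haskell
import Data.Word (Word8)
import Text.Parsec

-- Token type from the message above: a character or a raw byte.
data MyTokenType = A Char | B Word8
  deriving (Show, Eq)

-- Parser over a list of tokens, with no user state.
-- (Parsec s u is a synonym for ParsecT s u Identity.)
type MyParser a = Parsec [MyTokenType] () a

-- Accept an A token whose Char satisfies the predicate.
satisChar :: (Char -> Bool) -> MyParser Char
satisChar f = tokenPrim show nextPos match
  where
    -- Assumption: tokens carry no line info, so count them as columns.
    nextPos p _tok _rest = incSourceColumn p 1
    match (A c) | f c = Just c
    match _           = Nothing

-- Accept a B token whose Word8 satisfies the predicate.
satisBin :: (Word8 -> Bool) -> MyParser Word8
satisBin f = tokenPrim show nextPos match
  where
    nextPos p _tok _rest = incSourceColumn p 1
    match (B w) | f w = Just w
    match _           = Nothing

-- Hypothetical combined parser: the char 'x', then any two bytes.
example :: MyParser (Char, [Word8])
example = do
  c  <- satisChar (== 'x')
  ws <- count 2 (satisBin (const True))
  return (c, ws)

-- Run the parser on a hand-built token list.
demo :: Either ParseError (Char, [Word8])
demo = parse example "tokens" [A 'x', B 1, B 2]
```

Note that this still needs something in front of it that turns the raw file into a list of MyTokenType values, which is the part the sketch does not address.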

David,
thank you for your helpful explanations. So I need to define a new datatype encapsulating the two I want to parse and then create a new Stream, is that correct?
Reading your comments and more of the haddock documentation of Parsec, this came to my mind: say I want to skip over some binary parts of the input (which is a PDF, containing ascii and deflated parts as well as images). I found the functions setPosition/getPosition, which seem to build on the notion of line and column number. Is there a way to say that I want to skip N bytes along the stream? I didn't find an obvious one myself when browsing through the docs.
Thanks again,
Christian

Hi,
I started using Data.Binary yesterday for a project and it's nice. I'm
not sure it's possible, but perhaps you could switch tools along the
way:
- Parse the ASCII bit with Parsec ;
- Give the remaining ByteString to Data.Binary for the binary part ;
- Switch back and forth as required.
Patrick
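A rough sketch of the hand-off described in the list above, under several assumptions: the file layout (an ASCII line "BIN" followed by two big-endian 16-bit words) is invented for illustration, the names asciiHeader, binaryPart and parseMixed are hypothetical, and runGetOrFail needs a reasonably recent version of the binary package.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Binary.Get (Get, getWord16be, runGetOrFail)
import Data.Word (Word16)
import Text.Parsec
import Text.Parsec.ByteString.Lazy (Parser)

-- Parsec side: parse the made-up ASCII header "BIN\n",
-- then return whatever input has not been consumed yet.
asciiHeader :: Parser BL.ByteString
asciiHeader = string "BIN" *> newline *> getInput

-- Data.Binary side: a made-up binary layout of two 16-bit words.
binaryPart :: Get (Word16, Word16)
binaryPart = (,) <$> getWord16be <*> getWord16be

-- Run Parsec first, then give the leftover bytes to Data.Binary.
-- Returns the decoded words plus the bytes left after the binary part,
-- which could be fed back into another Parsec parser.
parseMixed :: BL.ByteString
           -> Either String ((Word16, Word16), BL.ByteString)
parseMixed input =
  case parse asciiHeader "input" input of
    Left err   -> Left (show err)
    Right rest ->
      case runGetOrFail binaryPart rest of
        Left (_, _, msg)                -> Left msg
        Right (remaining, _used, value) -> Right (value, remaining)

-- Sample input: header, four payload bytes, one trailing byte.
sample :: BL.ByteString
sample = BL.append (BLC.pack "BIN\n") (BL.pack [0, 1, 0, 2, 255])
```

With this sample, parseMixed sample should decode the words 1 and 2 and hand back the single trailing byte as the remaining input.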

Ah wait ... I saw that I can get the input stream with getParserState, as it is part of the state. If that is the currently remaining stream, I could use any functions from ByteString to skip over some parts or do whatever with them, and update the state to the last position after the part I want to skip.
Am I correct there, or am I then getting something in Parsec out of sync?
Guess I'll just try :)
Thanks!
Christian

On Friday 06 August 2010 21:02:48, C Gosch wrote:
Ah wait ... I saw that I can get the input stream with getParserState, as it is part of the state. If that is the currently remaining stream, I could use any functions from ByteString to skip over some parts or do whatever with them, and update the state to the last position after the part I want to skip. Am I correct there, or am I then getting something in Parsec out of sync? Guess I'll just try :)
Thanks! Christian
I think using getInput and setInput would be more convenient.
- parse text
- bs <- getInput
- let (binaryResult, remainingInput) = treatBinaryPart bs
- setInput remainingInput
- continue parsing
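The steps above could look roughly like this. It is only a sketch under assumptions: takeBytes stands in for treatBinaryPart and simply cuts off N raw bytes, and the length-prefixed format parsed by lengthPrefixed is made up for the demo. As far as I can tell, setInput replaces the remaining input but does not move the source position, so line and column numbers in later error messages will not account for the skipped bytes.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Int (Int64)
import Text.Parsec
import Text.Parsec.ByteString.Lazy (Parser)

-- Take exactly n raw bytes off the remaining input,
-- bypassing the Char-based parsers entirely.
takeBytes :: Int64 -> Parser BL.ByteString
takeBytes n = do
  bs <- getInput
  let (chunk, rest) = BL.splitAt n bs
  if BL.length chunk < n
    then parserFail "takeBytes: not enough input"
    else do
      setInput rest   -- continue parsing after the binary part
      return chunk

-- Made-up format: "<decimal length>\n", that many raw bytes, then "end".
lengthPrefixed :: Parser BL.ByteString
lengthPrefixed = do
  n <- read <$> many1 digit
  _ <- newline
  payload <- takeBytes n
  _ <- string "end"
  return payload

-- Sample run: a 3-byte binary payload between two ASCII parts.
demo :: Either ParseError BL.ByteString
demo = parse lengthPrefixed "demo" input
  where
    input = BL.concat [BLC.pack "3\n", BL.pack [0, 1, 2], BLC.pack "end"]
```

If the position matters, getPosition and setPosition can be called around takeBytes to adjust it by hand.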
participants (4)
- C Gosch
- Daniel Fischer
- David Virebayre
- Patrick LeBoutillier