ByteString/parsec

Hi,
I want to efficiently parse a large collection of files.
The files are in the following format:
example title
TITLE
author name
AUTHOR
some lines with summary here
SUMMARY
the real text
TEXT
a list of links
LINKS
I want to use ByteString here, but which library should I use for
parsing: "attoparsec" or "bytestringparser"?
Both export the same interface.
When I use one of these, I thought it would be nice to write something
like this:
fileParser :: Parser Content
fileParser = do
  title <- manyTill getInput (string . pack "\nTITLE\n")
  author <- manyTill getInput (string. pack "\nTITLE\n")
  ....
  return Content title author ...
But this doesn't work.
Even on a small example :
parseTest (manyTill getInput (string $ pack "SPLIT")) (pack "split the text at SPLIT part two")
I get a stack overflow. Obviously I'm not understanding something here.
Are there any good examples of open source projects which parse
ByteString data?
Thanks in advance,
Pieter
--
Pieter Laeremans

Pieter Laeremans wrote:
fileParser :: Parser Content
fileParser = do
  title <- manyTill getInput (string . pack "\nTITLE\n")
  author <- manyTill getInput (string. pack "\nTITLE\n")
  ....
  return Content title author ...
But this doesn't work.
"getInput" does not consume any input but just returns the remaining input. Therefore it is called infinitely often. If your input were String you could use "anyChar" instead. I don't know how to do it with ByteString (maybe parsec-3?).
I get a stack overflow. Obviously I'm not understanding something here. Are there any good examples of open source projects which parse ByteString data?
thanks in advance,
Cheers Christian
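
One way to make the loop terminate with a ByteString parser is to replace getInput with a parser that consumes one character per step. The following is only a sketch, assuming a current attoparsec release (module Data.Attoparsec.ByteString.Char8, whose API differs from the versions available when this thread was written). The Content record, its field names, and the helpers section, splitExample and parseFile are hypothetical, since the thread never shows the real type. It also assumes every marker sits on its own line and is followed by a newline.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.ByteString.Char8
  (Parser, anyChar, manyTill, parseOnly, string)
import qualified Data.ByteString.Char8 as B

-- Hypothetical record; the real Content type is not shown in the thread.
data Content = Content
  { title   :: B.ByteString
  , author  :: B.ByteString
  , summary :: B.ByteString
  , body    :: B.ByteString
  , links   :: [B.ByteString]
  } deriving (Eq, Show)

-- The small test case from the question: anyChar consumes one character
-- per iteration, so manyTill makes progress and stops at the delimiter.
splitExample :: Either String B.ByteString
splitExample =
  parseOnly (B.pack <$> manyTill anyChar (string "SPLIT"))
            "split the text at SPLIT part two"
-- Right "split the text at "

-- Consume everything up to a line that consists only of the marker.
-- The marker line itself is consumed and dropped.
section :: B.ByteString -> Parser B.ByteString
section marker =
  B.pack <$> manyTill anyChar (string (B.concat ["\n", marker, "\n"]))

-- Sections appear in a fixed order, each terminated by its marker line.
fileParser :: Parser Content
fileParser =
  Content
    <$> section "TITLE"
    <*> section "AUTHOR"
    <*> section "SUMMARY"
    <*> section "TEXT"
    <*> (B.lines <$> section "LINKS")

-- Run the parser on the contents of one file.
parseFile :: B.ByteString -> Either String Content
parseFile = parseOnly fileParser
```

manyTill anyChar builds an intermediate list of Char before packing it, so this version favours clarity over speed. For very large files, a substring search such as Data.ByteString.breakSubstring may be worth measuring as an alternative.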

"Pieter Laeremans"
Are there any good examples of open source projects which parse ByteString data?
Don't know about "good", but here are some working examples that may or may not be useful to you. Pointers are inside the darcs repo; you can of course 'darcs get http://malde.org/~ketil/biohaskell/biolib' to obtain the whole deal.

A simple parser for the even simpler FASTA format:
http://malde.org/~ketil/biohaskell/biolib/Bio/Sequence/Fasta.hs

Parser to decode the ACE format by tokenizing to ByteString, then using Parsec:
http://malde.org/~ketil/biohaskell/biolib/Bio/Alignment/ACE.hs

The repo also contains parsers for other formats like FASTQ, and binary formats like SFF. Please email me with any questions or comments.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
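
The tokenize-first idea could plausibly be applied to the format from the original question as well, since every marker sits on a line of its own. The sketch below is not taken from the biolib repository; it uses only the bytestring package, and the names splitAtMarker and sections are hypothetical. It assumes no section contains a line identical to the marker that ends it.

```haskell
import qualified Data.ByteString.Char8 as B

-- Split a list of lines at the first line equal to the marker.
-- Returns the lines before the marker and the lines after it,
-- or Nothing when the marker line is missing.
splitAtMarker :: B.ByteString -> [B.ByteString]
              -> Maybe ([B.ByteString], [B.ByteString])
splitAtMarker marker ls =
  case break (== marker) ls of
    (before, _ : rest) -> Just (before, rest)
    (_, [])            -> Nothing

-- Walk the markers in order and collect one section per marker.
-- Lines of a section are joined back together with newlines.
sections :: [B.ByteString] -> B.ByteString -> Maybe [B.ByteString]
sections markers input = go markers (B.lines input)
  where
    go [] _ = Just []
    go (m : ms) ls = do
      (before, rest) <- splitAtMarker m ls
      (B.intercalate (B.pack "\n") before :) <$> go ms rest
```

Calling sections with the markers TITLE, AUTHOR, SUMMARY, TEXT and LINKS (each packed to a ByteString) on the contents of one file should yield the five sections in order, or Nothing if a marker is missing.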
participants (3)
- Christian Maeder
- Ketil Malde
- Pieter Laeremans