
Thanks for the links! I didn't realize that Iteratee can also do random access. I looked at Oleg's slides a while back, but I haven't wrapped my head around it. On Mon, 2010-03-15 at 08:57 -0500, John Lato wrote:
Hi Dave,
From: david fries
Hello Café
Some time ago I wrote a parser for a project of one our customers. The format was proprietary and binary. The data was structured as a tree with tables pointing to sub tables farther in the file. (Well actually there was one or two cases where branches joined together, so I guess it was a directed graph.) Also it had no defined endianess, some tables were bigendian others were little endian and some were in their very own in-house-endianess. All in all, the typical binary data structure that has been in use and continuously extended for the last 15 years. You know the kind.
My parser was written in C# and wasn't particularly elegant, but it worked reliably. I was wondering how you would parse tree-like structures with Parsec (or other functional parsers)? Up to know, all examples I've seen were of sequential data. To parse trees, you'd essentially need to be able to follow pointers, parse whatever is there and then jump back. I guess I'd have to mess around with the internal state of the parser to do that, which is something I'd rather avoid.
Have you seen Edward Kmett's Parsing Trifecta talk/slides, available at http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ ? I think a similar approach would allow you to parse this data structure. The trick is to create a data source that can read from arbitrary locations.
Also see http://okmij.org/ftp/Haskell/Iteratee/Tiff.hs for an iteratee-based TIFF reader. TIFF files have a similar structure, so this code could apply pretty directly. The parsing is quite low-level, but you could use iteratee-parsec [1] or attoparsec-iteratee [2] to use parser combinators with this technique.
[1] http://hackage.haskell.org/package/iteratee-parsec [2] http://hackage.haskell.org/package/attoparsec-iteratee