
At Thu, 02 Jun 2011 13:52:52 +0200, Ketil Malde wrote:
> I have a bunch of old code, parsers etc., which are based on the 'readFile' paradigm:
>
>     type Str = Data.ByteString.Lazy.Char8.ByteString   -- usually
>
>     decodeFoo :: Str -> Foo
>     encodeFoo :: Foo -> Str
>
>     readFoo    = fmap decodeFoo . readFile
>     writeFoo f = writeFile f . encodeFoo
>     hReadFoo   = fmap decodeFoo . hRead
>       : (etc)
>
> This works pretty well, as long as Foo is strict enough that you don't retain all or huge parts of the input, and as long as you can process the input in a forward, linear fashion. And, like my frequency count above, I can't really see how this can be made much simpler.
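For concreteness, a minimal sketch of that paradigm with an entirely made-up Foo (a word-frequency table, loosely in the spirit of the frequency count mentioned above); only the bytestring and containers APIs here are real:

    import qualified Data.ByteString.Lazy.Char8 as L
    import qualified Data.Map as M

    type Str = L.ByteString
    type Foo = M.Map Str Int          -- hypothetical: a word-frequency table

    -- Pure decode/encode on lazy ByteStrings, as in the paradigm above.
    decodeFoo :: Str -> Foo
    decodeFoo = M.fromListWith (+) . map (\w -> (w, 1)) . L.words

    encodeFoo :: Foo -> Str
    encodeFoo = L.unlines . map (\(w, n) -> L.unwords [w, L.pack (show n)]) . M.toList

    readFoo :: FilePath -> IO Foo
    readFoo = fmap decodeFoo . L.readFile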
This is fine if you never have parse errors and always read to the end of the file. Otherwise, the code above is incorrect and ends up leaking file descriptors. In general, it is very hard to write parsers that accept every possible input and never fail. Thus, for anything other than a toy program, your code actually has to be:

    readFoo path = bracket (openFile path ReadMode) hClose $
                   hGetContents >=> (\s -> return $! decodeFoo s)

which is still not guaranteed to work if Foo contains thunks, so then you end up having to write:

    readFoo path = bracket (openFile path ReadMode) hClose $ \h -> do
      s <- hGetContents h
      let foo = decodeFoo s
      deepseq foo $ return foo

or, finally, what a lot of code falls back to, inserting a gratuitous call to length:

    readFoo path = bracket (openFile path ReadMode) hClose $ \h -> do
      s <- hGetContents h
      length s `seq` return (decodeFoo s)

The equivalent code with the iterIO package would be:

    readFoo path = enumFile path |$ fooI

which seems a lot simpler to me...
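Spelled out with its imports (and reusing the toy Map-based Foo sketched earlier, which gets an NFData instance for free from the containers and bytestring packages), the deepseq variant might read:

    -- A sketch only: Foo and decodeFoo are the made-up word-frequency
    -- example from above; bracket, openFile, hClose and deepseq are the
    -- real, standard APIs.
    import Control.DeepSeq (deepseq)
    import Control.Exception (bracket)
    import System.IO (IOMode (ReadMode), hClose, openFile)
    import qualified Data.ByteString.Lazy.Char8 as L

    readFoo :: FilePath -> IO Foo
    readFoo path = bracket (openFile path ReadMode) hClose $ \h -> do
      s <- L.hGetContents h
      let foo = decodeFoo s
      -- Fully evaluate foo before bracket runs hClose, so no leftover
      -- thunk tries to pull data from the closed handle afterwards.
      foo `deepseq` return foo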
> Would there be any great advantage to rewriting my stuff to use iterators? Or, at least, to using iterators for new stuff?
In addition to avoiding edge cases like leaked file descriptors and memory, one of the things I discovered in implementing iterIO is that it's really handy to have your I/O functions be the same as your parsing combinators. So iteratees might actually admit a far simpler implementation of decodeFoo/fooI.

More specifically, imagine that you have decodeFoo and now want to implement decodeBar, where a Bar includes some Foos. Unfortunately, having an implementation of decodeFoo in hand doesn't help you implement decodeBar. You'd have to re-write your function to return residual input, maybe something like:

    decodeFooReal :: String -> (Foo, String)

    decodeFoo :: String -> Foo
    decodeFoo = fst . decodeFooReal

and now you implement decodeBar in terms of decodeFooReal, but you have to pass around residual input explicitly, handle parsing failures explicitly, etc.
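A sketch of what that explicit plumbing looks like; Bar, decodeBarReal, and the error handling are invented purely for illustration:

    -- Hypothetical: a Bar is a leading count followed by that many Foos.
    -- Every step has to hand the residual String along by hand, and
    -- failure handling is bolted on at each level.
    data Bar = Bar [Foo]

    decodeBarReal :: String -> Either String (Bar, String)
    decodeBarReal s0 =
      case reads s0 of                        -- parse the leading count
        [(n, s1)] -> go n s1 []
        _         -> Left "decodeBar: expected a count"
      where
        go :: Int -> String -> [Foo] -> Either String (Bar, String)
        go 0 s acc = Right (Bar (reverse acc), s)
        go n s acc =
          let (foo, s') = decodeFooReal s     -- thread residual input
          in  go (n - 1) s' (foo : acc)

    decodeBar :: String -> Bar
    decodeBar = either error fst . decodeBarReal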
> As I see it, iterators are complex, the dust is only starting to settle on implementations and interfaces, and they will introduce more dependencies. So my instinct is to stick with the worse-is-better approach, but I'm willing to be educated.
I fully agree with the point about dependencies and waiting for the dust to settle, though I hope a lot of that changes in a year or so. However, iterIO should already significantly reduce the complexity.

David