[*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?
_______________________________________________
The combination of attoparsec + a streaming adapter for pipes/conduit/streaming should easily be able to handle tens of megabytes per second and hundreds of thousands of lines per second.
I have code that parses a pipe-separated-value file from the FCC pretty quickly. As I recall, it goes through a >100MB file in under three seconds, and it has to do a bunch of other work besides.
I also ported the above code to use Streaming instead of Pipes. I recall that using Streaming master, the parser I use to read the dictionary:
takeTill isEndOfLine <* endOfLine
handles about 3 million lines per second. I can’t remember what the number is for Pipes, but it’s probably similar. That’s really good for such a simple thing to write!
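For anyone who wants to try this on their own data, here is a minimal sketch of how that one-line parser might sit in a complete module. This is not the code from my project. It assumes Data.Attoparsec.Text rather than the ByteString variant, it runs over a whole in-memory Text instead of going through Streaming or Pipes, and the Row type, the field names and the "name|count|value" layout are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text
  ( Parser, char, decimal, double, endOfInput, endOfLine
  , isEndOfLine, many', parseOnly, signed, takeTill )
import Data.Text (Text)

-- One raw line: everything up to the line terminator, which is then consumed.
rawLine :: Parser Text
rawLine = takeTill isEndOfLine <* endOfLine

-- Hypothetical record for a pipe-separated line such as "abc|42|3.5".
data Row = Row
  { rowName  :: !Text
  , rowCount :: !Int
  , rowValue :: !Double
  } deriving (Eq, Show)

-- Parse one record, including its trailing line terminator.
row :: Parser Row
row = Row
  <$> takeTill (== '|') <* char '|'
  <*> signed decimal    <* char '|'
  <*> double            <* endOfLine

-- Split a whole input into raw lines. Every line, including the last,
-- must end with a line terminator, otherwise the result is a Left.
parseLines :: Text -> Either String [Text]
parseLines = parseOnly (many' rawLine <* endOfInput)

-- Parse a whole input into records, with the same terminator requirement.
parseRows :: Text -> Either String [Row]
parseRows = parseOnly (many' row <* endOfInput)
```

Loaded in GHCi with OverloadedStrings on, parseRows "abc|42|3.5\n" should give back a single Row. For a 50MB file you would feed the same row parser through one of the streaming adapters rather than building the whole list in memory.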
Unfortunately there is a performance bug in Streaming that’s fixed in master but hasn’t been released for a number of months :-/
—Will