
On Aug 30, 2018, at 11:21, Olaf Klinke
[*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?
The combination of attoparsec and a streaming adapter for pipes/conduit/streaming should easily handle tens of megabytes per second and hundreds of thousands of lines per second.

For an example, check out

    https://github.com/wyager/Callsigns/blob/master/Callsigns.hs

which parses a pipe-separated-value file from the FCC pretty quickly. As I recall, it goes through a >100MB file in under three seconds, and it has to do a bunch of other work besides.

I also ported the above code to use Streaming instead of Pipes. I recall that, using Streaming master, the parser I use to read the dictionary:

    takeTill isEndOfLine <* endOfLine

handles about 3 million lines per second. I can’t remember what the number is for Pipes, but it’s probably similar. That’s really good for such a simple thing to write!

Unfortunately, there is a performance bug in Streaming that’s fixed in master, but the fix hasn’t been released for a number of months :-/

-Will
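
For reference, here is a minimal sketch of the one-line parser quoted above, run over a strict in-memory ByteString rather than through a streaming adapter. The names `line` and `parseLines` and the sample input are made up for illustration and are not taken from Callsigns.hs. It also assumes ByteString input, hence the two imports: `takeTill` comes from the Word8-based module so that it lines up with `isEndOfLine :: Word8 -> Bool`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Attoparsec.ByteString       as A
import           Data.Attoparsec.ByteString.Char8 (endOfLine, isEndOfLine)
import           Data.ByteString                  (ByteString)

-- | One line: everything up to, but not including, the terminator.
-- 'endOfLine' accepts either "\n" or "\r\n".
line :: A.Parser ByteString
line = A.takeTill isEndOfLine <* endOfLine

-- | Hypothetical helper: parse every line of a fully loaded input.
-- Fails (Left) if the last line has no trailing newline, because
-- 'line' insists on a terminator and 'endOfInput' then rejects the rest.
parseLines :: ByteString -> Either String [ByteString]
parseLines = A.parseOnly (A.many' line <* A.endOfInput)

main :: IO ()
main = print (parseLines "alpha\nbeta\r\ngamma\n")
```

For real throughput you would feed `line` to one of the streaming adapters instead of `parseOnly`; the parser itself stays the same.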