Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

23 Jul 2013

Justin Paston-Cooper writes:
...
> Dear All,
> Recently I have been doing a lot of CSV processing. I initially tried to
> use the Data.Csv (cassava) library provided on Hackage, but I found this to
> still be too slow for my needs. In the meantime I have reverted to hacking
> something together in C, but I have been left wondering whether a tidy
> solution might be possible to implement in Haskell.

Have you tried profiling your cassava implementation? In my experience
I've found it's quite quick. If you have an example of a slow path I'm
sure Johan (cc'd) would like to know about it.
...
> I would like to build a library that satisfies the following:
> 1) Run a function <<a_1 -> ... -> a_n -> m (Maybe (b_1, ..., b_n))>>,
> with <<m>> some monad and the <<a>>s and <<b>>s being input and output.
> 2) Be able to specify a maximum record string length and output record
> string length, so that the string buffers used for reading and outputting
> lines can be reused, preventing the need for allocating new strings for
> each record.
> 3) Allocate only once the memory where the parsed input values and output
> values are put.

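For concreteness, the shape described in (1) might be sketched roughly as below. This is only an illustration using base and bytestring, not cassava's API. The names `splitFields`, `mapRecords`, and `doubleSecond` are hypothetical, a record is represented as a list of fields rather than a fixed-arity tuple, and quoted fields are not handled.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import Data.Maybe (catMaybes)

-- Hypothetical helper: split one line on commas.
-- Assumption: no quoted fields or embedded commas.
splitFields :: B.ByteString -> [B.ByteString]
splitFields = B.split ','

-- Hypothetical driver: run a monadic per-record function over every line
-- and keep only the records for which it returns Just.
mapRecords :: Monad m
           => ([B.ByteString] -> m (Maybe [B.ByteString]))
           -> B.ByteString
           -> m B.ByteString
mapRecords f input = do
  results <- mapM (f . splitFields) (B.lines input)
  pure (B.unlines (map (B.intercalate (B.pack ",")) (catMaybes results)))

-- Example per-record function: keep two-field rows whose second field is
-- an Int, and double that value. All other rows are dropped.
doubleSecond :: [B.ByteString] -> IO (Maybe [B.ByteString])
doubleSecond [name, n] =
  case B.readInt n of
    Just (v, rest) | B.null rest -> pure (Just [name, B.pack (show (2 * v))])
    _                            -> pure Nothing
doubleSecond _ = pure Nothing

main :: IO ()
main = mapRecords doubleSecond (B.pack "a,1\nb,x\nc,3\n") >>= B.putStr
```

Note that this version allocates freely for each record, so it does not attempt requirements (2) or (3); it only shows the per-record function interface.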
Ultimately this could be rather tricky to enforce. Haskell code
generally does a lot of allocation and the RTS is well optimized to
handle this.

I've often found that trying to shoehorn a non-idiomatic "optimal"
imperative approach into Haskell produces worse performance than the
more readable, idiomatic approach.

I understand this leaves many of your questions unanswered, but I'd give
the idiomatic approach a bit more time before trying to coerce C into
Haskell. Profile, see where the hotspots are and optimize
appropriately. If the profile has you flummoxed, the lists and #haskell
are always willing to help given the time.
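As a rough sketch of what that profiling workflow can look like, here is a small stand-in program with a manual cost centre. It uses only base and bytestring rather than cassava; the workload, the names `secondField` and `sumSecond`, and the file name `input.csv` are hypothetical, and quoted fields are not handled.

```haskell
module Main where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')

-- Hypothetical workload: read the second comma-separated field of a line
-- as an Int, treating missing or non-numeric fields as 0.
secondField :: BL.ByteString -> Int
secondField line =
  case BL.split ',' line of
    (_ : n : _) -> maybe 0 fst (BL.readInt n)
    _           -> 0

-- Sum the second column. The SCC pragma names a cost centre so that the
-- parsing work shows up under its own label in the profile.
sumSecond :: BL.ByteString -> Int
sumSecond input =
  foldl' (+) 0 ({-# SCC "parse_fields" #-} map secondField (BL.lines input))

main :: IO ()
main = BL.getContents >>= print . sumSecond

-- Build and run with profiling (standard GHC flags):
--   ghc -O2 -prof -fprof-auto -rtsopts Main.hs
--   ./Main +RTS -p -RTS < input.csv
-- The cost-centre report is written to Main.prof.
```

With `-fprof-auto` every top-level function also gets its own cost centre, so the explicit SCC is only needed to single out a sub-expression.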

Cheers,

- Ben

Ben Gamari