
On Wed, Nov 12 2014, Christopher Allen wrote:
[Snip] csv-conduit isn't in the test results because I couldn't figure out how to use it. pipes-csv does proper streaming, but it uses cassava's parsing machinery and data types. That could possibly be a problem if you have really wide rows, but I never saw anything problematic in that realm even when I did a lot of HDFS/Hadoop ecosystem work. AFAICT, with pipes-csv you're streaming rows, but not columns. With csv-conduit you might be able to process the columns incrementally too, based on my guess from glancing at the rather scary code.
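To make the rows-but-not-columns point concrete, here is a base-only toy sketch (not pipes-csv itself, and with none of cassava's quoting/escaping handling): rows are consumed lazily one at a time, but each row is split into all of its fields before the consumer sees it, which is exactly why a very wide row could hurt while a long file doesn't.

```haskell
-- Toy illustration (base only) of row-wise streaming over CSV text.
-- Real code would use pipes-csv/cassava; this just shows the shape
-- of the trade-off: one row's fields are fully materialized at a
-- time, while the file as a whole is never held in memory.
splitFields :: String -> [String]
splitFields s = case break (== ',') s of
  (field, [])     -> [field]
  (field, _:rest) -> field : splitFields rest

-- Streams rows lazily: only the current row's fields are live.
processRows :: String -> [Int]
processRows = map (length . splitFields) . lines

main :: IO ()
main = mapM_ print (processRows "a,b,c\n1,2\nx")
-- prints 3, 2, 1 (field count per row)
```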
Any problems in particular? I've had pretty good luck with csv-conduit. However, I have noticed that it's rather picky about type signatures, and integrating custom data types isn't straightforward at first. csv-conduit also seems to have drawn inspiration from cassava: http://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit-C...
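For what it's worth, the type-signature pickiness usually shows up because intoCSV is polymorphic in its row type (Row Text, MapRow Text, etc.), so GHC needs a concrete annotation somewhere to pick an instance. A rough sketch against csv-conduit's Data.CSV.Conduit API from around that era (0.6.x); the file names and the "name" column are made up for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Assumes csv-conduit ~0.6.x and the old conduit ($=)/($$) operators.
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import Data.Conduit (($=), ($$))
import Control.Monad.Trans.Resource (runResourceT)
import Data.CSV.Conduit (MapRow, defCSVSettings, intoCSV, fromCSV)
import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text as T

-- The signature here is the important part: it fixes intoCSV's
-- output to MapRow Text, which is what the library is "picky" about.
processRow :: MapRow Text -> MapRow Text
processRow = M.adjust T.toUpper "name"  -- hypothetical column

main :: IO ()
main = runResourceT $
     CB.sourceFile "input.csv"
  $= intoCSV defCSVSettings
  $= CL.map processRow
  $= fromCSV defCSVSettings
  $$ CB.sinkFile "output.csv"
```

The per-row step stays a pure MapRow-to-MapRow function, so it's easy to test in isolation even though the pipeline itself runs in ResourceT IO.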
[Snip] To that end, take a look at my rather messy workspace here: https://github.com/bitemyapp/csvtest
I've made a PR for the conduit version: https://github.com/bitemyapp/csvtest/pull/1 It could certainly be made more performant, but it seems to hold up well in comparison. I would be interested in reading the How I Start article and hearing more about your conclusions. Is this focused primarily on the memory profile, or also on speed? Regards, -Christopher
Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe