Re: [Haskell-cafe] data analysis question

13 Nov 2014

Somewhat off topic, but: I said csv-conduit because I have some experience
with it. When we were doing some analytic work at FP Complete, a few of us
analyzed both csv-conduit and cassava, and didn't really have a good feel
for which was the better library. We went with csv-conduit[1], but I'd be
really interested in hearing a comparison of the two libraries from someone
who knows about them.

[1] Don't ask what tipped us in that direction, I honestly don't remember
what it was.

On Thu Nov 13 2014 at 9:24:47 AM Christopher Allen wrote:
...
Memory profiling only to test how stream-y the streaming was. I didn't
think perf would be that different between them. The way I had to transform
my fold for Pipes was a titch awkward, otherwise happy with it.
If people are that interested in the perf side of things I can set up a
criterion harness and publish those numbers as well.
Mostly I was impressed with:
1. How easy it was to start using the streaming module in Cassava because
it's just a Foldable instance.
2. How Pipes used <600kb of memory.
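To illustrate point 1, here is a minimal sketch of folding over cassava's streaming decoder. It assumes the `cassava` and `bytestring` packages are available; the two-column `name,count` layout and the `sumSecondColumn` helper are hypothetical and are not taken from the csvtest repository.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (NoHeader))
import qualified Data.Csv.Streaming as S
import Data.Foldable (foldl')

-- Sum the second column of a headerless two-column CSV.
-- The layout (name,count) is an assumption made for this sketch.
sumSecondColumn :: BL.ByteString -> Int
sumSecondColumn input = foldl' step 0 (S.decode NoHeader input)
  where
    -- S.decode yields a lazy `Records` stream; because `Records` is
    -- Foldable, an ordinary strict left fold consumes it row by row.
    step :: Int -> (BL.ByteString, Int) -> Int
    step acc (_name, n) = acc + n

main :: IO ()
main = print (sumSecondColumn "a,1\nb,2\nc,3\n")
```

For a real file, `BL.readFile` would supply the lazy input. Error handling is left out here; pattern match on the `Records` constructors directly if you need the parse error messages.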
Your pull request for csv-conduit looks really clean and nice. I've merged
it, thanks for sending it my way!
--- Chris Allen
On Thu, Nov 13, 2014 at 12:26 AM, Christopher Reichert <
creichert07@gmail.com> wrote:
...
On Wed, Nov 12 2014, Christopher Allen  wrote:
...
[Snip]
csv-conduit isn't in the test results because I couldn't figure out how to
use it. pipes-csv is proper streaming, but uses cassava's parsing machinery
and data types. Possibly this is a problem if you have really wide rows but
I've never seen anything that would be problematic in that realm even when
I did a lot of HDFS/Hadoop ecosystem stuff. AFAICT with pipes-csv you're
streaming rows, but not columns. With csv-conduit you might be able to
incrementally process the columns too based on my guess from glancing at
the rather scary code.
Any problems in particular? I've had pretty good luck with
csv-conduit. However, I have noticed that it's rather picky about type
signatures and integrating custom data types isn't straightforward at
first.
csv-conduit also seems to have drawn inspiration from cassava:
http://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit-C...
...
[Snip]
To that end, take a look at my rather messy workspace here:
https://github.com/bitemyapp/csvtest
I've made a PR for the conduit version:
https://github.com/bitemyapp/csvtest/pull/1
It could certainly be made more performant but it seems to hold up well
in comparison. I would be interested in reading the How I Start article
and hearing more about your conclusions. Is this focused primarily on
the memory profile or also speed?
Regards,
-Christopher
...
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] data analysis question

Michael Snoyman