
Somewhat off topic, but: I said csv-conduit because I have some experience
with it. When we were doing some analytic work at FP Complete, a few of us
analyzed both csv-conduit and cassava, and didn't really have a good feel
for which was the better library. We went with csv-conduit[1], but I'd be
really interested in hearing a comparison of the two libraries from someone
who knows about them.
[1] Don't ask what tipped us in that direction; I honestly don't remember
what it was.
On Thu Nov 13 2014 at 9:24:47 AM Christopher Allen wrote:
Memory profiling only to test how stream-y the streaming was. I didn't think perf would be that different between them. The way I had to transform my fold for Pipes was a titch awkward, otherwise happy with it.
If people are that interested in the perf side of things, I can set up a criterion harness and publish those numbers as well.
Mostly I was impressed with:
1. How easy it was to start using the streaming module in Cassava because it's just a Foldable instance.
2. How Pipes used less than 600 KB of memory.
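To illustrate point 1, here is a minimal sketch, assuming cassava's `Data.Csv.Streaming` module and a headerless two-column CSV of name and count. The function name `sumSecondColumn` and the column layout are hypothetical and are not taken from the csvtest repository.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (NoHeader))
import Data.Csv.Streaming (decode)
import Data.Foldable (foldl')

-- Sum the second column of a headerless CSV.
-- 'decode' returns a 'Records' value, which has a 'Foldable' instance,
-- so an ordinary strict left fold consumes the rows one at a time.
-- Assumption: every row is "name,count" (hypothetical layout).
sumSecondColumn :: BL.ByteString -> Int
sumSecondColumn input = foldl' step 0 (decode NoHeader input)
  where
    step :: Int -> (B.ByteString, Int) -> Int
    step acc (_, n) = acc + n
```

Because the input is a lazy `ByteString`, reading it with `Data.ByteString.Lazy.readFile` and passing it straight to `sumSecondColumn` should keep memory use roughly flat. As far as I know, the `Foldable` instance skips rows that fail to convert, so a fold like this will not report parse errors.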
Your pull request for csv-conduit looks really clean and nice. I've merged it, thanks for sending it my way!
--- Chris Allen
On Thu, Nov 13, 2014 at 12:26 AM, Christopher Reichert <creichert07@gmail.com> wrote:
On Wed, Nov 12 2014, Christopher Allen wrote:
[Snip]
csv-conduit isn't in the test results because I couldn't figure out how to use it. pipes-csv is proper streaming, but uses cassava's parsing machinery and data types. Possibly this is a problem if you have really wide rows but I've never seen anything that would be problematic in that realm even when I did a lot of HDFS/Hadoop ecosystem stuff.
AFAICT with pipes-csv you're streaming rows, but not columns. With csv-conduit you might be able to incrementally process the columns too based on my guess from glancing at the rather scary code.
Any problems in particular? I've had pretty good luck with csv-conduit. However, I have noticed that it's rather picky about type signatures, and integrating custom data types isn't straightforward at first.
csv-conduit also seems to have drawn inspiration from cassava:
http://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit-C...
[Snip] To that end, take a look at my rather messy workspace here: https://github.com/bitemyapp/csvtest
I've made a PR for the conduit version: https://github.com/bitemyapp/csvtest/pull/1
It could certainly be made more performant, but it seems to hold up well in comparison. I would be interested in reading the How I Start article and hearing more about your conclusions. Is this focused primarily on the memory profile, or also on speed?
Regards, -Christopher
Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe