
A while back, following the most recent discussion about filepaths and IO generally, I decided to pick up the torch and try my hand at a solution that dealt with all the issues. My conception of the problem can be found here: http://www.haskell.org//pipermail/haskell-cafe/2005-January/008955.html

However, I quickly got sidetracked by the character encoding issue, and by sequential IO operations generally. I have come up with some code that I think is interesting, not least because it is somewhat performant. In the spirit of 'release early, release often', I am now making it available.

I ended up developing a sequential IO API and a couple of test cases to drive it; I currently have a simple word count and an md5 implementation. There is a darcs repo at: http://www.eecs.tufts.edu/~rdocki01/filepath/

The code relies on multi-parameter typeclasses, fundeps and unboxed tuples, so it's GHC only.

The good news:

1) The API is semi-manageable. It is based around Producer, Transformer and Consumer functors (at least I think they can be called functors; I'm not really knowledgeable here). I think that with a little more work (and the input of people more experienced at this stuff than I am) it could be sugared into a pretty usable API.

2) It is performant (mostly). At least it outperforms other Haskell IO methods I have tried. My 'wc' is about twice as fast as the current shootout version in informal tests (the shootout code is included in the repo). My md5 can sum somewhere between 2-4 MB/sec on my hardware.

3) Very importantly, the code appears to have reliable constant-space behavior.

4) We are not artificially forced into the IO or ST monads for performance reasons. Explicit state passing is used where necessary, which seems to have the added benefit of helping the compiler find good optimizations (pure speculation).

The bad news:

1) Mostly performant is still not great. My wc takes about 6 times as long as the C version on my machine, and md5 takes about 80 times as long. (Interestingly, the C md5sum takes about 1/5 of the time of the C wc!) However, it is within an order of magnitude of a Java implementation of md5 (using the standard digest classes).

2) The performance is pretty fragile. Small changes can cause large performance hits for no easily discernible reason. This probably relates to the way particular optimizations do or don't get applied. The situation is hugely complicated by the typeclass-heavy API, I am sure.

3) The stream-observer paradigm is a somewhat difficult programming environment.

My next step is to try some actual character encoding implementations (the original purpose, after all) and see how that goes. I'd also like to try gzip and gunzip transformer layers. Any ideas for improvements (including patches!) are welcome.

Robert Dockins

PS: the code currently includes a number of vestigial remnants of false starts, and is generally kind of ugly; you are warned.
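To make the shape of the API concrete, here is a minimal sketch of a Producer/Transformer/Consumer pipeline. The three role names come from the post above; the concrete types and helpers (Consumer and Producer as written here, charsOf, premap, wordCount, run, the "input.txt" path) are hypothetical illustrations of the idea, not the code in the darcs repo.

{-# LANGUAGE RankNTypes, BangPatterns #-}
module Main where

import Data.Char (isSpace)
import System.IO

-- A Consumer is a strict left fold: a step function plus a start state.
-- (The post mentions unboxed tuples for state; a plain pair is enough
-- for a sketch.)
data Consumer a s = Consumer (s -> a -> s) s

-- A Transformer adapts a Consumer of one element type into a
-- Consumer of another, leaving the state type alone.
type Transformer a b = forall s. Consumer b s -> Consumer a s

-- A Producer knows how to drive any Consumer over a source of elements.
newtype Producer a = Producer (forall s. Consumer a s -> IO s)

-- Produce the characters of a file, one at a time (naive; a serious
-- version would read blocks).
charsOf :: FilePath -> Producer Char
charsOf path = Producer $ \(Consumer step z) ->
  withFile path ReadMode $ \h ->
    let go !s = do eof <- hIsEOF h
                   if eof
                     then return s
                     else do c <- hGetChar h
                             go (step s c)
    in go z

-- An example Transformer: apply a function to each element on its
-- way to the downstream Consumer.
premap :: (a -> b) -> Transformer a b
premap f (Consumer step z) = Consumer (\s a -> step s (f a)) z

-- A word-counting Consumer: the state tracks the count and whether
-- we are currently inside a word.
wordCount :: Consumer Char (Int, Bool)
wordCount = Consumer step (0, False)
  where
    step (!n, inWord) c
      | isSpace c = (n, False)
      | inWord    = (n, True)
      | otherwise = (n + 1, True)

run :: Producer a -> Consumer a s -> IO s
run (Producer p) c = p c

main :: IO ()
main = do
  (n, _) <- run (charsOf "input.txt") wordCount  -- placeholder path
  print n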

On Thu, Mar 17, 2005 at 10:31:20AM -0500, robert dockins wrote:
2) It is performant (mostly). At least it outperforms other Haskell IO methods I have tried. My 'wc' is about twice as fast as the current shootout version in informal tests (the shootout code is included in the repo). My md5 can sum somewhere between 2-4 MB/sec on my hardware.
You know that http://www.bagley.org/~doug/shootout/ is frozen, don't you? For a current version, look at http://shootout.alioth.debian.org/

The current version is fast but ugly. There was some committee work on the Haskell mailing lists to make it prettier, but it hasn't made it to the shootout yet. For some reason the GHC implementation of count-words on the shootout doesn't compile, perhaps because someone thought it would be a good idea to give two -O options: -O3 -O2

BTW, do we care about such benchmarks? I am going to have some spare time and I could work on Haskell solutions a bit, but I'm not sure it's worth the hassle.

Best regards
Tomasz

2) It is performant (mostly). At least it outperforms other Haskell IO methods I have tried. My 'wc' is about twice as fast as the current shootout version in informal tests (the shootout code is included in the repo). My md5 can sum somewhere between 2-4 MB/sec on my hardware.
You know that http://www.bagley.org/~doug/shootout/ is frozen, don't you? For a current version, look at http://shootout.alioth.debian.org/ The current version is fast but ugly. There was some committee work on the Haskell mailing lists to make it prettier, but it hasn't made it to the shootout yet.
Thanks, I do have an old version; it wasn't on bagley.org, but I'm not sure exactly where I found it. I'll compare against the newest version when I get home. The reason I compared to the shootout program was to get a sense of how well the API I was developing stacked up against hand-optimized Haskell. So even getting pretty close is a win as far as I'm concerned.

(... quick google ...) I just found Ian's md5 implementation. I'll compare to that as well when I get a chance.
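For reference, the sort of idiomatic-but-naive Haskell wc that both the streaming API and a hand-optimized entry are trying to beat is the classic lazy-IO fold. This is a baseline sketch for comparison, not the shootout code:

-- Naive lazy-IO word count: short, but allocation-heavy compared to
-- a block-reading stream fold.
main :: IO ()
main = do
  s <- getContents
  print (length (lines s), length (words s), length s)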
BTW, do we care about such benchmarks? I am going to have some spare time and I could work on Haskell solutions a bit, but I'm not sure it's worth the hassle.
I think they are interesting as an indication of where Haskell, and GHC in particular, are weak. If the techniques developed for optimizing shootout scripts can drive better optimizations or new, better libraries, I think that's worthwhile. OTOH, nobody asks if Perl golf (for example) is worthwhile; they just do it for kicks (as far as I can tell).

I've been wondering lately about the shootout results. Is there some intrinsic, Haskell-language reason that OCaml performs so well in comparison to Haskell in the shootout (e.g. even optimized Haskell code will always be slower because of the overhead of language features, or something like this), or is it a maturity issue? Five years from now, will we be able to match or surpass OCaml's efficiency? I realize the benchmarks are arbitrary, but it's just something I've been wondering about.

--Bryce

On Thu, 17 Mar 2005 16:40:04 +0100, Tomasz Zielonka wrote:
BTW, do we care about such benchmarks?
I'd say "we" (as in Haskell programmers) don't care, but "they" (as in imperative programmers that are looking into maybe trying Haskell) probably do.

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
participants (4)
- Bryce Bockman
- robert dockins
- Sebastian Sylvan
- Tomasz Zielonka