Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

15 Jun 2010


      deliverable:
...
...
If you just want to optimize it and not compare exactly equal idiomatic code,
you should stop using functional data structures and use a structure that fits
your problem (the ST monad has been designed for that in Haskell), because
compilers do not detect single-threaded usage and rewrite all your code to
something mutable and thereby avoid O(log n) costs.
In practice it is probably easier to write the whole thing against the parallel
Boost Graph Library (a C++ library), since that library provides the
abstractions that you would want. If you go this path, it will probably end up
being 10-100 times faster.
Surely I can rewrite it in C++ or just C and gain speedup, but not
maintainability and ease of prototyping algorithms -- the main reasons
I switched to FP, with added reliability, conciseness, reasonable
syntax bonuses.  It turned out Clojure is quite capable of doing it
well, but has Null Pointer Exceptions and "debug data shape by
running" annoying to an ML programmer; trying Haskell for speedup is
worth it!  It proves less predictable than expected -- e.g. with OCaml
if it typechecks and runs, it usually runs fast.  The point of this
exercise is to identify typical performance bottlenecks in the domain
and see how they can be avoided.  The jury's still out on this one.
Following the algorithm for performance tuning in the previous email
should be enough. Have you obtained heap profiles yet?

-- Don

Re: [Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

Don Stewart