
deliverable:
I'm computing a communication graph from Twitter data and then scanning it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes. First I wrote the simulation of the 35 days of data in Clojure and then translated it into Haskell with great help from the glorious #haskell folks. I had to add the RTS options -A5G -K5G to make it work. It handles 10 days fine, hovering at 57 GB of RAM; I include the profile of that run in sc10days.prof.
At first the Haskell executable goes faster than Clojure, not by an order of magnitude, but by 2-3 times per day simulated. (Clojure also fits well in its 32 GB JVM with compressed references.) However, Haskell gets stuck after a while, and for good. Clearly I'm not doing Haskell optimally here, and would appreciate optimization advice. Here's the code:
The data and problem description are at
http://github.com/alexy/husky/blob/master/Haskell-vs-Clojure-Twitter.md
-- also linked from the main README.md.
The main is in sc.hs, and the algorithm is in SocRun.hs. The original Clojure is in socrun.clj. This is a continuation of active Twitter research and the results will be published, and I'd really like to make Haskell work at this scale and beyond! The seq calls I have already sprinkled in did no good. I ran under ghc 6.10 with -O2, with and without -fvia-C, with no difference in stalling, and am working to bring 6.12 to bear now.
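One common reason that sprinkled seq calls appear to do nothing is that seq evaluates its argument only to weak head normal form, so forcing a tuple or record leaves its components as unevaluated thunks. Whether that is what happens in SocRun.hs is an assumption, not something the post confirms. The sketch below uses hypothetical names (sumCountLazy, sumCountStrict) and a toy accumulator, not the actual simulation code, to show where the seq has to go.

```haskell
module Main where

import Data.List (foldl')

-- foldl' forces the accumulator only to weak head normal form, i.e. to
-- the (,) constructor. The two components can remain chains of thunks.
-- (With -O2, GHC's strictness analysis may rescue a case this small.)
sumCountLazy :: [Double] -> (Double, Int)
sumCountLazy = foldl' step (0, 0)
  where
    step (s, n) x = (s + x, n + 1)

-- Here seq is applied to the components themselves before the new pair
-- is built, so no thunk chain accumulates inside the tuple.
sumCountStrict :: [Double] -> (Double, Int)
sumCountStrict = foldl' step (0, 0)
  where
    step (s, n) x =
      let s' = s + x
          n' = n + 1
      in s' `seq` n' `seq` (s', n')

main :: IO ()
main = do
  let xs = map fromIntegral [1 .. 100000 :: Int]
  print (sumCountLazy xs)
  print (sumCountStrict xs)
```

Both functions return the same value; they differ only in how much unevaluated work can pile up inside the accumulator between steps.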
Hey. Very cool!

When you run it with +RTS -s, what amount of time is being spent in garbage collection? What are your main data types? When you compile with -prof -auto-all and do some heap profiling, what do you see?

There's an introduction to profiling with GHC's heap and time tools here:
http://book.realworldhaskell.org/read/profiling-and-optimization.html#id6777...

Either way:
 * step one: do time profiling
 * step two: do space/heap profiling
 * look at the main data types being allocated and improve their representation.
 * look at the main functions using time, and improve their complexity.
 * iterate until happy.

-- Don
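As an illustration of the "improve their representation" step, here is a minimal sketch of a per-node record with strict, unpacked fields that is accumulated into a strict map. The names UserStats, capital, mentions, Event and applyEvents are hypothetical and are not taken from SocRun.hs. The sketch also assumes the Data.Map.Strict module from a recent containers package, which is newer than the GHC 6.10 setup described in the question.

```haskell
module Main where

import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical per-user record. Strict fields mean a constructed value
-- never holds thunks; UNPACK stores the Double and Int inline rather
-- than behind pointers.
data UserStats = UserStats
  { capital  :: {-# UNPACK #-} !Double
  , mentions :: {-# UNPACK #-} !Int
  } deriving (Eq, Show)

-- Combine two records field by field.
addStats :: UserStats -> UserStats -> UserStats
addStats (UserStats c1 m1) (UserStats c2 m2) = UserStats (c1 + c2) (m1 + m2)

-- Hypothetical event: a user id and the capital gained in that event.
type Event = (Int, Double)

-- Strict left fold over the events. Data.Map.Strict.insertWith forces
-- each stored value to WHNF, and the strict fields force the rest.
applyEvents :: M.Map Int UserStats -> [Event] -> M.Map Int UserStats
applyEvents = foldl' step
  where
    step m (u, c) = M.insertWith addStats u (UserStats c 1) m

main :: IO ()
main = do
  let events = [ (u `mod` 1000, 1.0) | u <- [1 .. 100000 :: Int] ]
      stats  = applyEvents M.empty events
  print (M.size stats)
  print (M.lookup 0 stats)
```

Whether a change like this helps depends on what the heap profile shows; it is worth trying only if the profile is dominated by thunks or by boxed fields of the main record types.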