Mining Twitter data in Haskell and Clojure

I'm computing a communication graph from Twitter data and then scanning it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes. First I wrote the simulation of the 35 days of data in Clojure and then translated it into Haskell with great help from the glorious #haskell folks. I had to add -A5G -K5G to make it work. It does 10 days OK, hovering at 57 GB of RAM; I include profiling of that in sc10days.prof.

At first the Haskell executable goes faster than Clojure, not by an order of magnitude, but by 2-3 times per day simulated. (Clojure also fits well in its 32 GB JVM with compressed references.) However, Haskell gets stuck after a while, and for good. Clearly I'm not doing Haskell optimally here, and would appreciate optimization advice. Here's the code: http://github.com/alexy/husky

The data and problem description are in http://github.com/alexy/husky/blob/master/Haskell-vs-Clojure-Twitter.md -- also referred to from the main README.md. The main is in sc.hs, and the algorithm is in SocRun.hs. The original Clojure is in socrun.clj. This is a continuation of active Twitter research and the results will be published, and I'd really like to make Haskell work at this scale and beyond! The seq's sprinkled in already did no good. I ran under ghc 6.10 with -O2, with or without -fvia-C, with no difference in the stalling, and am working to bring 6.12 to bear now. Cheers, Alexy

deliverable:
I'm computing a communication graph from Twitter data and then scanning it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes. First I wrote the simulation of the 35 days of data in Clojure and then translated it into Haskell with great help from the glorious #haskell folks. I had to add -A5G -K5G to make it work. It does 10 days OK, hovering at 57 GB of RAM; I include profiling of that in sc10days.prof.
At first the Haskell executable goes faster than Clojure, not by an order of magnitude, but by 2-3 times per day simulated. (Clojure also fits well in its 32 GB JVM with compressed references.) However, Haskell gets stuck after a while, and for good. Clearly I'm not doing Haskell optimally here, and would appreciate optimization advice. Here's the code:
The data and problem description is in
http://github.com/alexy/husky/blob/master/Haskell-vs-Clojure-Twitter.md
-- also referred from the main README.md.
The main is in sc.hs, and the algorithm is in SocRun.hs. The original Clojure is in socrun.clj. This is a continuation of active Twitter research and the results will be published, and I'd really like to make Haskell work at this scale and beyond! The seq's sprinkled in already did no good. I ran under ghc 6.10 with -O2, with or without -fvia-C, with no difference in the stalling, and am working to bring 6.12 to bear now.
Hey. Very cool! When you run it with +RTS -s, what amount of time is being spent in garbage collection? What are your main data types? When you compile with -prof -auto-all and do some heap profiling, what do you see? There's an introduction to profiling with GHC's heap and time tools here: http://book.realworldhaskell.org/read/profiling-and-optimization.html#id6777...

Either way:
* step one: do time profiling
* step two: do space/heap profiling
* look at the main data types being allocated and improve their representation
* look at the main functions using time, and improve their complexity
* iterate until happy.

-- Don
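
For reference, a minimal version of that workflow with the GHC toolchain of this era (the executable name sc is an assumption, taken from the repo's sc.hs):

    ghc --make -O2 -prof -auto-all sc.hs -o sc
    ./sc +RTS -s -p -hc -RTS <args>
    hp2ps -c sc.hp

The profiled run prints the GC summary (-s) to stderr, writes the time profile to sc.prof (-p) and a heap profile by closure type to sc.hp (-hc); hp2ps then renders the heap profile to PostScript.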

I've supplied a profile report there. Since I load the graphs in memory and then walk them a lot, the time seems expected. It allocates a lot, though. The main graph type is

    type Graph   = M.Map User AdjList
    type AdjList = M.Map Day Reps
    type User    = B.ByteString
    type Day     = Int
    type Reps    = M.Map User Int

and I walk it with M.foldWithKey. Folks said it's not strict enough, hence I tried to seq the step function, but to no avail so far.
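
To make the strictness pitfall concrete, here is a sketch of the kind of walk being described (repsReceived is a made-up name for illustration, not code from SocRun.hs):

    import qualified Data.Map as M
    import qualified Data.ByteString.Char8 as B

    type Graph   = M.Map User AdjList
    type AdjList = M.Map Day Reps
    type User    = B.ByteString
    type Day     = Int
    type Reps    = M.Map User Int

    -- Sum, over the whole graph, how many reps each target user receives.
    -- The seq forces each count, but the lazy insertWith still leaves (+)
    -- thunks behind keys that are hit repeatedly -- which is why seq'ing
    -- the step function alone doesn't cure a leak like this.
    repsReceived :: Graph -> M.Map User Int
    repsReceived = M.foldWithKey (\_ adj acc -> M.foldWithKey day acc adj) M.empty
      where
        day _ reps acc   = M.foldWithKey add acc reps
        add target n acc = n `seq` M.insertWith (+) target n acc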

Would it be possible to use an IntMap instead of your OrdMap? Perhaps zip your users with [0..] and key off the integer?

As a side note, I threw this package onto Hackage a while ago and it may suit your needs if you decide to move to something like IntMap: http://hackage.haskell.org/package/EnumMap

It does have a performance hit over an IntMap, but I'm not entirely sure how large it is.

/jve
On Mon, Jun 14, 2010 at 10:27 AM, braver wrote:
I've supplied a profile report there. Since I load the graphs in memory and then walk them a lot, the time seems expected. It allocates a lot, though. The main graph type is
    type Graph   = M.Map User AdjList
    type AdjList = M.Map Day Reps
    type User    = B.ByteString
    type Day     = Int
    type Reps    = M.Map User Int
and I walk it with M.foldWithKey. Folks said it's not strict enough, hence I tried to seq the step function, but to no avail so far.
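
A minimal sketch of the zip-with-[0..] indexing John suggests (internUsers is an illustrative name; the user list is assumed duplicate-free):

    import qualified Data.Map as M
    import qualified Data.IntMap as IM
    import qualified Data.ByteString.Char8 as B

    -- Build both directions of the user/id mapping in one pass.
    internUsers :: [B.ByteString] -> (M.Map B.ByteString Int, IM.IntMap B.ByteString)
    internUsers users = (toId, toName)
      where
        toId   = M.fromList  (zip users [0..])
        toName = IM.fromList (zip [0..] users)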

deliverable:
I've supplied a profile report there. Since I load the graphs in memory and then walk them a lot, the time seems expected. It allocates a lot, though. The main graph type is
    type Graph   = M.Map User AdjList
    type AdjList = M.Map Day Reps
    type User    = B.ByteString
    type Day     = Int
    type Reps    = M.Map User Int
and I walk it with M.foldWithKey. Folks said it's not strict enough, hence I tried to seq the step function, but to no avail so far.
Oh, you'll want insertWith'. You might also consider bytestring-trie for the Graph, and IntMap for the AdjList? -- Don
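
For context, the difference Don is pointing at, as a sketch (the counting maps are illustrative):

    import qualified Data.Map as M

    -- insertWith combines lazily: repeated inserts on the same key leave a
    -- chain of unevaluated (+) thunks behind that key.
    lazyCount :: M.Map String Int -> String -> M.Map String Int
    lazyCount m k = M.insertWith (+) k 1 m

    -- insertWith' (note the prime) forces the combined value at insert
    -- time, so each key holds an evaluated Int instead of a growing thunk.
    strictCount :: M.Map String Int -> String -> M.Map String Int
    strictCount m k = M.insertWith' (+) k 1 m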

On Jun 14, 11:40 am, Don Stewart wrote:
Oh, you'll want insertWith'.
You might also consider bytestring-trie for the Graph, and IntMap for the AdjList?
Yeah, I saw jsonb using Trie and thought there's a reason for it. But it's very API-poor compared with Map, e.g. there's not even a fold -- should one toListBy first? -- Alexy

OK, sample data is uploaded to data/sample in the git repo, and README.md is updated with the build and run command lines. I've achieved a bit more strictness, again with great help from @dons, @dafis, and other great folks here and on #haskell, but it's still slower than Clojure and occupies a bit more RAM -- though not as much as before. Now that the bangs are sprinkled around, it's actually slowed down. I'll profile this one tomorrow, but it works through the 100K samples in a minute, so it's wide open to instrumentation! Please help me build the fastest Twitter graph miner in Haskell! I'll keep the Clojure and an OCaml version around as a real-world shootout and welcome improvements at any time. I'm especially interested in bringing concurrency to bear on it. http://github.com/alexy/husky -- this post corresponds to tag cafe2. -- Alexy

In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling crashed twice with a "strange closure" or a stack overflow. So allocation is a problem still. -- Alexy

braver wrote:
In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB
One tip might be to limit available heap memory by using +RTS -M2G (or whatever your real memory is). If (as seems likely) the RAM usage leads to thrashing (the symptoms being 'top' showing substantially less than 100% CPU usage, and a less responsive system), limiting heap will cause your program to fail faster, which is always an advantage when debugging. Unless you actually expect the working set to be fifty gig, that is. -k -- If I haven't seen further, it is by standing in the footprints of giants
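
Concretely, assuming the executable is named sc as in the repo, that would be:

    ./sc +RTS -M2G -RTS <args>

and the RTS then aborts with a heap-exhausted error as soon as the limit is hit, rather than descending into swap.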

On 15/06/2010 06:09, braver wrote:
In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling crashed twice with a "strange closure" or a stack overflow. So allocation is a problem still.
I'd be happy to help you track this down, but I don't have a machine big enough. Do you have any runs that display a problem with a smaller heap (< 16GB)? If the program is apparently hung, try connecting to it with 'gdb --pid=<pid>' and doing 'info thread' and 'where'. That might give me enough clues to find out where the problem is. Is this with -threaded, BTW? With residency on that scale, I'd expect the parallel GC to help quite a lot. But obviously getting it to not crash/hang is the first priority :) Cheers, Simon

On Jun 15, 6:27 am, Simon Marlow wrote:
On 15/06/2010 06:09, braver wrote:
In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling crashed twice with a "strange closure" or a stack overflow. So allocation is a problem still.
I'd be happy to help you track this down, but I don't have a machine big enough. Do you have any runs that display a problem with a smaller heap (< 16GB)?
If the program is apparently hung, try connecting to it with 'gdb --pid=<pid>' and doing 'info thread' and 'where'. That might give me enough clues to find out where the problem is.
Is this with -threaded, BTW? With residency on that scale, I'd expect the parallel GC to help quite a lot. But obviously getting it to not crash/hang is the first priority :)
Simon - thanks for the tips, this is what gdb says when it's stuck at 45 GB when limited with -A5G -M40G:

    ...
    0x00000000004c3c21 in free_mega_group ()
    (gdb) info thread
    * 1 Thread 0x2b21c1da4dc0 (LWP 10210)  0x00000000004c3c21 in free_mega_group ()
    (gdb) where
    #0  0x00000000004c3c21 in free_mega_group ()
    #1  0x00000000004c3ff9 in freeChain ()
    #2  0x00000000004c5ab0 in GarbageCollect ()
    #3  0x00000000004bff96 in scheduleDoGC ()
    #4  0x00000000004c0b25 in scheduleWaitThread ()
    #5  0x00000000004bea09 in real_main ()
    #6  0x00000000004beb17 in hs_main ()
    #7  0x00000037d5a1d974 in __libc_start_main () from /lib64/libc.so.6
    #8  0x0000000000402ca9 in _start ()

I'll also supply heap profiles for small runs shortly. -- Alexy

On 15/06/2010 20:43, braver wrote:
On Jun 15, 6:27 am, Simon Marlow wrote:
On 15/06/2010 06:09, braver wrote:
In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling crashed twice with a "strange closure" or a stack overflow. So allocation is a problem still.
I'd be happy to help you track this down, but I don't have a machine big enough. Do you have any runs that display a problem with a smaller heap (< 16GB)?
If the program is apparently hung, try connecting to it with 'gdb --pid=<pid>' and doing 'info thread' and 'where'. That might give me enough clues to find out where the problem is.
Is this with -threaded, BTW? With residency on that scale, I'd expect the parallel GC to help quite a lot. But obviously getting it to not crash/hang is the first priority :)
Simon - thanks for the tips, this is what gdb says when it's stuck at 45 GB when limited with -A5G -M40G:
    ...
    0x00000000004c3c21 in free_mega_group ()
    (gdb) info thread
    * 1 Thread 0x2b21c1da4dc0 (LWP 10210)  0x00000000004c3c21 in free_mega_group ()
    (gdb) where
    #0  0x00000000004c3c21 in free_mega_group ()
    #1  0x00000000004c3ff9 in freeChain ()
    #2  0x00000000004c5ab0 in GarbageCollect ()
    #3  0x00000000004bff96 in scheduleDoGC ()
    #4  0x00000000004c0b25 in scheduleWaitThread ()
    #5  0x00000000004bea09 in real_main ()
    #6  0x00000000004beb17 in hs_main ()
    #7  0x00000037d5a1d974 in __libc_start_main () from /lib64/libc.so.6
    #8  0x0000000000402ca9 in _start ()
Thanks. I don't see anything obviously wrong in free_mega_group() - it's part of the memory manager that returns a multi-MB block to the internal free list, and it looks down the free list to find the right place to put it, coalescing with adjacent free blocks if possible. If it is looping here, that means the free list has a cycle, which is very bad indeed. Could you try a few more things for me?

- type 'display /i $pc' and then single step with 'si' for a while when it is in this state. That will tell us whether it's looping here or not.

- compile with -debug and run again. That turns on a bunch of assertions. You could also try adding +RTS -DS, this turns on more sanity checking (and will slow things down a lot).

If you are comfortable giving me a login on your machine then I could debug it directly, let me know. Cheers, Simon
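
Pulled together, the gdb session Simon is suggesting looks like this (the pid is a placeholder, output omitted):

    gdb --pid=<pid>
    (gdb) info thread
    (gdb) where
    (gdb) display /i $pc
    (gdb) si

With the display set, each si step prints the current instruction, which shows directly whether the program counter keeps cycling inside free_mega_group.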

With @dafis's help, there's a version tagged cafe3 on the master branch which performs better with ByteString. I also went ahead and interned ByteString as Int, converting the structure to IntMap everywhere. That's reflected on the new "intern" branch at tag cafe4. Still it can't do the full 35 days for all users. It comes close, however, to 30 days under ghc 6.12 with the IntMap -- just where 6.10 was with Map ByteString. Some profiling is in the prof/ subdirectory, with the responsible tag and RTS profiling option in the file name; the .prof files are from -P, and the rest are from -hX. When I downsize the sample data to 1 million users, the whole run, with -P profiling, is done in 7.5 minutes. Something happens when tripling that amount. For instance, -A10G may cause a segfault, after a fast run up to 10 days, then a seeming stall, and a dump of days up to 28 before the segfault. -A5G comes closest, to 30 days, when coupled with -H1G. It's not clear to me how to work -A and -H together. I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4. -- Alexy

Since folks got interested, I've added a doc/ subdirectory (on the "intern" branch) with a PDF defining my karmic social capital mathematically. It is this definition which is faithfully computed in both Clojure and Haskell. I've also added a LICENSE file basically stating that this research is to appear in the proceedings of the European Conference on Complex Systems, Lisbon, September 2010, and should be treated as unpublished research, with the paper in copyright but the code under GPL. I.e., if you want to experiment with the definition itself, in addition to making it compute in Haskell, please let me know -- we're always glad to collaborate. The purpose of implementing it in a functional language, besides loathing all others, is to enable quick experimentation with the karmic rewards, then running the whole-world simulation and seeing who accumulates more social capital. It can also be used in a machine-learning approach where it's made to fit an existing ranking function. Thus this computation has to be as fast as possible. -- Alexy

I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
Just a wild guess, but those foldWithKeys make me nervous. The result is strict, the step function tries to be strict, but if you look at the code for Data.IntMap.foldr, it doesn't really give you a handle for propagating that strictness. Unless the foldr definition is inlined into your code, the generic, non-strictly accumulating version might be called. Have you checked that this isn't the case? Also, shouldn't the two foldWithKeys on dm be merged? And, while Maps are strict in their representation, putting them into a non-strict field of a data structure might lose that. As I said, just guessing;-) Claus

On Jun 17, 2:36 pm, "Claus Reinke" wrote:
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
Just a wild guess, but those foldWithKeys make me nervous.
The result is strict, the step function tries to be strict, but if you look at the code for Data.IntMap.foldr, it doesn't really give you a handle for propagating that strictness. Unless the foldr definition is inlined into your code, the generic, non-strictly accumulating version might be called. Have you checked that this isn't the case?
Also, shouldn't the two foldWithKeys on dm be merged? And, while Maps are strict in their representation, putting them into a non-strict field of a data structure might lose that.
Claus -- thank you for the suggestions! Alas, it appears there's no strict foldWithKey in either Map or IntMap, hence I had to cook one up with foldl' and fromList. I also merged both dm folds into one. The resulting tag is cafe5. It still gets killed by the memory constraint at day 28. We're making progress here! :) -- Alexy
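
Presumably the cooked-up fold looks something like this (a guess, going through toList rather than fromList; foldWithKey' is a made-up name):

    import qualified Data.IntMap as IM
    import Data.List (foldl')

    -- A strict fold with keys: walk the assoc list with foldl', which
    -- forces the accumulator to WHNF at every step.
    foldWithKey' :: (b -> Int -> a -> b) -> b -> IM.IntMap a -> b
    foldWithKey' f z m = foldl' (\acc (k, v) -> f acc k v) z (IM.toList m)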

On 17/06/2010 06:23, braver wrote:
With @dafis's help, there's a version tagged cafe3 on the master branch which performs better with ByteString. I also went ahead and interned ByteString as Int, converting the structure to IntMap everywhere. That's reflected on the new "intern" branch at tag cafe4.
Still it can't do the full 35 days for all users. It comes close, however, to 30 days under ghc 6.12 with the IntMap -- just where 6.10 was with Map ByteString. Some profiling is in the prof/ subdirectory, with the responsible tag and RTS profiling option in the file name; the .prof files are from -P, and the rest are from -hX.
When I downsize the sample data to 1 million users, the whole run, with -P profiling, is done in 7.5 minutes. Something happens when tripling that amount. For instance, -A10G may cause a segfault, after a fast run up to 10 days, then a seeming stall, and a dump of days up to 28 before the segfault. -A5G comes closest, to 30 days, when coupled with -H1G. It's not clear to me how to work -A and -H together.
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion. This is the largest program (in terms of memory requirements) I've ever seen anyone run using GHC. In fact there was no machine in our building capable of running it, I had to fire up the largest Amazon EC2 instance available (68GB) to debug it - this bug cost me $26. Here are the stats from the working program:

     392,908,177,040 bytes allocated in the heap
     174,455,211,920 bytes copied during GC
      24,151,940,568 bytes maximum residency (6 sample(s))
      36,857,590,520 bytes maximum slop
               64029 MB total memory in use (1000 MB lost due to fragmentation)

      Generation 0:  62 collections,  0 parallel, 352.35s, 357.13s elapsed
      Generation 1:   6 collections,  0 parallel, 180.63s, 209.19s elapsed

      INIT  time    0.00s  (   0.11s elapsed)
      MUT   time 1201.47s  (1294.29s elapsed)
      GC    time  532.98s  ( 566.33s elapsed)
      EXIT  time    0.00s  (   5.34s elapsed)
      Total time 1734.46s  (1860.74s elapsed)

      %GC time      30.7%  (30.4% elapsed)

      Alloc rate    327,020,156 bytes per MUT second

      Productivity  69.3% of total user, 64.6% of total elapsed

The slop calculation is off a bit, because slop for pinned objects (ByteStrings) isn't being calculated properly, I should really fix that. Cheers, Simon

marlowsd:
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion.
This is the largest program (in terms of memory requirements) I've ever seen anyone run using GHC. In fact there was no machine in our building capable of running it, I had to fire up the largest Amazon EC2 instance available (68GB) to debug it - this bug cost me $26. Here are the stats from the working program:
     392,908,177,040 bytes allocated in the heap
     174,455,211,920 bytes copied during GC
      24,151,940,568 bytes maximum residency (6 sample(s))
      36,857,590,520 bytes maximum slop
               64029 MB total memory in use (1000 MB lost due to fragmentation)

      Generation 0:  62 collections,  0 parallel, 352.35s, 357.13s elapsed
      Generation 1:   6 collections,  0 parallel, 180.63s, 209.19s elapsed

      INIT  time    0.00s  (   0.11s elapsed)
      MUT   time 1201.47s  (1294.29s elapsed)
      GC    time  532.98s  ( 566.33s elapsed)
      EXIT  time    0.00s  (   5.34s elapsed)
      Total time 1734.46s  (1860.74s elapsed)
%GC time 30.7% (30.4% elapsed)
Alloc rate 327,020,156 bytes per MUT second
Productivity 69.3% of total user, 64.6% of total elapsed
Well done!!

On 24 June 2010 13:10, Simon Marlow wrote:
On 17/06/2010 06:23, braver wrote:
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion.
This is the largest program (in terms of memory requirements) I've ever seen anyone run using GHC. In fact there was no machine in our building capable of running it, I had to fire up the largest Amazon EC2 instance available (68GB) to debug it - this bug cost me $26.
I foresee Simon not having to pay for any of his beer at ICFP this summer... Duncan

I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion.
So this was about keeping the program largely unchanged in order to keep the GHC issue repeatable for tracking? Or have you also looked into removing space leaks in the code (there still seemed to be some left in the intern/cafe5 version, iirc)? Alexy: what does the latest version of the code look like - is there an up-to-date text connecting all the versions/branches/tags, so that one can find the latest version, and is there a small/tiny data source for profiling purposes?
This is the largest program (in terms of memory requirements) I've ever seen anyone run using GHC. In fact there was no machine in our building capable of running it, I had to fire up the largest Amazon EC2 instance available (68GB) to debug it - this bug cost me $26. Here are the stats from the working program:
392,908,177,040 bytes allocated in the heap
Ouch! If you keep on doing that, we Haskellers will be paged out of reality to make room for the heaps of GHC's executables!-) Claus

Claus -- cafe5 is pretty much where it's at. You're right, the proggy was used as the bug finder, actually at cafe3, still using ByteString. Having translated it from Clojure to Haskell to OCaml, I'm now debugging the logic and perhaps the conceptual data structures. Then better maps will be tried. Then a giant shootout will ensue, now that Haskell finishes! I'll post here when it's ready. -- Alexy

Claus -- cafe5 is pretty much where it's at. You're right, the proggy was used as the bug finder, actually at cafe3, still using ByteString.
It would be useful to have a really tiny data source - no more than 100 entries per Map should be sufficient to confirm or reject hunches about potential leaks by profiling. As it stands, my poor old laptop with a 32bit GHC won't be much use with your sample data, and now that the GHC bug is fixed, the size of the samples would only hide the interesting aspects (from a profiling perspective).
Having translated it from Clojure to Haskell to OCaml,
Translating quickly between strict-by-default and non-strict-by-default languages is always a warning sign: not only is it unlikely to make best use of each language's strengths, but typical patterns in one class of languages simply don't translate directly into the other.
I'm now debugging the logic and perhaps the conceptual data structures. Then better maps will be tried.
No matter what Maps you try, if they are strict in keys and non-strict in values, translating code from a strict language needs careful inspection. Most of the higher-order functions on Maps have issues here (e.g., repeated use of insertWith is going to build up unevaluated thunks, and so on). I'm not even sure how well binary fares with nested IntMaps (not to mention the occasional "too few bytes" error depending on strictness or package version - it would be useful to have a cabal file, or a README listing the versions of libraries you used).

To binary package users/authors: is there a typed version of binary (that is, one that records and checks a representation of the serialized type before actual (de-)serialization)? It would be nice to have such a type check, even though it wouldn't protect against missing bytes or strictness changes.
Then a giant shootout will ensue, now that Haskell finishes! I'll post here when it's ready.
Just make sure Haskell isn't running with brakes on!-) Claus

claus.reinke:
To binary package users/authors: is there a typed version of binary (that is, one that records and checks a representation of the serialized type before actual (de-)serialization)? It would be nice to have such a type check, even though it wouldn't protect against missing bytes or strictness changes.
There is a wrapper that wraps all encodes in a typeOf (IIRC), I think by Edward Kmett, but I couldn't find e.g. binary-typed on Hackage. -- Don

On Mon, Jun 28, 2010 at 2:32 PM, Don Stewart wrote:
claus.reinke:
To binary package users/authors: is there a typed version of binary (that is, one that records and checks a representation of the serialized type before actual (de-)serialization)? It would be nice to have such a type check, even though it wouldn't protect against missing bytes or strictness changes.
There is a wrapper, that wraps all encodes in a typeOf (iirC), I think by Edward Kmett, but I couldn't find e.g. binary-typed on Hackage.
In Happstack.Data there's a wrapper type called Object which we serialize through; it has a type-rep field and a ByteString field, and the deserialize instance function checks that the type-string field matches "show (typeOf result)": http://hackage.haskell.org/packages/archive/happstack-data/0.5.0.2/doc/html/... A similar technique might be appropriate for 'binary' or 'cereal'. Antoine
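
A sketch of what such a wrapper over binary could look like (Typed is a made-up name, and this is not Happstack's actual Object type):

    {-# LANGUAGE ScopedTypeVariables #-}
    import Data.Binary (Binary(..))
    import Data.Typeable (Typeable, typeOf)

    -- Prefix every serialized value with a textual representation of its
    -- type, and check that tag on the way back in.
    newtype Typed a = Typed { unTyped :: a }

    instance (Typeable a, Binary a) => Binary (Typed a) where
      put (Typed x) = do
        put (show (typeOf x))   -- write the type tag first
        put x
      get = do
        tag <- get
        let expected = show (typeOf (undefined :: a))
        if tag == expected
          then fmap Typed get
          else fail ("type mismatch: expected " ++ expected ++ ", got " ++ tag)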

On Thu, 24 Jun 2010 19:25:35 -0700 (PDT), braver wrote:
Claus -- cafe5 is pretty much where it's at. You're right, the proggy was used as the bug finder, actually at cafe3, still using ByteString.
Having translated it from Clojure to Haskell to OCaml, I'm now debugging the logic and perhaps the conceptual data structures. Then better maps will be tried. Then a giant shootout will ensue, now that Haskell finishes! I'll post here when it's ready.
Is the OCaml version available somewhere? -- Nicolas Pouillard http://nicolaspouillard.fr

On 24/06/2010 21:40, Claus Reinke wrote:
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion.
So this was about keeping the program largely unchanged in order to keep the GHC issue repeatable for tracking? Or have you also looked into removing space leaks in the code (there still seemed to be some left in the intern/cafe5 version, iirc)?
I haven't touched the code - I'll leave that to the expertise of haskell-cafe! :) It would be nice to have a boiled-down version of this for a benchmark though. Cheers, Simon

Simon -- amazing feat! Thanks for tracking it down. I'll now happily rely on the Haskell version if it is fast enough :). -- Alexy

Simon -- so how can I get me a new ghc now? From git, I suppose? (It used to live in darcs...) -- Alexy

deliverable:
Simon -- so how can I get me a new ghc now? From git, I suppose? (It used to live in darcs...)
It still lives in darcs. Nightly builds are here: http://www.haskell.org/ghc/dist/stable/dist/ You'll want to check with Simon first that the patch got pushed, though. -- Don

On 25/06/2010 04:23, Don Stewart wrote:
deliverable:
Simon -- so how can I get me a new ghc now? From git, I suppose? (It used to live in darcs...)
It still lives in darcs.
Nightly builds are here: http://www.haskell.org/ghc/dist/stable/dist/
You'll want to check with Simon that the patch got pushed, though, first.
The patch got pushed to HEAD yesterday http://darcs.haskell.org/cgi-bin/darcsweb.cgi?r=ghc;a=darcs_commitdiff;h=201... and was in last night's builds (24 Jun): http://www.haskell.org/ghc/dist/current/dist/ You could easily cherry-pick the patch into the stable branch, but we're not planning to do any more releases of 6.12 so that we can focus on 6.14. Cheers, Simon

So I wonder what the timings for Haskell, O'Caml and Clojure are now, given the patch to GHC. -- Jason Dusek Linux User #510144 | http://counter.li.org/

braver wrote:
On Jun 14, 11:40 am, Don Stewart wrote:
Oh, you'll want insertWith'.
You might also consider bytestring-trie for the Graph, and IntMap for the AdjList?
Yeah, I saw jsonb using Trie and thought there's a reason for it. But it's very API-poor compared with Map, e.g. there's not even a fold -- should one toListBy first?
I find that surprising. Have you looked in Data.Trie.Convenience? The API of Data.Map is rather bloated, so I've pushed most of it out of the main module in order to clean things up. There are only a small number of functions in the Data.Map interface I haven't had a chance to implement yet.

For folding, the `foldMap`, `foldr`, and `foldl` functions are provided via the Data.Foldable interface. The Data.Traversable class is also implemented if you need to make changes to the trie along the way. These all give generic folding over the values stored in the trie. If you need access to the keys during folding you can use `foldrWithKey`, though it has to reconstruct the keys, which doesn't sound good for your use case. `toListBy` is a convenience wrapper around `foldrWithKey` which supports list fusion, so it has the same advantages and disadvantages compared to the Foldable/Traversable functions. If there's a particular function you still need, let me know and I can add an implementation for it.

In terms of optimizing your code, one thing you'll surely want to do is to construct an intern table (Trie Int, IntMap ByteString) so that you only have to deal with Ints internally rather than ByteStrings. I haven't looked at your code yet to see how this would fit in, but it's almost always a requisite trick for handling large text corpora. -- Live well, ~wren
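
A minimal sketch of the intern table wren describes, assuming bytestring-trie's Data.Trie (the function name is illustrative):

    import qualified Data.Trie as T
    import qualified Data.IntMap as IM
    import qualified Data.ByteString.Char8 as B
    import Data.List (foldl')

    -- Intern each distinct user name to a dense Int, keeping both
    -- directions: a Trie for name -> id and an IntMap for id -> name.
    intern :: [B.ByteString] -> (T.Trie Int, IM.IntMap B.ByteString)
    intern = foldl' step (T.empty, IM.empty)
      where
        step (toId, toName) name
          | name `T.member` toId = (toId, toName)
          | otherwise            = ( T.insert name next toId
                                   , IM.insert next name toName )
          where next = IM.size toName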

Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true? I use ByteString as a node type for the graph; these are Twitter user names. Surely it's useful to replace them with Int, which I'll try, but Clojure works with Java String fine and it simplifies all kinds of exploratory data mining and debugging to keep it as a String, so I'll try to get the most mileage from other things before interning. What's the exact relationship between Trie and Map and their respective performance? -- Alexy

deliverable:
Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true?
I use ByteString as a node type for the graph; these are Twitter user names. Surely it's useful to replace them with Int, which I'll try, but Clojure works with Java String fine and it simplifies all kinds of exploratory data mining and debugging to keep it as a String, so I'll try to get the most mileage from other things before interning.
bytestring seems appropriate.
What's the exact relationship between Trie and Map and their respective performance?
Tries specialized to bytestring keys should outperform the generic Map. -- Don

On Tuesday 15 June 2010 23:26:10, Don Stewart wrote:
deliverable:
Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true?
I use ByteString as a node type for the graph; these are Twitter user names. Surely it's useful to replace them with Int, which I'll try, but Clojure works with Java String fine and it simplifies all kinds of exploratory data mining and debugging to keep it as a String, so I'll try to get the most mileage from other things before interning.
bytestring seems appropriate.
What's the exact relationship between Trie and Map and their respective performance?
Tries specialized to bytestring keys should outperform the generic Map.
That would be desirable. I've done some profiling with the sample data, and found that - if we subtract the times for loading and saving the graphs - some 35-40% of the time is spent looking up ByteStrings in Maps. That's far too much for my liking. I'm not sure whether the lookup for e.g. an Int key would be much faster, but I suspect it would be.

I've also fiddled a bit with the strictness and removed a bit of unnecessary work, reducing heap usage by ~20%, MUT times by ~15% and GC times by ~50% (all for the tests on my box with a measly 1GB RAM). It's still a far cry from a racehorse, but at least I can now run the sample data for the entire 35 days without my box thrashing madly :) The result of my endeavours is attached. Cheers, Daniel

braver wrote:
Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true?
That was probably me saying that I had worked on some more efficient implementations than those currently in use. Alas, the more efficient ones seem to alter the order of enumerating the values, and I haven't had a chance to fix that (so they're not being used currently). As for general efficiency, I haven't run any benchmarks for folds vs other datastructures so I can only speculate. If you're using foldl or foldr instead of foldrWithKey[1] then the performance should be comparable to Data.Map. For both data structures, all they do is traverse the spine and enumerate the leaves, so the indexing data on the spine nodes is irrelevant.
What's the exact relationship between Trie and Map and their respective performance?
Data.Trie is a big-endian patricia tree with node fusion, which is the same data structure as Data.IntMap. The only difference is that IntMap can only take fixed-length keys (namely Ints) whereas Trie can take variable-length keys (i.e., any string of bits). On the other hand, Data.Map is a form of balanced tree, which is rather different. The API documentation gives links to the relevant papers.

The API documentation also gives their asymptotic performance. I don't have all the numbers for Trie yet because my sources didn't give them and I haven't had the chance to prove them myself, but they will be similar to those for IntMap. In general, because it is specialized for the key type, Trie will be more efficient than (Map ByteString) just as IntMap is more efficient than (Map Int). The exact performance characteristics can vary depending on the specific program, however.

Because patricia trees and balanced trees are such different structures, they excel for different usage patterns. For example, if you wanted to get (or enumerate) the submap containing only the keys beginning with "foo", tries will be much more efficient than the alternatives. However, if you need to use functions like foldrWithKey often, then tries will be at a disadvantage since they must reconstruct the keys.[1] As I recall, tries are also at an advantage if you often merge/meld maps, though I don't recall the algorithm Data.Map uses to be sure of that.

Similarly, since Map, IntMap, and Trie are persistent data structures, if you're not using that persistence then you may get better performance out of an ephemeral data structure like hash tables. Though you should first check to make sure that IntMap + interning is insufficient, since a big part of hash tables' performance characteristics comes from the interning implicit in the hash function. Choosing the right tool depends on how you want to use it.

[1] If you do need to use foldrWithKey often, an easy way to improve performance is to use a Trie (ByteString, Foo) and store the complete key alongside the value. This way you can use the faster foldl/foldr and still have access to the key. This will increase memory consumption, of course, though Trie does try to maximize sharing of key segments, so it won't cost as much as initially expected. -- Live well, ~wren
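
A sketch of the footnote's key-alongside-value trick (KeyedTrie and the example fold are illustrative):

    import qualified Data.Trie as T
    import qualified Data.ByteString.Char8 as B
    import qualified Data.Foldable as F

    -- Store the full key next to the value, so the cheap Foldable folds
    -- can see the key without the trie having to reconstruct it.
    type KeyedTrie v = T.Trie (B.ByteString, v)

    insertKeyed :: B.ByteString -> v -> KeyedTrie v -> KeyedTrie v
    insertKeyed k v = T.insert k (k, v)

    -- Fold over values with access to keys, via plain foldr.
    keysAndValues :: KeyedTrie v -> [(B.ByteString, v)]
    keysAndValues = F.foldr (:) []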

On Tue, Jun 15, 2010 at 11:24 PM, braver wrote:
Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true?
I use ByteString as a node type for the graph; these are Twitter user names. Surely it's useful to replace them with Int, which I'll try, but Clojure works with Java String fine and it simplifies all kinds of exploratory data mining and debugging to keep it as a String, so I'll try to get the most mileage from other things before interning.
What's the exact relationship between Trie and Map and their respective performance?
The new "The Performance of Haskell containers package" paper compares the performance of, among other things, Maps holding Strings/ByteString. It also improves the performance of many operations on these. I think it's very relevant to your work. http://fox.ucw.cz/papers/containers/containers.pdf Johan

On Jun 24, 5:07 am, Johan Tibell wrote:
The new "The Performance of Haskell containers package" paper compares the performance of, among other things, Maps holding Strings/ByteString. It also improves the performance of many operations on these. I think it's very relevant to your work.
Thanks -- will take a look. I've also translated the program into OCaml, which needed some small tweaks to runtime options as well, and it now finishes in 14 minutes. While investigating, I found this comparison of functional and imperative map implementations by Mauricio Fernandez, the awesomest OCaml thinker: http://eigenclass.org/R2/writings/finite-map-benchmarks The functional ones are applicable to Haskell directly. My result is with the standard imperative Hashtbl. It will be fun to compare to dons' judy binding... -- Alexy

On 14-06-10 07:00, braver wrote:
I'm computing a communication graph from Twitter data and then scanning it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes. First I wrote the simulation of the 35 days of data in Clojure and then translated it into Haskell with great help from the glorious #haskell folks. I had to add -A5G -K5G to make it work. It does 10 days OK, hovering at 57 GB of RAM; I include profiling of that in sc10days.prof.
At first the Haskell executable goes faster than Clojure, not by an order of magnitude, but by 2-3 times per day simulated. (Clojure also fits well in its 32 GB JVM with compressed references.) However, Haskell gets stuck after a while, and for good. Clearly I'm not doing Haskell optimally here, and would appreciate optimization advice. Here's the code:
The data and problem description is in
http://github.com/alexy/husky/blob/master/Haskell-vs-Clojure-Twitter.md
-- also referred from the main README.md.
The main is in sc.hs, and the algorithm is in SocRun.hs. The original Clojure is in socrun.clj. This is a continuation of active Twitter research and the results will be published, and I'd really like to make Haskell work at this scale and beyond! The seq's sprinkled in already did no good. I ran under ghc 6.10 with -O2, with or without -fvia-C, with no difference in the stalling, and am working to bring 6.12 to bear now.
Cheers, Alexy
If you just want to optimize it, and not compare exactly equal idiomatic code, you should stop using functional data structures and use a structure that fits your problem (the ST monad has been designed for that in Haskell), because compilers do not detect single-threaded usage and rewrite all your code to something mutable to avoid the O(log n) costs for you.

In practice it is probably easier to write the whole thing against the parallel Boost Graph Library (a C++ library), since that library provides the abstractions that you would want. If you go this path, it will probably end up being 10-100 times faster. -- Best Regards, Ron de Bruijn
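
As a tiny illustration of the ST route Ron mentions (tally and the interned-Int edge list are made up; this is a sketch, not a rewrite of the graph code):

    import Control.Monad (forM_)
    import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
    import Data.Array.Unboxed (UArray)

    -- Tally edge weights per (interned) user id in a mutable unboxed
    -- array, mutating freely inside ST and freezing to an immutable
    -- result at the end -- no O(log n) cost per update.
    tally :: Int -> [(Int, Int)] -> UArray Int Int
    tally nUsers edges = runSTUArray $ do
      counts <- newArray (0, nUsers - 1) 0
      forM_ edges $ \(user, weight) -> do
        c <- readArray counts user
        writeArray counts user (c + weight)
      return counts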

If you just want to optimize it, and not compare exactly equal idiomatic code, you should stop using functional data structures and use a structure that fits your problem (the ST monad has been designed for that in Haskell), because compilers do not detect single-threaded usage and rewrite all your code to something mutable to avoid the O(log n) costs for you.
In practice it is probably easier to write the whole thing against the parallel Boost Graph Library (a C++ library), since that library provides the abstractions that you would want. If you go this path, it will probably end up being 10-100 times faster.
Surely I can rewrite it in C++ or just C and gain a speedup, but not maintainability and ease of prototyping algorithms -- the main reasons I switched to FP, with reliability, conciseness, and reasonable syntax as added bonuses. It turned out Clojure is quite capable of doing it well, but it has Null Pointer Exceptions and a "debug data shape by running" style that is annoying to an ML programmer; trying Haskell for a speedup is worth it! It proves less predictable than expected -- e.g., with OCaml, if it typechecks and runs, it usually runs fast. The point of this exercise is to identify typical performance bottlenecks in the domain and see how they can be avoided. The jury's still out on this one. -- Alexy

deliverable:
If you just want to optimize it, and not compare exactly equal idiomatic code, you should stop using functional data structures and use a structure that fits your problem (the ST monad has been designed for that in Haskell), because compilers do not detect single-threaded usage and rewrite all your code to something mutable to avoid the O(log n) costs for you.
In practice it is probably easier to write the whole thing against the parallel Boost Graph Library (a C++ library), since that library provides the abstractions that you would want. If you go this path, it will probably end up being 10-100 times faster.
Surely I can rewrite it in C++ or just C and gain a speedup, but not maintainability and ease of prototyping algorithms -- the main reasons I switched to FP, with reliability, conciseness, and reasonable syntax as added bonuses. It turned out Clojure is quite capable of doing it well, but it has Null Pointer Exceptions and a "debug data shape by running" style that is annoying to an ML programmer; trying Haskell for a speedup is worth it! It proves less predictable than expected -- e.g., with OCaml, if it typechecks and runs, it usually runs fast. The point of this exercise is to identify typical performance bottlenecks in the domain and see how they can be avoided. The jury's still out on this one.
Following the algorithm for performance tuning in the previous email should be enough. Have you obtained heap profiles yet? -- Don
participants (15)
- Antoine Latter
- braver
- Claus Reinke
- Daniel Fischer
- Don Stewart
- Duncan Coutts
- Ivan Miljenovic
- Jason Dusek
- Johan Tibell
- John Van Enk
- Ketil Malde
- Nicolas Pouillard
- Ron de Bruijn
- Simon Marlow
- wren ng thornton