
On 17/06/2010 06:23, braver wrote:
With @dafis's help, there's a version tagged cafe3 on the master branch which performs better with ByteString. I also went ahead and interned ByteString as Int, converting the structure to IntMap everywhere. That's reflected on the new "intern" branch at tag cafe4.
Still, it can't do the full 35 days for all users. It comes close, however, to 30 days under ghc 6.12 with the IntMap -- just where 6.10 was with Map ByteString. Some profiling is in the prof/ subdirectory, with the responsible tag and the RTS profiling option in the file name; .prof files are -P, and the rest are -hX.
When I downsize the sample data to 1 million users, the whole run, with -P profiling, is done in 7.5 minutes. Something happens when tripling that amount. For instance, -A10G may cause a segfault: a fast run up to 10 days, then an apparent stall, then a dump of days up to 28 before the segfault. -A5G comes closest, reaching 30 days, when coupled with -H1G. It's not clear to me how to use -A and -H together.
I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4.
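For readers following along, the interning step mentioned above (ByteString keys replaced by Int so that the structures can use IntMap) might look roughly like the sketch below. This is not the code from the cafe4 tag: InternTable, intern and countByUser are hypothetical names chosen for illustration, and it assumes a containers version that provides Data.Map.Strict and Data.IntMap.Strict.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Hypothetical intern table: each distinct ByteString gets a dense Int id.
data InternTable = InternTable
  { nextId :: !Int                      -- next unused id
  , ids    :: !(M.Map B.ByteString Int) -- key -> id
  }

emptyTable :: InternTable
emptyTable = InternTable 0 M.empty

-- Look up a key, assigning the next free id the first time it is seen.
intern :: B.ByteString -> InternTable -> (Int, InternTable)
intern k t@(InternTable n m) =
  case M.lookup k m of
    Just i  -> (i, t)
    Nothing -> (n, InternTable (n + 1) (M.insert k n m))

-- Count occurrences per user, keyed by the interned Int, not the ByteString.
countByUser :: [B.ByteString] -> (InternTable, IM.IntMap Int)
countByUser = foldl' step (emptyTable, IM.empty)
  where
    step (!t, !acc) k =
      let (i, t') = intern k t
      in (t', IM.insertWith (+) i 1 acc)
```

The ByteString is only compared while interning; every later lookup or update goes through the IntMap, which is where the saving over Map ByteString would be expected to come from.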
An update on this: with the help of Alex I tracked down the problem (an integer overflow bug in GHC's memory allocator), and his program now runs to completion. This is the largest program (in terms of memory requirements) I've ever seen anyone run using GHC. In fact there was no machine in our building capable of running it; I had to fire up the largest Amazon EC2 instance available (68GB) to debug it - this bug cost me $26.

Here are the stats from the working program:

  392,908,177,040 bytes allocated in the heap
  174,455,211,920 bytes copied during GC
   24,151,940,568 bytes maximum residency (6 sample(s))
   36,857,590,520 bytes maximum slop
            64029 MB total memory in use (1000 MB lost due to fragmentation)

  Generation 0:    62 collections,     0 parallel, 352.35s, 357.13s elapsed
  Generation 1:     6 collections,     0 parallel, 180.63s, 209.19s elapsed

  INIT  time     0.00s  (   0.11s elapsed)
  MUT   time  1201.47s  (1294.29s elapsed)
  GC    time   532.98s  ( 566.33s elapsed)
  EXIT  time     0.00s  (   5.34s elapsed)
  Total time  1734.46s  (1860.74s elapsed)

  %GC time      30.7%  (30.4% elapsed)

  Alloc rate    327,020,156 bytes per MUT second

  Productivity  69.3% of total user, 64.6% of total elapsed

The slop calculation is off a bit, because slop for pinned objects (ByteStrings) isn't being calculated properly; I should really fix that.

Cheers,
Simon
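As a quick sanity check on how the summary figures in the stats above relate to each other, the small sketch below recomputes them from the reported user times and allocation total. The names are made up for illustration; the only inputs are numbers quoted in the stats, and because the times are printed to two decimals the recomputed allocation rate only matches the reported one approximately.

```haskell
-- User-time figures copied from the stats above (seconds).
mutSeconds, gcSeconds, totalSeconds :: Double
mutSeconds   = 1201.47
gcSeconds    = 532.98
totalSeconds = 1734.46

-- "bytes allocated in the heap" from the stats above.
bytesAllocated :: Double
bytesAllocated = 392908177040

-- Productivity: share of total user time spent in the mutator, in percent.
productivity :: Double
productivity = 100 * mutSeconds / totalSeconds

-- %GC time: share of total user time spent in the collector, in percent.
gcShare :: Double
gcShare = 100 * gcSeconds / totalSeconds

-- Alloc rate: bytes allocated per second of mutator time.
allocRate :: Double
allocRate = bytesAllocated / mutSeconds
```

Evaluating these gives values that round to the reported 69.3% productivity and 30.7% GC time, with an allocation rate of roughly 327 million bytes per MUT second.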