
Am Dienstag, 29. Juli 2008 13:12 schrieb Rafael Gustavo da Cunha Pereira Pinto:
Thanks Daniel!
I only asked, because I was comparing the versions:
1) Steve's first version, without g x took forever to run and ate all the stack. 2) Mine (with Map) ran somewhat fast, but it takes almost 2 minutes running and eats 48MB of stack 3) Steve's last version (with g x) took less than 30s on my machine and needed no more than 8MB of stack.
Odd, I get quite different results. Steve's first, with type signature f :: Int -> Integer -> Int runs fine, although it takes 760 seconds and allocates *tons* when interpreted, takes 50 seconds when compiled (with -O2, that's my standard). I suspect you ran it with type signature f :: Int -> Int -> Int ? Then you hit a loop containing -68 for f 1 113383, no wonder it doesn't finish :-) Yours does far better here: ./rafa +RTS -sstderr 1,794,748,252 bytes allocated in the heap 110,383,616 bytes copied during GC (scavenged) 45,186,920 bytes copied during GC (not scavenged) 35,016,704 bytes maximum residency (7 sample(s)) 3424 collections in generation 0 ( 2.74s) 7 collections in generation 1 ( 1.26s) 68 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 22.79s ( 25.17s elapsed) GC time 4.00s ( 4.28s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 26.79s ( 29.45s elapsed) %GC time 14.9% (14.5% elapsed) Alloc rate 78,751,568 bytes per MUT second Productivity 85.1% of total user, 77.4% of total elapsed Though that difference may partly be due to different compiling options (-O2 for me). But I doubt it consumes much stack, the Map, which takes a lot of memory, should live on the heap. Steve's new version doesn't differ much from the first with respect to running time and memory usage, takes about 50 seconds here and runs happily within 1MB.
If I understood it right, using (g x) on foldl' forced the strictness of the thunk (acc+1), without the need of using the bang pattern.
No, the strictness of the acc parameter was found by the strictness analyser (at least with optimised compilation), neither foldl' nor g play a role here. If my reading of the core is correct, without optimisations, the strictness isn't inferred and I would've thought that is largely responsible for the approximately doubled running time and more than doubled allocation figures vs. -O2. Then each of the ((...(1+1)+1...)+1) thunks would only be evaluated when the strictness of foldl' forces the evaluation of max (a,b) (g k). Since these thunks contain some 100 nested additions on average, that should cost considerable time and allocation, but as none contains more than a few hundred additions, it'd be no problem for the stack. However, things apparently aren't so simple, because with -O0, a strictifying acc with a bang makes no difference :-( What blows the stack pretty fast is a gigantic thunk of max (max ... max (max (0,0) (g 1)) (g 2) ... (g 999999)) which is constructed when using foldl instead of foldl' and compiling without optimisations (with -O1 already, all necessary strictness is seen, and the compiler makes foldl behave like foldl' here). The essential thing here is to avoid that thunk of maxes. Best done by using foldl' and compiling with optimisations.
I'll try to play with the strictness on my code, trying to take advantage of both the Map and strictness.
Rafael
Cheers, Daniel