
Hello everyone- I'm working on some project Euler problems today, and I'm stuck on one. It's not the problem itself that's the problem, it's that finding the maximum of a list makes me run out of heap space! http://projecteuler.net/index.php?section=problems&id=14 my code: import Data.List f :: Int -> Int -> Int f acc x | x == 1 = acc | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1) answer = (foldl' max 0) $ map (f 1) [1 .. 999999] I tried using 'foldl' max ' instead of 'maximum' because I thought that foldl' was supposed to work better than foldl or something...I could be confused on that point. Anyway, here's what I get... / _ \ /\ /\/ __(_) / /_\// /_/ / / | | GHC Interactive, version 6.4, for Haskell 98. / /_\\/ __ / /___| | http://www.haskell.org/ghc/ \____/\/ /_/\____/|_| Type :? for help. Loading package base-1.0 ... linking ... done. Prelude> :l ..\test.hs Compiling Main ( ..\test.hs, interpreted ) Ok, modules loaded: Main. *Main> answer GHC's heap exhausted: current limit is 268435456 bytes; Use the `-M<size>' option to increase the total heap size. I also tried writing a simple 'max' function that I thought was tail recursive, and that that would help. Same error. I'm finding it hard to believe that finding a maximum of a list of a million small integers would cause this kind of overflow...is it really that big of a problem? Any help would be appreciated. I have a feeling I'll run into this problem again in the future.

Steve Klabnik wrote:
Hello everyone-
I'm working on some project Euler problems today, and I'm stuck on one. It's not the problem itself that's the problem, it's that finding the maximum of a list makes me run out of heap space!
http://projecteuler.net/index.php?section=problems&id=14 http://projecteuler.net/index.php?section=problems&id=14
my code:
import Data.List
f :: Int -> Int -> Int f acc x | x == 1 = acc | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1)
answer = (foldl' max 0) $ map (f 1) [1 .. 999999]
I tried using 'foldl' max ' instead of 'maximum' because I thought that foldl' was supposed to work better than foldl or something...I could be confused on that point. Anyway, here's what I get...
/ _ \ /\ /\/ __(_) / /_\// /_/ / / | | GHC Interactive, version 6.4, for Haskell 98. / /_\\/ __ / /___| | http://www.haskell.org/ghc/ \____/\/ /_/\____/|_| Type :? for help.
Loading package base-1.0 ... linking ... done. Prelude> :l ..\test.hs Compiling Main ( ..\test.hs, interpreted ) Ok, modules loaded: Main. *Main> answer GHC's heap exhausted: current limit is 268435456 bytes; Use the `-M<size>' option to increase the total heap size.
I also tried writing a simple 'max' function that I thought was tail recursive, and that that would help. Same error. I'm finding it hard to believe that finding a maximum of a list of a million small integers would cause this kind of overflow...is it really that big of a problem?
Hello Steve, First, is there a reason to use such an old GHC? Newer versions produce much better code. That said, certainly 6.4 is sufficient for this problem. foldl' is the right choice here, it is strict in the accumulating parameter and that is what you want. The problem is the two parameters in f. When you make the recursive call 'f (acc + 1) ...', a thunk is being created to hold that calculation (acc + 1). It isn't being evaluated right away, so the thunk remains. On the next iteration, acc is incremented again. You are effectively building up the expression 'acc + 1 + 1 + 1 + 1 ...', perhaps quite deeply. This takes up heap space; apparently too much space. Similarly, the (x `div` 2) and (3 * x + 1) calculations are being suspended as thunks, though x gets forced on the next call, since it is tested by the guards. Adding the pragma {-# LANGUAGE BangPatterns #-} at the top of the file, you could then annotate acc: f !acc x this seems to run in constant space for me. Note that there is a much more concise way to write f, though I doubt it would be much faster. Consider 'iterate' and 'takeWhile'. Finally, compiling to a binary with ghc -O2 will yield much better speed than ghci. And GHC 6.8.3 would provide a further 20%+ speed boost, if installing it is an option on your environment. Hope this helps, Braden Shepherdson shepheb

Steve Klabnik wrote:
Hello everyone-
I'm working on some project Euler problems today, and I'm stuck on one. It's not the problem itself that's the problem, it's that finding the maximum of a list makes me run out of heap space!
http://projecteuler.net/index.php?section=problems&id=14 http://projecteuler.net/index.php?section=problems&id=14
my code:
import Data.List
f :: Int -> Int -> Int f acc x | x == 1 = acc | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1)
answer = (foldl' max 0) $ map (f 1) [1 .. 999999]
I tried using 'foldl' max ' instead of 'maximum' because I thought that foldl' was supposed to work better than foldl or something...I could be confused on that point. Anyway, here's what I get...
I've had the same problem when solving that particular puzzle. The code that solves the puzzle brute-force is following: f x | even x = x `div` 2 | otherwise = 3*x + 1 col n = (takeWhile (/=1) $ iterate f n) ++ [1] ls = [(length $ col n, n) | n <- [1..1000000]] Now, with maximumBy (\(a,_) (b,_) -> compare a b) ls I run out of stack space. If I'm not mistaken, maxmimumBy uses foldl, so maybe it blows up in the number of comparisons it stacks up (rather than reduce these immediately). Using foldl' or a custom made function (I used one so I could print intermediate results): m' x [] = x m' (l,e) ((l',e'):xs) | l > l' = m' (l,e) xs | otherwise = trace ("max: " ++ show (l',e')) (m' (l',e') xs) the problem can be solved. It does take a little while (over two minutes). For all Euler puzzles I've solved so far I've always tried to minimize "run time" + "development time", not just the run time. The solution you gave (using foldl') works on my PC (GHC(i) 6.8.2 (Ubuntu package), Linux, 2GB of RAM). That is: it doesn't run out of stack space. Unfortunately I wasn't able to get an answer out of it either before I had to kill the process. I'm not sure which function is so greedy on memory. Maybe you could try making the arguments for f strict. The thing could probably be heavily optimized (like many Euler puzzles) by adding state and keeping intermediate results (for f) in an array. Niels

Steve Klabnik wrote:
f :: Int -> Int -> Int f acc x | x == 1 = acc | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1) One more remark: you're using Ints. This type is limited to [-2^31..2^31]. Maybe you should use Integers, or only specify that the in- and output should be numeric, or perhaps even omit the type annotation altogether.
Try running: f 1 113383 (with Int and Integer/no type annotation) Niels

I tried to solve the same problem and came out with a O(log n) solution. It is pretty quick, since I keep track of the steps on a map structure. Even though it runs really fast, it hogs on memory just like Steve's version. Following is the full source, for your consideration: ======================Euler14.hs====================== import qualified Data.Map as Map import qualified Data.List as List type Table=Map.Map Integer Integer rank::Table->Integer->(Table,Integer) rank s n= (s',r) where nxt=if even n then (n `div` 2) else (3*n+1) r=case (Map.lookup n s) of Just a -> a Nothing-> (1 + (snd $ rank s nxt)) s'=Map.insert n r s search::Integer->Integer search n = case List.findIndex (\a -> a==ms) sw of Just a -> toInteger(a) +1 Nothing -> -1 where s=Map.singleton 1 1 sw=searchWork s [1..n] ms=maximum sw searchWork::Table->[Integer]->[Integer] searchWork s []=[] searchWork s (i:is)= r:(searchWork s' is) where (s',r)=rank s i main = do print $ search 1000000 ======================Euler14.hs======================

First of all, thanks for pointing out the old ghci. I had installed it on a lab computer by just googling 'ghci windows' and that was what came up. My home machine has the newest one, and is also considerably beefier hardware-wise. I think you're onto something with the Int/Integer thing....when using a type signature of "Integer", "f 1 113383" gives "248" immediately. Compiling and running the code with "Integer" types on my home machine yields "525".... which Euler says isn't the right answer? Also, on the strictness annotations: Do you put them in the type declaration? Or in the pattern match on the lhs of the declaration?

Am Montag, 28. Juli 2008 06:01 schrieb Steve Klabnik:
First of all, thanks for pointing out the old ghci. I had installed it on a lab computer by just googling 'ghci windows' and that was what came up. My home machine has the newest one, and is also considerably beefier hardware-wise.
I think you're onto something with the Int/Integer thing....when using a type signature of "Integer", "f 1 113383" gives "248" immediately. Compiling and running the code with "Integer" types on my home machine yields "525".... which Euler says isn't the right answer?
You might want to re-read the question :)
Also, on the strictness annotations: Do you put them in the type declaration? Or in the pattern match on the lhs of the declaration?
You want the strictness annotations in the pattern match. Strictness annotations can also appear in data declarations/definitions, but AFAIK not in type signatures. Cheers, Daniel

I think you're onto something with the Int/Integer thing....when using a type signature of "Integer", "f 1 113383" gives "248" immediately. Compiling and running the code with "Integer" types on my home machine yields "525".... which Euler says isn't the right answer? Because you're not providing the right number. :-) Also, on the strictness annotations: Do you put them in the type declaration? Or in the pattern match on the lhs of the declaration? I've tested your code with strictness annotations and it appears to not make a difference. GHC employs several optimization techniques, one of
Steve Klabnik wrote: those being strictness analysis, so maybe it is already using a strict, unboxed integer. The real speed-up (a non-linear one) here is not to re-calculate every sequence over and over again, but keep it in a map/array (as suggested by Rafael and me). I've found some Euler puzzles are impossible to solve without this technique. Niels P.S.: If you're really going for speed, compile (not interpret) the code (using -O -fvia-C, and there's some more stuff in the manual) using the latest greatest version of GHC.

Am Montag, 28. Juli 2008 18:15 schrieb Niels Aan de Brugh:
Steve Klabnik wrote:
I think you're onto something with the Int/Integer thing....when using a type signature of "Integer", "f 1 113383" gives "248" immediately. Compiling and running the code with "Integer" types on my home machine yields "525".... which Euler says isn't the right answer?
Because you're not providing the right number. :-)
Also, on the strictness annotations: Do you put them in the type declaration? Or in the pattern match on the lhs of the declaration?
I've tested your code with strictness annotations and it appears to not make a difference. GHC employs several optimization techniques, one of those being strictness analysis, so maybe it is already using a strict, unboxed integer.
Using -O2, GHC produces the same core with or without strictness annotations, the 'acc' parameter of f is an unboxed Int, so the optimiser indeed sees it is needed. The Integer parameter unfortunately can't be unboxed. A native 64-bit integer type might gain something.
The real speed-up (a non-linear one) here is not to re-calculate every sequence over and over again, but keep it in a map/array (as suggested by Rafael and me).
Using a Map, though easier to code, doesn't buy you much here, indeed less than a few easy considerations where the maximal length cannot occur. The problem is that you have to update the Map often, which is rather costly, also the Map takes a lot of memory. Using an array is more convoluted, but much faster (if done right).
I've found some Euler puzzles are impossible to solve without this technique.
At least, it would take far longer to get the solution.
Niels
P.S.: If you're really going for speed, compile (not interpret) the code (using -O -fvia-C, and there's some more stuff in the manual) using the latest greatest version of GHC.
Nowadays, the native code generator is not necessarily slower than -fvia-C, often faster. So the best should be to test both, -O2 and -O2 -fvia-C -optc-On, for some medium sized inputs and see which actually is faster. Cheers, Daniel

Daniel Fischer wrote:
Using -O2, GHC produces the same core with or without strictness annotations, the 'acc' parameter of f is an unboxed Int, so the optimiser indeed sees it is needed. The Integer parameter unfortunately can't be unboxed. A native 64-bit integer type might gain something.
Right. For this puzzle (n up to 1000000) it's quite safe to say an Int64 will never overflow. :-) I was wondering how the Integer type is implemented. If I were to implement such a type my first approach would be to keep a list of digits (and perform calculation like I do by hand). But operations on Integers are very fast, even on very big numbers, so I guess that's not the approach taken by the GHC developers. (For example, calculating the faculties of [1..1000] takes a little under 300ms to perform on my Intel PC; if I don't pipe to /dev/null it takes about 10s because of all the digits being printed.) Having a type like Integer on board makes many of the puzzles on Project Euler a shoo-in. One of the questions is indeed to calculate the last n digits of 1000!.
Using a Map, though easier to code, doesn't buy you much here, indeed less than a few easy considerations where the maximal length cannot occur. The problem is that you have to update the Map often, which is rather costly, also the Map takes a lot of memory. Using an array is more convoluted, but much faster (if done right).
The problem with an array is finding a good upper limit. One could of course copy the whole array if a new upper bound is found, the costs would amortize over the whole run. Having a really big and sparse array is no good either (no/bad locality of reference). Oh well, the puzzle itself definitely isn't worth the effort, but maybe some custom made heap structure would perform best.
At least, it would take far longer to get the solution.
Yes, but sometimes the difference is several days vs. a couple of minutes.
Nowadays, the native code generator is not necessarily slower than -fvia-C, often faster. So the best should be to test both, -O2 and -O2 -fvia-C -optc-On, for some medium sized inputs and see which actually is faster.
Interesting. In which release has the back-end been reworked? Does it work equally well for architectures other than Intel? Niels

Am Montag, 28. Juli 2008 23:35 schrieb Niels Aan de Brugh:
Right. For this puzzle (n up to 1000000) it's quite safe to say an Int64 will never overflow. :-)
Fairly small risk. You can't be sure beforehand, though.
I was wondering how the Integer type is implemented.
Two constructors, using native machine integers and their arithmetics when possible because it's way faster, using GMP for larger numbers. There's work going on to replace GMP, but I don't know the state. -- | Arbitrary-precision integers. data Integer = S# Int# -- small integers #ifndef ILX | J# Int# ByteArray# -- large integers #else | J# Void BigInteger -- .NET big ints foreign type dotnet "BigInteger" BigInteger #endif
If I were to implement such a type my first approach would be to keep a list of digits (and perform calculation like I do by hand).
I'd do something along those lines, too - of course I wouldn't use decimal digits, rather use base 2^16 (easy to control overflow then). But if I had to do it in earnest, I'd research some algorithms first.
Having a type like Integer on board makes many of the puzzles on Project Euler a shoo-in. One of the questions is indeed to calculate the last n digits of 1000!.
Don't you mean the sum of the digits of 100! ? The idea behind those questions was to have people write their own BigInt library, from the days when they weren't so widespread. Nowadays everything fits into 64 bits (overflow prevention may be necessary, though).
Using a Map, though easier to code, doesn't buy you much here, indeed less than a few easy considerations where the maximal length cannot occur. The problem is that you have to update the Map often, which is rather costly, also the Map takes a lot of memory. Using an array is more convoluted, but much faster (if done right).
The problem with an array is finding a good upper limit.
1000000 is a good upper limit. What makes the code convoluted is how you handle the values above :)
One could of course copy the whole array if a new upper bound is found, the costs would amortize over the whole run. Having a really big and sparse array is no good either (no/bad locality of reference).
Oh well, the puzzle itself definitely isn't worth the effort, but maybe some custom made heap structure would perform best.
At least, it would take far longer to get the solution.
Yes, but sometimes the difference is several days vs. a couple of minutes.
Or even years vs. seconds.
Nowadays, the native code generator is not necessarily slower than -fvia-C, often faster. So the best should be to test both, -O2 and -O2 -fvia-C -optc-On, for some medium sized inputs and see which actually is faster.
Interesting. In which release has the back-end been reworked?
I think it was the 6.8 branch. I can't remember which, but I had programmes that performed better when compiled via C, others with the native code generator, though I never found big differences. But when going via C, don't forget -optc-On (I habitually use n = 3), otherwise gcc won't do much with the generated C. Generally everything got faster with the 6.8 branch, let's see what 6.10 does.
Does it work equally well for architectures other than Intel?
No Idea, you might ask on glasgow-haskell-users, somebody there should know.
Niels
Cheers, Daniel

On Mon, Jul 28, 2008 at 12:15 PM, Niels Aan de Brugh
Because you're not providing the right number. :-)
You know, I realized that this morning after I woke up....ha ha. Sometimes a good night's sleep is all it takes...
I've tested your code with strictness annotations and it appears to not make a difference. GHC employs several optimization techniques, one of those being strictness analysis, so maybe it is already using a strict, unboxed integer.
Ah. I figured that I probably didn't have to worry about it manually...
The real speed-up (a non-linear one) here is not to re-calculate every sequence over and over again, but keep it in a map/array (as suggested by Rafael and me). I've found some Euler puzzles are impossible to solve without this technique.
Niels
P.S.: If you're really going for speed, compile (not interpret) the code (using -O -fvia-C, and there's some more stuff in the manual) using the latest greatest version of GHC.
I'll keep this in mind. Right now I'm going for just doing it, but I'll heed your advice for future problems. Finally, what I ended up doing: f :: Integer -> Integer -> Integer f acc x | x == 1 = acc | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1) g :: Integer -> (Integer, Integer) g x = (f 1 x, x) answer = (foldl' max (0,0)) $ map g [1 .. 999999] main = putStrLn( show answer) Again, not the most efficient, but pretty clean. Runs in just under a minute on my machine. Thanks again!

Since this is the "beginners" list, could someone explain me why using g
made everything run like the wind, with almost no memory?
I am puzzled! :-(
On Mon, Jul 28, 2008 at 17:32, Steve Klabnik
Finally, what I ended up doing:
f :: Integer -> Integer -> Integer f acc x | x == 1 = acc | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1)
g :: Integer -> (Integer, Integer) g x = (f 1 x, x)
answer = (foldl' max (0,0)) $ map g [1 .. 999999]
main = putStrLn( show answer)
-- Rafael Gustavo da Cunha Pereira Pinto Electronic Engineer, MSc.

Am Dienstag, 29. Juli 2008 01:53 schrieb Rafael Gustavo da Cunha Pereira Pinto:
Since this is the "beginners" list, could someone explain me why using g made everything run like the wind, with almost no memory?
I'd rate "run like the wind" as an exaggeration since the completely unoptimised C version runs in 1.6 seconds vs. 54 seconds for the code below (49 seconds if using Int for the chain length) on my box. And the "g" isn't important for speed or memory, just for getting the correct result. Two factors make it run in constant memory without stack overflow: 1. strictness 2. laziness. The strictness of foldl' is the reason that every g k is evaluated without delay and can immediately be thrown away (unless it's the maximum so far, in which case the previous maximum can be discarded) and the laziness of map and enumFromTo (remember that [a .. b] is syntactic sugar for enumFromTo a b) allows the list to be consumed as it is generated. Let us trace a few evaluation steps of answer: answer ---> foldl' max (0,0) $ map g (enumFromTo 1 999999) -- First, we must see whether the list is empty or not, -- 1 <= 999999, so the list is not empty ---> foldl' max (0,0) $ map g (1 : enumFromTo (1+1) 999999) ---> foldl' max (0,0) $ g 1 : map g (enumFromTo (1+1) 999999) ---> foldl' max (max (0,0) (g 1)) $ map g (enumFromTo (1+1) 999999) -- Now, due to the strictness of foldl', max (0,0) (g 1) has to be evaluated -- far enough to know if it is _|_ or not. For that, we obviously have to -- compare (0,0) and g 1, hence we must evaluate g 1 far enough to see if -- (0,0) <= g 1. Now g 1 = (f 1 1, 1) and we must check whether 0 <= f 1 1. -- However, f 1 1 = 1, so yes, and max (0,0) (g 1) turns out to be (1,1) ---> foldl' max (1,1) $ map g (enumFromTo (1+1) 999999) -- Now to decide if the list is empty, (1+1) must be compared to 999999, -- for that it must be evaluated, so ---> foldl' max (1,1) $ map g (2 : enumFromTo (2+1) 999999) ---> foldl' max (1,1) $ g 2 : map g (enumFromTo (2+1) 999999) ---> foldl' max (max (1,1) (g 2)) $ map g (enumFromTo (2+1) 999999) -- Is max (1,1) (g 2) _|_ or not? To find out evaluate g 2 = (2,2) ---> foldl' max (2,2) $ map g (enumFromTo (2+1) 999999) ---> foldl' max (2,2) $ map g (3 : enumFromTo (3+1) 999999) ---> foldl' max (2,2) $ g 3 : map g (enumFromTo (3+1) 999999) ---> foldl' max (max (2,2) (g 3)) $ map g (enumFromTo (3+1) 999999) ---> foldl' max (8,3) $ map g (enumFromTo (3+1) 999999) ... ---> foldl' max (max (8,3) (g 4)) $ map g (enumFromTo (4+1) 999999) ---> foldl' max (8,3) $ map g (enumFromTo (4+1) 999999) and so on. Hardly any memory needed. But you have more than 100,000,000 calls to f. Now if you store previous results in a Map, you save many of these calls (to rank in your code), but you pay for it with memory usage (and a couple of million expensive insertions into the Map). For every entry in the Map, apart from the space for key and value, you also have an Int holding the size of the subtree and pointers to the two subtrees of the node, on a 32-bit box that's a theoretical minimal overhead of 12 bytes per entry, probably rather 20 in reality, so a Map Integer Integer with one million entries would use some 30 MB memory. Hope that helps, Daniel
I am puzzled! :-(
On Mon, Jul 28, 2008 at 17:32, Steve Klabnik
wrote: Finally, what I ended up doing:
f :: Integer -> Integer -> Integer f acc x
| x == 1 = acc | | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1)
g :: Integer -> (Integer, Integer) g x = (f 1 x, x)
answer = (foldl' max (0,0)) $ map g [1 .. 999999]
main = putStrLn( show answer)

Thanks Daniel!
I only asked, because I was comparing the versions:
1) Steve's first version, without g x took forever to run and ate all the
stack.
2) Mine (with Map) ran somewhat fast, but it takes almost 2 minutes running
and eats 48MB of stack
3) Steve's last version (with g x) took less than 30s on my machine and
needed no more than 8MB of stack.
If I understood it right, using (g x) on foldl' forced the strictness of the
thunk (acc+1), without the need of using the bang pattern.
I'll try to play with the strictness on my code, trying to take advantage of
both the Map and strictness.
Rafael
On Mon, Jul 28, 2008 at 23:15, Daniel Fischer
Am Dienstag, 29. Juli 2008 01:53 schrieb Rafael Gustavo da Cunha Pereira Pinto:
Since this is the "beginners" list, could someone explain me why using g made everything run like the wind, with almost no memory?
I'd rate "run like the wind" as an exaggeration since the completely unoptimised C version runs in 1.6 seconds vs. 54 seconds for the code below (49 seconds if using Int for the chain length) on my box. And the "g" isn't important for speed or memory, just for getting the correct result.
Two factors make it run in constant memory without stack overflow: 1. strictness 2. laziness.
The strictness of foldl' is the reason that every g k is evaluated without delay and can immediately be thrown away (unless it's the maximum so far, in which case the previous maximum can be discarded) and the laziness of map and enumFromTo (remember that [a .. b] is syntactic sugar for enumFromTo a b) allows the list to be consumed as it is generated.
Let us trace a few evaluation steps of answer: answer ---> foldl' max (0,0) $ map g (enumFromTo 1 999999) -- First, we must see whether the list is empty or not, -- 1 <= 999999, so the list is not empty ---> foldl' max (0,0) $ map g (1 : enumFromTo (1+1) 999999) ---> foldl' max (0,0) $ g 1 : map g (enumFromTo (1+1) 999999) ---> foldl' max (max (0,0) (g 1)) $ map g (enumFromTo (1+1) 999999) -- Now, due to the strictness of foldl', max (0,0) (g 1) has to be evaluated -- far enough to know if it is _|_ or not. For that, we obviously have to -- compare (0,0) and g 1, hence we must evaluate g 1 far enough to see if -- (0,0) <= g 1. Now g 1 = (f 1 1, 1) and we must check whether 0 <= f 1 1. -- However, f 1 1 = 1, so yes, and max (0,0) (g 1) turns out to be (1,1) ---> foldl' max (1,1) $ map g (enumFromTo (1+1) 999999) -- Now to decide if the list is empty, (1+1) must be compared to 999999, -- for that it must be evaluated, so ---> foldl' max (1,1) $ map g (2 : enumFromTo (2+1) 999999) ---> foldl' max (1,1) $ g 2 : map g (enumFromTo (2+1) 999999) ---> foldl' max (max (1,1) (g 2)) $ map g (enumFromTo (2+1) 999999) -- Is max (1,1) (g 2) _|_ or not? To find out evaluate g 2 = (2,2) ---> foldl' max (2,2) $ map g (enumFromTo (2+1) 999999) ---> foldl' max (2,2) $ map g (3 : enumFromTo (3+1) 999999) ---> foldl' max (2,2) $ g 3 : map g (enumFromTo (3+1) 999999) ---> foldl' max (max (2,2) (g 3)) $ map g (enumFromTo (3+1) 999999) ---> foldl' max (8,3) $ map g (enumFromTo (3+1) 999999) ... ---> foldl' max (max (8,3) (g 4)) $ map g (enumFromTo (4+1) 999999) ---> foldl' max (8,3) $ map g (enumFromTo (4+1) 999999)
and so on. Hardly any memory needed. But you have more than 100,000,000 calls to f. Now if you store previous results in a Map, you save many of these calls (to rank in your code), but you pay for it with memory usage (and a couple of million expensive insertions into the Map). For every entry in the Map, apart from the space for key and value, you also have an Int holding the size of the subtree and pointers to the two subtrees of the node, on a 32-bit box that's a theoretical minimal overhead of 12 bytes per entry, probably rather 20 in reality, so a Map Integer Integer with one million entries would use some 30 MB memory.
Hope that helps, Daniel
I am puzzled! :-(
On Mon, Jul 28, 2008 at 17:32, Steve Klabnik
Finally, what I ended up doing:
f :: Integer -> Integer -> Integer f acc x
| x == 1 = acc | | even x = f (acc + 1) (x `div` 2) | otherwise = f (acc + 1) (3 * x + 1)
g :: Integer -> (Integer, Integer) g x = (f 1 x, x)
answer = (foldl' max (0,0)) $ map g [1 .. 999999]
main = putStrLn( show answer)
-- Rafael Gustavo da Cunha Pereira Pinto Electronic Engineer, MSc.

Am Dienstag, 29. Juli 2008 13:12 schrieb Rafael Gustavo da Cunha Pereira Pinto:
Thanks Daniel!
I only asked, because I was comparing the versions:
1) Steve's first version, without g x took forever to run and ate all the stack. 2) Mine (with Map) ran somewhat fast, but it takes almost 2 minutes running and eats 48MB of stack 3) Steve's last version (with g x) took less than 30s on my machine and needed no more than 8MB of stack.
Odd, I get quite different results. Steve's first, with type signature f :: Int -> Integer -> Int runs fine, although it takes 760 seconds and allocates *tons* when interpreted, takes 50 seconds when compiled (with -O2, that's my standard). I suspect you ran it with type signature f :: Int -> Int -> Int ? Then you hit a loop containing -68 for f 1 113383, no wonder it doesn't finish :-) Yours does far better here: ./rafa +RTS -sstderr 1,794,748,252 bytes allocated in the heap 110,383,616 bytes copied during GC (scavenged) 45,186,920 bytes copied during GC (not scavenged) 35,016,704 bytes maximum residency (7 sample(s)) 3424 collections in generation 0 ( 2.74s) 7 collections in generation 1 ( 1.26s) 68 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 22.79s ( 25.17s elapsed) GC time 4.00s ( 4.28s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 26.79s ( 29.45s elapsed) %GC time 14.9% (14.5% elapsed) Alloc rate 78,751,568 bytes per MUT second Productivity 85.1% of total user, 77.4% of total elapsed Though that difference may partly be due to different compiling options (-O2 for me). But I doubt it consumes much stack, the Map, which takes a lot of memory, should live on the heap. Steve's new version doesn't differ much from the first with respect to running time and memory usage, takes about 50 seconds here and runs happily within 1MB.
If I understood it right, using (g x) on foldl' forced the strictness of the thunk (acc+1), without the need of using the bang pattern.
No, the strictness of the acc parameter was found by the strictness analyser (at least with optimised compilation), neither foldl' nor g play a role here. If my reading of the core is correct, without optimisations, the strictness isn't inferred and I would've thought that is largely responsible for the approximately doubled running time and more than doubled allocation figures vs. -O2. Then each of the ((...(1+1)+1...)+1) thunks would only be evaluated when the strictness of foldl' forces the evaluation of max (a,b) (g k). Since these thunks contain some 100 nested additions on average, that should cost considerable time and allocation, but as none contains more than a few hundred additions, it'd be no problem for the stack. However, things apparently aren't so simple, because with -O0, a strictifying acc with a bang makes no difference :-( What blows the stack pretty fast is a gigantic thunk of max (max ... max (max (0,0) (g 1)) (g 2) ... (g 999999)) which is constructed when using foldl instead of foldl' and compiling without optimisations (with -O1 already, all necessary strictness is seen, and the compiler makes foldl behave like foldl' here). The essential thing here is to avoid that thunk of maxes. Best done by using foldl' and compiling with optimisations.
I'll try to play with the strictness on my code, trying to take advantage of both the Map and strictness.
Rafael
Cheers, Daniel
participants (5)
-
Braden Shepherdson
-
Daniel Fischer
-
Niels Aan de Brugh
-
Rafael Gustavo da Cunha Pereira Pinto
-
Steve Klabnik