[Newbie] Problem with Data.Map (or something else ?)

Dears Haskellers, As an Haskell newbie, I'm learning Haskell by trying to resolve Euler Project problems (http://projecteuler.net/ ). I'm hanging on problem 14 (Collatz problem). I've written the following program... Which does not end in a reasonable time :( My algorithm seems ok to me but I see that memory consumption is gigantic... Is this a memory problem with Data.Map ? Or an infinite loop ? (Where ?) In a more general way, how can I troubleshoot these kind of problem ? Here's the code : import qualified Data.List as List import qualified Data.Map as Map f n | even n = n `div` 2 | otherwise = 3 * n + 1 chain m n = let chain' cn cm | Map.member cn m = Map.map (+ (m Map.! cn)) cm | otherwise = chain' (f cn) $! Map.insert cn 1 (Map.map (+1) cm) in chain' n Map.empty chains n = List.foldl' (\m i -> Map.union m (chain m i)) (Map.singleton 1 1) [2..n] maxCollatz c1@(_,l1) c2@(_,l2) | l1 < l2 = c2 | otherwise = c1 maxChain = List.foldl' maxCollatz (0,0) . Map.toList . chains main = let n = 1000000 in putStrLn $ show $ maxChain n Hope someone can help me, I really don't see what is th problem... Best regards, Bruno.

Hello Bruno, Monday, March 31, 2008, 7:51:43 PM, you wrote:
I've written the following program... Which does not end in a reasonable time :( My algorithm seems ok to me but I see that memory consumption is gigantic... Is this a memory problem with Data.Map ? Or an infinite loop ? (Where ?) In a more general way, how can I troubleshoot these kind of problem ?
first step is to reduce n and see whether program will finish and how memreqs depends on value of n
main = let n = 1000000
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

The program ends for values up to 400000 :
*Main> :set +s
*Main> maxChain 1000
(871,179)
(0.09 secs, 3697648 bytes)
*Main> maxChain 10000
(6171,262)
(0.73 secs, 31560008 bytes)
*Main> maxChain 100000
(77031,351)
(9.31 secs, 347122064 bytes)
*Main> maxChain 200000
(156159,383)
(19.32 secs, 709303708 bytes)
<< This one take about 10 minutes and swap a lot >>
*Main> maxChain 300000
(230631,443)
(38.02 secs, 1083800124 bytes)
<< This one swap a lot and does not end in less than 10 minutes >>
*Main> maxChain 400000
The ratio memreq/n seems to be more or less constant :
*Main> 3697648/1000
3697.648
*Main> 31560008/10000
3156.0008
*Main> 347122064/100000
3471.22064
*Main> 709303708/200000
3546.51854
*Main> 1083800124/300000
3612.66708
Thank you,
Bruno.
2008/3/31, Bulat Ziganshin
Hello Bruno,
Monday, March 31, 2008, 7:51:43 PM, you wrote:
I've written the following program... Which does not end in a reasonable time :( My algorithm seems ok to me but I see that memory consumption is gigantic... Is this a memory problem with Data.Map ? Or an infinite loop ? (Where ?) In a more general way, how can I troubleshoot these kind of problem ?
first step is to reduce n and see whether program will finish and how memreqs depends on value of n
main = let n = 1000000
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

"Bruno Carnazzi"
The program ends for values up to 400000 :
Wild guess here - I know nothing about the problem, and haven't examined your program in detail - but could it be that you default to Int, and that it wraps silently at some power of two, thereby making your algorithm wrap around? Try to stick some 'Integer' type annotations in there, and see if it helps. -k (cetera censeo...) -- If I haven't seen further, it is by standing in the footprints of giants

I've done this modification with no more success :
import qualified Data.List as List
import qualified Data.Map as Map
f :: Integer -> Integer
f n | even n = n `div` 2
| otherwise = 3 * n + 1
chain m n =
let chain' cn cm | Map.member cn m = Map.map (+ (m Map.! cn)) cm
| otherwise = chain' (f cn) $! Map.insert cn 1
(Map.map (+1) cm)
in chain' n Map.empty
chains n = List.foldl' (\m i -> Map.union m (chain m i))
(Map.singleton 1 1) [2..n]
maxCollatz c1@(_,l1) c2@(_,l2) | l1 < l2 = c2
| otherwise = c1
maxChain = List.foldl' maxCollatz (0,0) . Map.toList . chains
main =
let n = 1000000
in putStrLn $ show $ maxChain n
Best regards,
Bruno.
2008/3/31, Ketil Malde
"Bruno Carnazzi"
writes: The program ends for values up to 400000 :
Wild guess here - I know nothing about the problem, and haven't examined your program in detail - but could it be that you default to Int, and that it wraps silently at some power of two, thereby making your algorithm wrap around? Try to stick some 'Integer' type annotations in there, and see if it helps.
-k
(cetera censeo...)
-- If I haven't seen further, it is by standing in the footprints of giants

On Mon, Mar 31, 2008 at 6:00 PM, Bruno Carnazzi
I've done this modification with no more success :
import qualified Data.List as List import qualified Data.Map as Map
f :: Integer -> Integer
f n | even n = n `div` 2 | otherwise = 3 * n + 1
chain m n = let chain' cn cm | Map.member cn m = Map.map (+ (m Map.! cn)) cm | otherwise = chain' (f cn) $! Map.insert cn 1 (Map.map (+1) cm) in chain' n Map.empty
This function raises a red flag for me. The collatz sequence gets _very_ big, with very many distinct numbers. You are saving the length of the resulting chain for each of those numbers, which is going to take a lot of memory. But Map is also lazy in its values, so the values you are storing once chain finishes will look like: 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1 Instead of 22, which is taking quite a lot of memory as well. My impression is that the caching approach is just a bad idea. It is not necessary to solve the problem efficiently; a brute force approach runs in under 1 minute in constant memory for me. Try to simplify your approach. I'd suggest generating a list representing the collatz sequence starting at the number, then just taking its 'length'. Do that for each number and find the maximum. There should be no need for Data.Map. Luke

2008/3/31, Bruno Carnazzi
Dears Haskellers,
As an Haskell newbie, I'm learning Haskell by trying to resolve Euler Project problems (http://projecteuler.net/ ). I'm hanging on problem 14 (Collatz problem).
I've written the following program... Which does not end in a reasonable time :( My algorithm seems ok to me but I see that memory consumption is gigantic... Is this a memory problem with Data.Map ? Or an infinite loop ? (Where ?) In a more general way, how can I troubleshoot these kind of problem ?
Others have pointed potential source of memory leaks, but I must say that using Data.Map for the cache in the first place appear to me as a very bad idea... Data.Map by nature take much more place than necessary. You have an integer index, why not use an array instead ?
import Data.Array import Data.List import Data.Ord
syrs n = a where a = listArray (1,n) $ 0:[ syr n x | x <- [2..n]] syr n x = if x' <= n then a ! x' else 1 + syr n x' where x' = if even x then x `div` 2 else 3 * x + 1
main = print $ maximumBy (comparing snd) $ assocs $ syrs 1000000
This solution takes 2 seconds (on my machine) to resolve the problem. On the other hand, now that I have read your solution, I see that using Map was the least of the problem... All those Map.map, while retaining the original Map... Your solution is too clever (twisted) for its own good, I suggest you aim for simplicity next time. -- Jedaï

chaddai.fouche:
2008/3/31, Bruno Carnazzi
: Dears Haskellers,
As an Haskell newbie, I'm learning Haskell by trying to resolve Euler Project problems (http://projecteuler.net/ ). I'm hanging on problem 14 (Collatz problem).
I've written the following program... Which does not end in a reasonable time :( My algorithm seems ok to me but I see that memory consumption is gigantic... Is this a memory problem with Data.Map ? Or an infinite loop ? (Where ?) In a more general way, how can I troubleshoot these kind of problem ?
Others have pointed potential source of memory leaks, but I must say that using Data.Map for the cache in the first place appear to me as a very bad idea... Data.Map by nature take much more place than necessary. You have an integer index, why not use an array instead ?
import Data.Array import Data.List import Data.Ord
syrs n = a where a = listArray (1,n) $ 0:[ syr n x | x <- [2..n]] syr n x = if x' <= n then a ! x' else 1 + syr n x' where x' = if even x then x `div` 2 else 3 * x + 1
main = print $ maximumBy (comparing snd) $ assocs $ syrs 1000000
This solution takes 2 seconds (on my machine) to resolve the problem.
On the other hand, now that I have read your solution, I see that using Map was the least of the problem... All those Map.map, while retaining the original Map... Your solution is too clever (twisted) for its own good, I suggest you aim for simplicity next time.
and if its got Int indices, Data.IntMap is always a better option than Data.Map and usually outperforms the HashTable type, while being pure.

2008/4/1, Chaddaï Fouché
2008/3/31, Bruno Carnazzi
: Dears Haskellers,
As an Haskell newbie, I'm learning Haskell by trying to resolve Euler Project problems (http://projecteuler.net/ ). I'm hanging on problem 14 (Collatz problem).
I've written the following program... Which does not end in a reasonable time :( My algorithm seems ok to me but I see that memory consumption is gigantic... Is this a memory problem with Data.Map ? Or an infinite loop ? (Where ?) In a more general way, how can I troubleshoot these kind of problem ?
Others have pointed potential source of memory leaks, but I must say that using Data.Map for the cache in the first place appear to me as a very bad idea... Data.Map by nature take much more place than necessary. You have an integer index, why not use an array instead ?
Because I don't know anything about arrays in Haskell. Thank you for pointing this, I have to read some more Haskell manuals :)
import Data.Array import Data.List import Data.Ord
syrs n = a where a = listArray (1,n) $ 0:[ syr n x | x <- [2..n]] syr n x = if x' <= n then a ! x' else 1 + syr n x' where x' = if even x then x `div` 2 else 3 * x + 1
main = print $ maximumBy (comparing snd) $ assocs $ syrs 1000000
The logic and the complexity in this algorithm is comparable to mine but the performance difference is huge, which is not very intuitive in my mind (There is no "1+1+1+1+1..." problem with array ?)
This solution takes 2 seconds (on my machine) to resolve the problem.
On the other hand, now that I have read your solution, I see that using Map was the least of the problem... All those Map.map, while retaining the original Map... Your solution is too clever (twisted) for its own good, I suggest you aim for simplicity next time.
-- Jedaï
Thank you, Bruno.

2008/4/1, Bruno Carnazzi
Because I don't know anything about arrays in Haskell. Thank you for pointing this, I have to read some more Haskell manuals :)
A good place to learn about Haskell's array (which come in many flavours) is this wiki page : http://www.haskell.org/haskellwiki/Modern_array_libraries
import Data.Array import Data.List import Data.Ord
syrs n = a where a = listArray (1,n) $ 0:[ syr n x | x <- [2..n]] syr x = if x' <= n then 1 + a ! x' else 1 + syr x' where x' = if even x then x `div` 2 else 3 * x + 1
main = print $ maximumBy (comparing snd) $ assocs $ syrs 1000000
The logic and the complexity in this algorithm is comparable to mine but the performance difference is huge, which is not very intuitive in my mind (There is no "1+1+1+1+1..." problem with array ?)
Array or Map isn't really the problem here (my algorithm with a Map instead only take 6s to find the solution) as I thought at first. The main problem in your code I think is that because of Map.map, you create multiple copies of your smaller Maps in memory and union force them to materialize, while the fact that you don't evaluate the value means the GC won't collect them. Anyway, your algorithm by itself is pretty slow I think, since for every step to a number which is not already recorded you must add 1 to all the numbers you passed on the way. -- Jedaï
participants (6)
-
Bruno Carnazzi
-
Bulat Ziganshin
-
Chaddaï Fouché
-
Don Stewart
-
Ketil Malde
-
Luke Palmer