Memory allocation in extractor function (newbie question)

Hello, While trying to write a program for the countPaths Code Jam problem I ran into what seems to me as a weird behaviour in terms of memory allocation. The task is to count the number of way you can spell a certain "word" by walking some path on a board of letters. Being a newbie I started with a very straightforward recursive approach, which is not effective, but still helps to improve understanding of Haskell basics. Quite naturally I quickly hit the point where the program becomes too slow, so I tried to profile and optimize the code. I used 3 data types: data GridEntry = GridEntry {letter :: Char, neighbors :: [Point]} data Area = Area {letters :: (Array (Int, Int) GridEntry), width :: Int, height :: Int} data Point = Point { x :: Int, y :: Int} deriving (Show, Eq) With the recursive approach I had to retrieve GridEntries (a letter in a certain point of grid together with the list of its neighbours on the grid) a lot. Quite surprisingly I saw that this operation was overly expensive in my program. The most time was spent in these 2 functions: countPathsFrom :: Area -> Int -> String -> Point -> Int countPathsFrom _ count [] _ = count countPathsFrom area count (l:[]) point = count + if (letter (letterAt area point) == l) then 1 else 0 countPathsFrom area count (l:ls) point = let gridPt = letterAt area point points = neighbors gridPt in if (letter gridPt == l) then countForPoints area ls count points else count letterAt :: Area -> Point -> GridEntry letterAt area (Point x y) = letters area ! (x, y) The profiling log (./paths +RTS -P) shows the following time/space behaviour for them: countPathsFrom Main 181 11830476 37.8 52.3 59.5 82.2 28 1325013312 letterAt Main 184 11830476 21.6 29.9 21.6 29.9 16 757150464 What puzzles me here is how the function letterAt generates 64 bytes of garbage per call. Since there are no side-effects in this section of program - and noone can damage the 'area' structure - I would expect the compiler/runtime to use a reference to the data in the 'area' structure when returning values from letterAt. Instead it seems to make a copy of GridEntry and return that. The question here is whether (why?) this behaviour is normal and if it is possible to optimize the code in such cases. Thanks in advance, -Alexander

Alexander Kireyev wrote:
While trying to write a program for the countPaths Code Jam problem I ran into what seems to me as a weird behaviour in terms of memory allocation... The profiling log (./paths +RTS -P) shows the following time/space behaviour for them...
Hi Alexander, I'm not sure I can fully explain your profiling behavior either, but I'll take a stab at what the problem might be anyway. You didn't show us the code for countForPoints. I'll bet you wrote something like countForPoints area ls count points = sum $ map (countPathsFrom area (count + 1) ls) points Unfortunately, the standard sum function is too lazy in many situations. And if you wrote out the recursion for countForPoints by hand, you may have run into the same problem. In either case, you'll be accumulating huge amounts of unevaluated (+) thunks instead of adding up the count as you go along. If this is the problem, you can fix it by using foldl', a strict version of foldl from Data.List, instead of sum: countForPoints area ls count points = foldl' (+) 0 $ map (countPathsFrom area (count + 1) ls) points Hope this helps, Yitz
participants (2)
-
Alexander Kireyev
-
Yitzchak Gale