Profiling help (Warning: Euler spoilers)

I have written the program below, which solves problem 14 from Project Euler, which asks us to find the hailstone number below 1 million that takes the longest to get to 1. The program solves the problem using dynamic programming using a Data.Map. It completes in under a minute (barely), but allocates a ridiculous amount of memory and actually runs slower than the naive one-liner it was supposed to replace. I have an intuition on what the problem is, but would like to verify it by profiling. Unfortunately, I don't know enough about how profiling works in GHC to diagnose the problem. Hopefully the Cafe can help. Here's the code. scratch.hs: This is the main program, which just calls the problemFourteen function. The EP datatype just contains an int representing the problem number and an (IO ()) which prints out the solution --------------------------------------------------------------------------------------------------------------------------- import Euler1to20 -- contains problems 1 through 20 import Euler -- Contains various utility functions including the EC type which represents Euler Problems main = proc where (EP num proc) = problemFourteen --------------------------------------------------------------------------------------------------------------------------- Euler1to20.hs: contains the following lines, among others --------------------------------------------------------------------------------------------------------------------------- problemFourteen = EP 14 $ print $ (key, fromJust $ findIndex (==key) keys) where (keys, results) = runState (mapM findChainLength [1..10^6]) $ M.fromList [(1,1)] key = maximum keys findChainLength::Integer->State (M.Map Integer Integer) Integer findChainLength n = do m <- get case M.lookup n m of Nothing -> let nextLink = if even n then div n 2 else 3*n+1 in do len <- liftM (1+) (findChainLength nextLink) m2 <- get m2 `seq` put (M.insert n len m2) return len Just n'-> return n' --------------------------------------------------------------------------------------------------------------------------- My guess is that the problem is in the recursive call to findChainLength, which is not in tail position. I think that this is causing the code to keep too many old copies of the map around. Before I go rewriting everything, however, I'd like to verify that this is indeed the problem, so the next step is to compile and run with profiling turned on. After quickly looking at the Real World Haskell chapter on profiling, I do the following. Compiling and running: --------------------------------------------------------------------------------------------------------------------------- ghc -prof --make -auto-all -fforce-recomp scratch.hs [1 of 3] Compiling Euler ( Euler.hs, Euler.o ) [2 of 3] Compiling Euler1to20 ( Euler1to20.hs, Euler1to20.o ) [3 of 3] Compiling Main ( scratch.hs, scratch.o ) Linking scratch ... ./scratch +RTS -sstderr -hc -p -K100M (525,837798) 9,503,907,112 bytes allocated in the heap 42,785,764,376 bytes copied during GC (scavenged) 15,697,610,688 bytes copied during GC (not scavenged) 421,548,032 bytes maximum residency (193 sample(s)) 17787 collections in generation 0 (650.00s) 193 collections in generation 1 (1748.37s) 782 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 18.68s ( 19.19s elapsed) GC time 2398.37s (2401.47s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 50.48s ( 50.50s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 2467.53s (2471.17s elapsed) %GC time 97.2% (97.2% elapsed) Alloc rate 508,742,769 bytes per MUT second Productivity 0.8% of total user, 0.8% of total elapsed --------------------------------------------------------------------------------------------------------------------------- I can see from the above that the maximum resident memory at one time is 782Mb, which seems excessive, since I don't expect the Map to contain more than around 2M entries. Also, I had to use "-K 100M" to increase the stack space or the program bombs out with a stack overflow exception. Since I compiled with profiling information, running the program produced the following scratch.prof: --------------------------------------------------------------------------------------------------------------------------- Wed Mar 3 19:41 2010 Time and Allocation Profiling Report (Final) scratch +RTS -sstderr -hc -p -K100M -RTS total time = 18.68 secs (934 ticks @ 20 ms) total alloc = 5,327,238,992 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc findChainLength Euler1to20 93.8 90.5 CAF Euler1to20 5.2 8.0 problemFourteen Euler1to20 1.0 1.5 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 1 0 0.0 0.0 100.0 100.0 CAF Main 212 2 0.0 0.0 0.0 0.0 main Main 218 1 0.0 0.0 0.0 0.0 CAF GHC.Handle 164 4 0.0 0.0 0.0 0.0 CAF Euler1to20 139 21 5.2 8.0 100.0 100.0 findChainLength Euler1to20 220 3168610 93.8 90.5 93.8 90.5 problemFourteen Euler1to20 219 1 1.0 1.5 1.0 1.5 CAF Control.Monad.State.Lazy 137 1 0.0 0.0 0.0 0.0 --------------------------------------------------------------------------------------------------------------------------- The above tells me that almost all the CPU time and memory allocation happens in findChainLength, but that isn't really a surprise. I think the next step is to start adding cost centers to get a more detailed profile, but here is where I hit the end of my ghc profiling knowledge. Can anyone help with some general strategies for how to use the profiling tools in GHC to track this down? Of course once the culprit has been found, I have to figure out what to *do* about it, but one thing at a time... James
participants (1)
-
James Jackson