
Thanks again. Problem solved... well, under control. ;-) Op 2010-10-05 12:07, Daniel Fischer wrote:
On Tuesday 05 October 2010 10:06:03, Jeroen van Maanen wrote:
A few weeks ago I was greatly helped by members of this list to track down an error that was hiding in my by now quite extensive Haskell program. I should still thank Daniel Fischer for his lucid explanation of how an overdose of strictness can prevent a try expression from catching an exception.
Now that the error is gone, I face a new challenge: when I let my program run for a while (it is a computationally intensive task that never stops by design), after fifteen minutes or so the CPU usage drops from nearly 100% to around 15% and a few minutes later the process dies with the message:
"lexau: memory allocation failed (requested 2097152 bytes)"
The whole thing is a tail-biting snail pit of threads that are communicating through MVars. Every thread runs a tail-recursive function. (I read in another post that it is not a good idea to use explicit recursion, but when I compared alternatives, the fold variants produced even worse stack/heap problems.)
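The pattern described above — a thread running a tail-recursive loop that communicates through MVars — might look roughly like the following minimal sketch. The names (worker, the Int payload) are hypothetical, not taken from LExAu; the point is the bang pattern on the accumulator, which keeps the loop from accumulating thunks across iterations.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Concurrent (MVar, takeMVar, putMVar)

-- One hypothetical "snail pit" worker: take a request from an input
-- MVar, update its state strictly, and pass the result on.
worker :: MVar Int -> MVar Int -> IO ()
worker input output = loop 0
  where
    loop !acc = do            -- bang pattern: acc is forced each iteration,
                              -- so no chain of (+) thunks builds up
      x <- takeMVar input
      let !acc' = acc + x
      putMVar output acc'
      loop acc'
```

Without the bang patterns, each iteration would merely suspend another addition on top of the previous state, and the retained thunk chain grows without bound — which is exactly the kind of slow heap growth that ends in "memory allocation failed".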
That depends. Using a combinator (folds etc.) gives you more elegant, more general and usually more readable code. Using a low-level direct recursion however gives you more control over strictness, speed and allocation behaviour. Sometimes you have to stick your hands in the low-level details to get adequate performance.
The thread that is likely to cause the problem is an optimizer that tries many possible improvements of a complex data structure and incrementally applies the successful ones. I use a strict foldl style in an attempt to limit the memory used by the optimizer.
Maybe not strict enough (or too strict). What's the suspect function?
I spent most of the time looking in the wrong module. It turned out to be a function that prepares an 'oracle' for my optimizer that was not strict enough.
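The "not strict enough" failure mode can be sketched as follows. The Oracle type and mkOracle below are hypothetical, invented for illustration — the actual LExAu code is not shown here. The idea is that if the structure handed to the optimizer is built lazily, its thunks keep referencing the (possibly large) input it was derived from; forcing the fields at construction time releases those references.

```haskell
-- Hypothetical sketch of preparing an 'oracle' strictly.
-- Strict fields (!) force their contents when the constructor is applied.
data Oracle = Oracle !Int !Double
  deriving (Eq, Show)

mkOracle :: [Double] -> Oracle
mkOracle ws =
  let !n = length ws   -- force now, while ws is cheap to traverse
      !s = sum ws      -- force now, so the thunk does not retain ws
  in Oracle n s
```

Without the bangs (and with lazy fields), `Oracle (length ws) (sum ws)` would hold on to the whole list ws until the optimizer happened to demand those fields, which may be much later — or never for discarded candidates.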
Frankly, however, I have no idea what eats up so much heap space.
Now I have always proudly labeled myself a 'printf programmer', but I am afraid that I am going to need some profiling tool to determine where the problem is. Any suggestions where I should start?
As a very first measure, run your programme with the "-hT" RTS option:

$ ./lexau +RTS -hT -RTS args
  -- wait until it dies or ctrl-C it when CPU usage has dropped
$ hp2ps -c lexau.hp
  -- look at the graph in lexau.ps
It was kind of hard to interpret. In the end it was the huge number of 4-tuples that got me on track. I converted the 4-tuples in two modules to data types (with distinct constructors). That allowed me to see where the memory went.
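The tuple-to-data-type conversion works because the -hT profile labels heap cells by constructor: anonymous (,,,) cells from every module land in one bucket, while a dedicated constructor gets its own line in the graph. A hypothetical before/after (the field names are invented, not LExAu's):

```haskell
-- Before: these cells show up in the -hT profile only as (,,,).
type Step = (Int, Int, Double, Bool)

-- After: cells show up under the constructor name Step', with strict
-- fields as a bonus against thunk retention inside the record.
data Step' = Step'
  { stepFrom   :: !Int
  , stepTo     :: !Int
  , stepWeight :: !Double
  , stepDone   :: !Bool
  } deriving (Eq, Show)

fromTuple :: Step -> Step'
fromTuple (a, b, w, d) = Step' a b w d
```

Giving each module its own type (even when the shapes coincide) then splits the profile further, which is what made the memory attribution visible here.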
If that doesn't reveal the source of the problem, compile for profiling,
$ ghc -O2 -prof -auto-all -osuf p_o -hisuf p_hi --make lexau -o profLexau
and run with some heap profiling options, http://darcs.haskell.org/download/docs/6.12.3/html/users_guide/profiling.htm... explains how.
Cheers, Jeroen
P.S. For an idea of what is living in the snail pit, have a look at:
I had meant to write "snake pit", but given the lack of speed and the amount of trailing goo I guess "snail pit" is even more accurate.
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipel... http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model...
Ewww.
So in fact the culprit turned out to be the function updateRationals in the module http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Distr... It is still eating more time than the actual optimizer, so suggestions for improvement are still welcome.
(I know, HistoryTree.hs badly needs to be split up into smaller modules.)
+ 5
That is going to be my next focus.