
A few weeks ago I was greatly helped by members of this list to expose an error that was hiding in my, by now quite extensive, Haskell program. I should still thank Daniel Fischer for his lucid explanation of how an overdose of strictness can prevent a try-expression from wrapping an exception.

Now that the error is gone, I face a new challenge: when I let my program run for a while (it is a computationally intensive task that never stops by design), after fifteen minutes or so the CPU usage drops from nearly 100% to around 15% and a few minutes later the process dies with the message:

    "lexau: memory allocation failed (requested 2097152 bytes)"

The whole thing is a tail biting snail pit of threads that are communicating through MVars. Every thread runs a tail recursive function. (I read in another post that it is not a good idea to use explicit recursion, but when I compared alternatives the fold variants produced even worse stack/heap problems.)

The thread that is likely to cause the problem is an optimizer that tries many possible improvements of a complex data structure and incrementally applies the successful ones. I use a strict foldl style in an attempt to limit the memory used by the optimizer. Frankly, however, I have no idea what eats up so much heap space.

Now I have always proudly labeled myself a 'printf programmer', but I am afraid that I am going to need some profiling tool to determine where the problem is. Any suggestions where I should start?

Cheers, Jeroen

P.S. For an idea of what is living in the snail pit, have a look at:

http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipel...
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model...

(I know, HistoryTree.hs badly needs to be split up into smaller modules.)

On Tuesday 05 October 2010 10:06:03, Jeroen van Maanen wrote:
A few weeks ago I was greatly helped by members of this list to expose an error that was hiding in my, by now quite extensive, Haskell program. I should still thank Daniel Fischer for his lucid explanation of how an overdose of strictness can prevent a try-expression from wrapping an exception.
Now that the error is gone, I face a new challenge: when I let my program run for a while (it is a computationally intensive task that never stops by design), after fifteen minutes or so the CPU usage drops from nearly 100% to around 15% and a few minutes later the process dies with the message:
"lexau: memory allocation failed (requested 2097152 bytes)"
The whole thing is a tail biting snail pit of threads that are communicating through MVars. Every thread runs a tail recursive function. (I read in another post that it is not a good idea to use explicit recursion, but when I compared alternatives the fold variants produced even worse stack/heap problems.)
That depends. Using a combinator (folds etc.) gives you more elegant, more general and usually more readable code. Using low-level direct recursion, however, gives you more control over strictness, speed and allocation behaviour. Sometimes you have to stick your hands into the low-level details to get adequate performance.
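For a concrete (and entirely generic) illustration of that trade-off -- not code from Jeroen's program -- compare a lazy foldl, foldl', and an explicit loop with a forced accumulator:

    {-# LANGUAGE BangPatterns #-}
    import Data.List (foldl')

    -- Lazy foldl builds one thunk per element; on a long list this can
    -- exhaust heap or stack when the result is finally demanded.
    sumLazy :: [Integer] -> Integer
    sumLazy = foldl (+) 0

    -- foldl' forces the accumulator to WHNF at every step, so the sum
    -- runs in constant space.
    sumStrict :: [Integer] -> Integer
    sumStrict = foldl' (+) 0

    -- Explicit recursion gives the most control: the bang pattern forces
    -- the accumulator, and you decide exactly how much state to evaluate.
    sumLoop :: [Integer] -> Integer
    sumLoop = go 0
      where
        go !acc []       = acc
        go !acc (x : xs) = go (acc + x) xs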
The thread that is likely to cause the problem is an optimizer that tries many possible improvements of a complex data structure and incrementally applies the successful ones. I use a strict foldl style in an attempt to limit the memory used by the optimizer.
Maybe not strict enough (or too strict). What's the suspect function?
Frankly, however, I have no idea what eats up so much heap space.
Now I have always proudly labeled myself a 'printf programmer', but I am afraid that I am going to need some profiling tool to determine where the problem is. Any suggestions where I should start?
As a very first measure, run your programme with the "-hT" RTS option:

    $ ./lexau +RTS -hT -RTS args    -- wait until it dies, or Ctrl-C it when CPU usage has dropped
    $ hp2ps -c lexau.hp             -- look at the graph in lexau.ps

If that doesn't reveal the source of the problem, compile for profiling,

    $ ghc -O2 -prof -auto-all -osuf p_o -hisuf p_hi --make lexau -o profLexau

and run with some heap profiling options; http://darcs.haskell.org/download/docs/6.12.3/html/users_guide/profiling.htm... explains how.
Cheers, Jeroen
P.S. For an idea of what is living in the snail pit, have a look at:
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipeline/Concurrent.hs?view=markup
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model/HistoryTree.hs?view=markup
Ewww.
(I know, HistoryTree.hs badly needs to be split up into smaller modules.)
+ 5

Thanks again. Problem solved... well, under control. ;-)

On 2010-10-05 12:07, Daniel Fischer wrote:
On Tuesday 05 October 2010 10:06:03, Jeroen van Maanen wrote:
A few weeks ago I was greatly helped by members of this list to expose an error that was hiding in my, by now quite extensive, Haskell program. I should still thank Daniel Fischer for his lucid explanation of how an overdose of strictness can prevent a try-expression from wrapping an exception.
Now that the error is gone, I face a new challenge: when I let my program run for a while (it is a computationally intensive task that never stops by design), after fifteen minutes or so the CPU usage drops from nearly 100% to around 15% and a few minutes later the process dies with the message:
"lexau: memory allocation failed (requested 2097152 bytes)"
The whole thing is a tail biting snail pit of threads that are communicating through MVars. Every thread runs a tail recursive function. (I read in another post that it is not a good idea to use explicit recursion, but when I compared alternatives the fold variants produced even worse stack/heap problems.)
That depends. Using a combinator (folds etc.) gives you more elegant, more general and usually more readable code. Using low-level direct recursion, however, gives you more control over strictness, speed and allocation behaviour. Sometimes you have to stick your hands into the low-level details to get adequate performance.
The thread that is likely to cause the problem is an optimizer that tries many possible improvements of a complex data structure and incrementally applies the successful ones. I use a strict foldl style in an attempt to limit the memory used by the optimizer.
Maybe not strict enough (or too strict). What's the suspect function?
I spent most of the time looking in the wrong module. It turned out to be a function that prepares an 'oracle' for my optimizer that was not strict enough.
Frankly, however, I have no idea what eats up so much heap space.
Now I have always proudly labeled myself a 'printf programmer', but I am afraid that I am going to need some profiling tool to determine where the problem is. Any suggestions where I should start?
As a very first measure, run your programme with the "-hT" RTS option:

    $ ./lexau +RTS -hT -RTS args    -- wait until it dies, or Ctrl-C it when CPU usage has dropped
    $ hp2ps -c lexau.hp             -- look at the graph in lexau.ps
It was kind of hard to interpret. In the end it was the huge amount of 4-tuples that got me on track. I converted the 4-tuples in two modules to data types (with different constructors). That allowed me to see where the memory went.
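A minimal sketch of that kind of change (the names here are made up, not from the LExAu code): replacing an anonymous 4-tuple with a dedicated data type makes the heap profile report the space under the constructor's name instead of lumping it in with every other (,,,):

    -- Before: an anonymous 4-tuple, indistinguishable on the heap from any
    -- other 4-tuple in the program.
    type EntryTuple = (Integer, Integer, Rational, Rational)

    -- After: a named constructor (optionally with strict fields), so -hT
    -- attributes the space to 'Entry' specifically.
    data Entry = Entry !Integer !Integer !Rational !Rational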
If that doesn't reveal the source of the problem, compile for profiling,
$ ghc -O2 -prof -auto-all -osuf p_o -hisuf p_hi --make lexau -o profLexau
and run with some heap profiling options, http://darcs.haskell.org/download/docs/6.12.3/html/users_guide/profiling.htm... explains how.
Cheers, Jeroen
P.S. For an idea of what is living in the snail pit, have a look at:
I had meant to write "snake pit", but given the lack of speed and the amount of trailing goo I guess "snail pit" is even more accurate.
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipel... http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model...
Ewww.
So in fact the culprit turned out to be the function updateRationals in the module http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Distr... It is still eating more time than the actual optimizer, so suggestions for improvement are still welcome.
(I know, HistoryTree.hs badly needs to be split up into smaller modules.)
+ 5
That is going to be my next focus.

On Wednesday 13 October 2010 14:52:58, Jeroen van Maanen wrote:
So in fact the culprit turned out to be the function updateRationals in the module
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Distribution/MDL.hs?view=markup
It is still eating more time than the actual optimizer, so suggestions for improvement are still welcome.
First,

    approximateRational :: Double -> Rational
    approximateRational x =
      let (m, e) = decodeFloat x
      in if e >= 0 then (m * (2 ^ e)) % 1 else m % (2 ^ (-e))

is exactly toRational, so there's no need for that function. Also, there's a much faster implementation of toRational (for Double and Float) underway; I think it will be in the first GHC 7 release, due in a couple of weeks. Anyway, using toRational will let you profit from that with no additional effort when you get a compiler with that patch.

Second,

    data Threshold = Threshold
      { theBoundA :: Rational
      , theBoundB :: Rational
      , theCountA :: Integer
      , theCountB :: Integer
      } deriving Show

I have not looked much at the code, but it seems likely that you will want strict fields there,

    data Threshold = Threshold
      { theBoundA :: !Rational
      , ...
      }

but that's to be tested later.

Third,

    mapToThresholds :: [Threshold] -> [Rational] -> [(Rational, Rational)]
    mapToThresholds _ [] = []
    mapToThresholds thresholds@((Threshold boundA boundB intA intB) : moreThresholds) rationals@(x : moreRationals)
      | x > boundB = mapToThresholds moreThresholds rationals
      | x > boundA =
          let width  = boundB - boundA
              count  = fromInteger (intB - intA)
              mapped = (((x - boundA) * count) / width) + (fromInteger intA)
          in (mapped, x) : mapToThresholds thresholds moreRationals
      | True = error $ "Rational is too small: " ++ (show x) ++ " < " ++ (show boundA)
    mapToThresholds [] (x : _) = error $ "Rational is too big: " ++ (show x)

will probably profit from making mapped strict,

    let ...
    in mapped `seq` (mapped, x) : mapToThresholds ...

Now updateRationals:

    updateRationals :: Integer -> [(Integer, Rational)] -> Integer -> [(Integer, Rational)]
    updateRationals previousWeight previousRationals w =
        ...
        else
          let mapped = mapToThresholds thresholds boundaries
              mappedIntervals = zip ((0, 0) : mapped) mapped
              ((_, a), (_, b)) = foldl1' maxMappedInterval mappedIntervals
              ...

That's no good, unfortunately.

    maxMappedInterval :: ((Rational, Rational), (Rational, Rational))
                      -> ((Rational, Rational), (Rational, Rational))
                      -> ((Rational, Rational), (Rational, Rational))
    maxMappedInterval ((ma, a), (mb, b)) ((mc, c), (md, d)) =
      if md - mc > mb - ma then ((mc, c), (md, d)) else ((ma, a), (mb, b))

foldl1' evaluates the result of maxMappedInterval to weak head normal form, that is, to the outermost constructor. Depending on what the optimiser does, that may or may not evaluate the condition md - mc > mb - ma, but it will *not* look at a, b, c, d, and at least the second components of the inner pairs happily build thunks, possibly keeping references to the elements of the list already processed, so keeping stuff from being garbage collected. What you need is a strict type to contain your Rationals,

    data SRQ = SRQ !Rational !Rational !Rational !Rational

    maxMappedInterval :: SRQ -> ((Rational, Rational), (Rational, Rational)) -> SRQ
    maxMappedInterval s@(SRQ ma a mb b) ((mc, c), (md, d))
      | mb - ma < md - mc = SRQ mc c md d
      | otherwise         = s

Then the fold will evaluate all components and you don't get thunks or space leaks.
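One detail worth noting when wiring this in: foldl1' needs the accumulator and the list elements to have the same type, so with the SRQ accumulator the fold has to be seeded explicitly, and the binding in updateRationals would pattern match on SRQ instead of the nested pair. A hedged sketch (the driver function name is illustrative, not from MDL.hs):

    import Data.List (foldl')

    data SRQ = SRQ !Rational !Rational !Rational !Rational

    maxMappedInterval :: SRQ -> ((Rational, Rational), (Rational, Rational)) -> SRQ
    maxMappedInterval s@(SRQ ma a mb b) ((mc, c), (md, d))
      | mb - ma < md - mc = SRQ mc c md d
      | otherwise         = s

    -- Seed the fold with the first interval, then fold strictly over the rest;
    -- the strict fields force all four Rationals at each step.
    widestInterval :: [((Rational, Rational), (Rational, Rational))] -> SRQ
    widestInterval (((ma, a), (mb, b)) : rest) = foldl' maxMappedInterval (SRQ ma a mb b) rest
    widestInterval []                          = error "widestInterval: empty list"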

On 2010-10-13 15:53, Daniel Fischer wrote:
On Wednesday 13 October 2010 14:52:58, Jeroen van Maanen wrote:
So in fact the culprit turned out to be the function updateRationals in the module
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Distribution/MDL.hs?view=markup
It is still eating more time than the actual optimizer, so suggestions for improvement are still welcome.
First,
approximateRational :: Double -> Rational [...]
is exactly toRational, so there's no need for that function. Also, there's a much faster implementation of toRational (for Double and Float) underway, I think it will be in the first GHC 7 release due in a couple of weeks. Anyway, using toRational will let you profit from that with no additional effort when you get a compiler with that patch.
This was one of the first tricky bits that I converted from Java (in fact, it was written in Clojure, a Lisp variant that runs on the Java virtual machine) to Haskell. I was fighting with implicit type inference, and I didn't know then how to fix the type of the value returned by the random number generator. Thanks for pointing it out.
Second, [Third ... Now updateRationals]
Thanks for taking me through this step-by-step. It helps me tremendously in developing an intuition about what is going on in my program. It feels a bit like when I started writing XSLT style-sheets. ;-)

Looking at the graphical presentation of heap usage [1], I guess that this solved most of my heap problems. However, the large amount allocated to TSO (I guess that is Thread State Object) suggests to me that I still have an issue with stack usage here.

Another thing that I noticed when I went over my algorithm again is that the list of weight/bound pairs is sorted according to the bound component on each recursion step. This also happens when the produced data structure is used later on in the program: I sort all pairs with a weight smaller than a certain threshold by bound. Is there some prepackaged data structure around that would efficiently support this usage pattern?

Oh, and I Googled for "Haskell BLACKHOLE", but didn't find an explanation of what it is. Is the amount of black holes on the heap of my program an indication of some inefficiency that can be removed?

Cheers, Jeroen

[1] http://lexau.org/pub/lexau-heap.pdf

On Friday 15 October 2010 09:45:46, Jeroen van Maanen wrote:
On 2010-10-13 15:53, Daniel Fischer wrote:
On Wednesday 13 October 2010 14:52:58, Jeroen van Maanen wrote:
So in fact the culprit turned out to be the function updateRationals in the module
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Distribution/MDL.hs?view=markup
It is still eating more time than the actual optimizer, so suggestions for improvement are still welcome.
First,
approximateRational :: Double -> Rational [...]
is exactly toRational, so there's no need for that function. Also, there's a much faster implementation of toRational (for Double and Float) underway, I think it will be in the first GHC 7 release due in a couple of weeks. Anyway, using toRational will let you profit from that with no additional effort when you get a compiler with that patch.
This was one of the first tricky bits that I converted from Java (in fact, it was written in Clojure, a Lisp variant that runs on the Java virtual machine) to Haskell. I was fighting with implicit type inference, and I didn't know then how to fix the type of the value returned by the random number generator. Thanks for pointing it out.
Second, [Third ... Now updateRationals]
Thanks for taking me through this step-by-step. It helps me tremendously in developing an intuition about what is going on in my program. It feels a bit like when I started writing XSLT style-sheets. ;-)
Looking at the graphical presentation of heap usage[1] I guess that this solved most of my heap problems.
According to that profile, your live heap starts out at between 3500 and 4000 kilobytes and decreases from then on. It may be less than optimal, but a heap problem looks quite different. You can get more info by doing profiling runs with -hc, -hb and so on.
However the large amount allocated to TSO (I guess that is Thread State Object) suggests to me that I still have an issue with stack usage here.
You seem to fork/spark a lot of threads, so a lot of TSOs are generated.
Another thing that I noticed when I went over my algorithm again is that the list of weight/bound pairs is sorted according to the bound component on each recursion step. This also happens when the produced data structure is used later on in the program: I sort all pairs with a weight smaller than a certain threshold by bound. Is there some prepackaged data structure around that would efficiently support this usage pattern?
Oh, and I Googled for "Haskell BLACKHOLE", but didn't find an explanation of what it is. Is the amount of black holes on the heap of my program an indication of some inefficiency that can be removed?
I think section 8 of http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf will be useful reading. As a guess, it may help to set up some data single-threadedly before going parallel.
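A hedged sketch of that suggestion (the function and its arguments are illustrative, not from the LExAu code): force the shared data to normal form on the main thread before forking, so the workers read fully evaluated structures instead of all blocking on the same thunk (which is what shows up as BLACKHOLEs):

    import Control.Concurrent (forkIO)
    import Control.DeepSeq (NFData, deepseq)
    import Control.Monad (replicateM_)

    startWorkers :: NFData a => a -> (a -> IO ()) -> Int -> IO ()
    startWorkers shared worker n = do
        -- Evaluate the shared data completely on the main thread first ...
        shared `deepseq` return ()
        -- ... then fork the workers; they now read plain data instead of
        -- racing to evaluate the same thunks and blocking on each other's
        -- black holes.
        replicateM_ n (forkIO (worker shared))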
Cheers, Jeroen

On 5 October 2010 09:06, Jeroen van Maanen wrote:
P.S. For an idea of what is living in the snail pit, have a look at:
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipel... http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model...
(I know, HistoryTree.hs badly needs to be split up into smaller modules.)
A minor style tip...

You are using ShowS family functions (showString, showChar) to generate strings, which is good as it avoids (++). However, to make the code clearer you might want to code up a little helper library with ShowS versions of the usual pretty-print combinators, e.g. all the functions like parens:

    parens :: ShowS -> ShowS
    parens s = showChar '(' . s . showChar ')'

Having a library of these functions usually pays off (I think there is one on Hackage but I can't remember its name). You could use a pretty-printing library, but they are somewhat less efficient as they have to do work measuring line lengths for fitting lines to screen width.

Best wishes
Stephen
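A sketch of what such a helper module might contain (illustrative only; this is not the Hackage package Stephen is thinking of):

    -- ShowS-based pretty-printing helpers: difference-list style, so nested
    -- concatenation stays cheap (no repeated (++) on a left operand).
    parens, brackets, braces :: ShowS -> ShowS
    parens   s = showChar '(' . s . showChar ')'
    brackets s = showChar '[' . s . showChar ']'
    braces   s = showChar '{' . s . showChar '}'

    -- Join a list of ShowS values with a separator.
    punctuate :: ShowS -> [ShowS] -> ShowS
    punctuate _   []       = id
    punctuate sep (x : xs) = x . foldr (\y acc -> sep . y . acc) id xs

    commaSep :: [ShowS] -> ShowS
    commaSep = punctuate (showString ", ")

For example, parens (commaSep [shows (1 :: Int), showString "x"]) "" evaluates to "(1, x)".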

On Tue, Oct 5, 2010 at 1:46 PM, Stephen Tetley wrote:
On 5 October 2010 09:06, Jeroen van Maanen wrote:
P.S. For an idea of what is living in the snail pit, have a look at:
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipel... http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model...
(I know, HistoryTree.hs badly needs to be split up into smaller modules.)
A minor style tip...
You are using ShowS family functions (showString, showChar) to generate strings which is good as it avoids (++). However, to make the code clearer you might want to code up a little helper library with ShowS versions of the usual pretty print combinators.
e.g. all the functions like parens:
    parens :: ShowS -> ShowS
    parens s = showChar '(' . s . showChar ')'
Having a library of these functions usually pays off (I think there is one on Hackage but I can't remember its name). You could use a pretty-printing library, but they are somewhat less efficient as they have to do work measuring line lengths for fitting lines to screen width.
Best wishes
Stephen
Instead of a ShowS you might want to use a DString [1], which is basically a newtype wrapper around a ShowS. The advantage of a DString is that it has an instance for IsString, which allows you to write overloaded string literals of type DString.

You might also take a look at string-combinators [2], which provides handy polymorphic functions to build and combine string-like values. For example it provides:

    paren :: (Monoid s, IsString s) => s -> s

(I've no idea why I named this 'paren' instead of 'parens'; I will change this in the next version.)

Regards,
Bas

[1] http://hackage.haskell.org/package/dstring
[2] http://hackage.haskell.org/package/string-combinators
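The core idea behind DString, sketched as a self-contained newtype (the real package's exact function names may differ from what is shown here; on the GHC of the time only the Monoid instance would be needed, without Semigroup):

    {-# LANGUAGE OverloadedStrings #-}
    import Data.String (IsString (..))

    -- A "difference string": ShowS wrapped in a newtype so it can carry
    -- Semigroup/Monoid and IsString instances.
    newtype DString = DString { unDString :: ShowS }

    instance Semigroup DString where
        DString f <> DString g = DString (f . g)

    instance Monoid DString where
        mempty = DString id

    instance IsString DString where
        fromString = DString . showString

    toString :: DString -> String
    toString (DString f) = f ""

    -- With OverloadedStrings, string literals are DStrings directly:
    paren :: DString -> DString
    paren s = "(" <> s <> ")"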
participants (4):
- Bas van Dijk
- Daniel Fischer
- Jeroen van Maanen
- Stephen Tetley