Diagnosing : Large memory usage + low CPU

Hello, I am testing a simple algorithm and find that during part of the execution this is fast and uses acceptable memory. However, when it gets to a certain point the memory climbs to 75-80 % of the OS's memory and CPU plummets to a mere 5% at most. In the part of the application that executes ok I have calls as follows: let r9' = evalTagger'' (tagFun suffixCapsFreq) $ test let a9' = tagginAccuracy r9' putStrLn ("Suffix(3) + Caps + Freq tag result done = " ++ (show a9')) The slow one has: let r12' = evalTagger'' ruleSuffixCapsFreq $ test let a12' = tagginAccuracy r12' putStrLn ("Rules + Suffix(3) + Caps + Freq tag result done = " ++ (show a12')) The difference lies in the following functions: ruleApplication :: TransformationRule -> POSTags -> Maybe Tag ruleApplication (NextTagRule (Replacement old new) next) z = do (_, _, proposed) <- Z.safeCursor z (_, _, nextProposed) <- rightCursor z if proposed == old && nextProposed == next then Just new else Nothing .... updateState :: (TransformationRule,Int) -> POSTags -> POSTags updateState r = Z.fromList . reverse . Z.foldlz' (update r) [] where update (r,_) !xs z = case ruleApplication r z of Just tag -> (token, correct, tag):xs Nothing -> e:xs where e@(token, correct, _proposed) = Z.cursor z rulesT :: [(TransformationRule, Int)] -> POSTags -> POSTags rulesT rs state = L.foldl' tag state rs where tag !s rule = updateState rule s (.>) :: (POSTags -> Tag) -> (POSTags -> Tag) -> POSTags -> Tag f .> g = \ x -> case (g x) of "" -> f x t -> t (|>) :: (POSTags -> POSTags) -> (POSTags -> Tag) -> POSTags -> POSTags f |> g = \ x -> f $ tagFun g x let ruleT = rulesT top10Rules let suffixCapsFreq = suffixT .> capitalizeT .> freqT let ruleSuffixCapsFreq = ruleT |> suffixCapsFreq where top10Rules is a list. Note that suffixCapsFreq and tagFun are common to both calls. I have placed bangs in many places and used eager evaluation in the folds. However nothing seems to help. Can anyone tell me how I should diagnose this problem. Any suggestions are welcome. TIA, Hugo F.

Hi Hugo What is a POSTags and how big do you expect it to be? Generally I'd recommend you first try to calculate the size of your data rather than try to strictify things, see Johan Tibell's very useful posts: http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.ht... http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html Once you know the size of your data - you can decide if it is too big to comfortably work with in memory. If it is too big you need to make sure you're are streaming[*] it rather than forcing it into memory. If POSTags is large, I'd be very concerned about the top line of updateState - reversing lists (or sorting them) simply doesn't play well with streaming. [*] Even in a lazy language like Haskell, streaming data isn't necessarily automatic.

Hello, On 11/29/2011 10:57 PM, Stephen Tetley wrote:
Hi Hugo
What is a POSTags and how big do you expect it to be?
type Token = String type Tag = String type NGramTag = (Token, Tag, Tag) type POSTags = Z.Zipper NGramTag
Generally I'd recommend you first try to calculate the size of your data rather than try to strictify things, see Johan Tibell's very useful posts:
http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.ht... http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
According to size in String I am expecting a maximum of 50 Mega. Profiling (after a painful 80 minutes) shows: total alloc = 20,350,382,592 bytes Way too much.
Once you know the size of your data - you can decide if it is too big to comfortably work with in memory. If it is too big you need to make sure you're are streaming[*] it rather than forcing it into memory.
If POSTags is large, I'd be very concerned about the top line of updateState - reversing lists (or sorting them) simply doesn't play well with streaming.
The zipper does quite a bit of reversing and appending. I also need to reverse lists to retain the order of the characters (text). I also do sorting but I have eliminated this in the tests. So my question: how can one "force" the reversing and append? Anyone? TIA, Hugo F.
[*] Even in a lazy language like Haskell, streaming data isn't necessarily automatic.
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Hello Hugo, Can you do a heap profile (+RTS -hT, or maybe use one of the other options if you've got a profiling copy lying around)? Try using smaller data if it's taking too long; usually the profile will still look the same, unless it's a particular type of input that is triggering bad behavior. There is not enough detail in your code for me to use my psychic debugging skills, unfortunately. Edward Excerpts from Hugo Ferreira's message of Wed Nov 30 09:23:53 -0500 2011:
Hello,
On 11/29/2011 10:57 PM, Stephen Tetley wrote:
Hi Hugo
What is a POSTags and how big do you expect it to be?
type Token = String type Tag = String
type NGramTag = (Token, Tag, Tag)
type POSTags = Z.Zipper NGramTag
Generally I'd recommend you first try to calculate the size of your data rather than try to strictify things, see Johan Tibell's very useful posts:
http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.ht... http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
According to size in String I am expecting a maximum of 50 Mega. Profiling (after a painful 80 minutes) shows:
total alloc = 20,350,382,592 bytes
Way too much.
Once you know the size of your data - you can decide if it is too big to comfortably work with in memory. If it is too big you need to make sure you're are streaming[*] it rather than forcing it into memory.
If POSTags is large, I'd be very concerned about the top line of updateState - reversing lists (or sorting them) simply doesn't play well with streaming.
The zipper does quite a bit of reversing and appending. I also need to reverse lists to retain the order of the characters (text). I also do sorting but I have eliminated this in the tests.
So my question: how can one "force" the reversing and append? Anyone?
TIA, Hugo F.
[*] Even in a lazy language like Haskell, streaming data isn't necessarily automatic.
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Hi Edward, On 12/01/2011 07:55 AM, Edward Z. Yang wrote:
Hello Hugo,
Can you do a heap profile (+RTS -hT, or maybe use one of the other options if you've got a profiling copy lying around)?
I have attached a profiling session (showing types). I am surprised to see that the "[]" consumes so much data. Where is this coming from? Need to analyse this more closely.
Try using smaller data if it's taking too long; usually the profile will still look the same, unless it's a particular type of input that is triggering bad behavior.
The case above is for test data that is about 1/5 of the original data.
There is not enough detail in your code for me to use my psychic debugging skills, unfortunately.
I have very little knowledge of Haskell in order to interpret this stuff correctly, even so I think we still need your "psychic debugging skills" B-) Any idea how I can track what's generating all those "[]" ? Note that the (,,) seems to be the NGramTag. data which is basically used as a list (Zipper). regards, Hugo F.
Edward
Excerpts from Hugo Ferreira's message of Wed Nov 30 09:23:53 -0500 2011:
Hello,
On 11/29/2011 10:57 PM, Stephen Tetley wrote:
Hi Hugo
What is a POSTags and how big do you expect it to be?
type Token = String type Tag = String
type NGramTag = (Token, Tag, Tag)
type POSTags = Z.Zipper NGramTag
Generally I'd recommend you first try to calculate the size of your data rather than try to strictify things, see Johan Tibell's very useful posts:
http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.ht... http://blog.johantibell.com/2011/06/computing-size-of-hashmap.html
According to size in String I am expecting a maximum of 50 Mega. Profiling (after a painful 80 minutes) shows:
total alloc = 20,350,382,592 bytes
Way too much.
Once you know the size of your data - you can decide if it is too big to comfortably work with in memory. If it is too big you need to make sure you're are streaming[*] it rather than forcing it into memory.
If POSTags is large, I'd be very concerned about the top line of updateState - reversing lists (or sorting them) simply doesn't play well with streaming.
The zipper does quite a bit of reversing and appending. I also need to reverse lists to retain the order of the characters (text). I also do sorting but I have eliminated this in the tests.
So my question: how can one "force" the reversing and append? Anyone?
TIA, Hugo F.
[*] Even in a lazy language like Haskell, streaming data isn't necessarily automatic.
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Excerpts from Hugo Ferreira's message of Fri Dec 02 05:57:00 -0500 2011:
I have attached a profiling session (showing types). I am surprised to see that the "[]" consumes so much data. Where is this coming from? Need to analyse this more closely.
For an -hT profile, what that actually means is your lists are using lots of memory.
Any idea how I can track what's generating all those "[]" ? Note that the (,,) seems to be the NGramTag. data which is basically used as a list (Zipper).
For that, I recommend rebuilding with profiling and use the RTS flag -hc. For more details on how to profile programs like this, check out: http://blog.ezyang.com/2011/06/pinpointing-space-leaks-in-big-programs/ Edward

Hello, First and foremost thanks for the link Edward. I have read up your stuff. On 12/05/2011 06:29 AM, Edward Z. Yang wrote:
Excerpts from Hugo Ferreira's message of Fri Dec 02 05:57:00 -0500 2011:
I have attached a profiling session (showing types). I am surprised to see that the "[]" consumes so much data. Where is this coming from? Need to analyse this more closely.
For an -hT profile, what that actually means is your lists are using lots of memory.
Funny enough I cannot get this option to work. All the other -h options work fine though.
Any idea how I can track what's generating all those "[]" ? Note that the (,,) seems to be the NGramTag. data which is basically used as a list (Zipper).
For that, I recommend rebuilding with profiling and use the RTS flag -hc.
Ok, so I ran this and as follows: time nice -n 19 ./postagger +RTS -p -hc -L50 &> tmp19.txt hp2ps -e8in -c postagger.hp Now I see that "rsplit_" seems to be the culprit for the initial peaks in memory use. However I also see in the profile that this function seems to be responsible for only a small amount of memory generated. Why such a big discrepancy between the live heap and the profile's total memory? Another question is, how can I cange the code below to avoid such a peak? I already added ! to no avail. rsplit :: Eq a => a -> [a] -> ([a], [a]) rsplit sep l = let (ps, xs, _) = rsplit_ sep l in (ps, xs) rsplit_ :: Eq a => a -> [a] -> ([a], [a], Bool) rsplit_ sep l = foldr (splitFun sep) ([], [], False) l where splitFun _ e !a@(!px, !xs, True) = (e:px, xs, True) splitFun sep e !a@(!px, !xs, False) | e == sep = (px, xs, True) | otherwise = (px, e:xs, False) toTrainingInstance' :: String -> NGramTag toTrainingInstance' s = let (token, tag) = rsplit '/' s in (token, tag, "") toTrainingCorpus s = let (token, tag) = rsplit '/' s in (token, tag, "") evalTaggers' _ = do h <- IO.openFile "brown-pos-train.txt" IO.ReadMode c <- IO.hGetContents h let train = toTrainingInstances $ map toTrainingInstance' $ words c .... i <- IO.openFile "brown-pos-test.txt" IO.ReadMode d <- IO.hGetContents i let test = Z.fromList $ map toTrainingCorpus $ words d ... Anyone see an obvious change that needs to be made? TIA, Hugo F.
participants (3)
-
Edward Z. Yang
-
Hugo Ferreira
-
Stephen Tetley