
Hello,

First and foremost, thanks for the link, Edward. I have read through your material.

On 12/05/2011 06:29 AM, Edward Z. Yang wrote:
Excerpts from Hugo Ferreira's message of Fri Dec 02 05:57:00 -0500 2011:
I have attached a profiling session (showing types). I am surprised to see that the "[]" consumes so much data. Where is this coming from? Need to analyse this more closely.
For an -hT profile, what that actually means is your lists are using lots of memory.
Funnily enough, I cannot get this option to work. All the other -h options work fine, though.
Any idea how I can track what's generating all those "[]"? Note that the (,,) seems to be the NGramTag data, which is basically used as a list (Zipper).
For that, I recommend rebuilding with profiling and using the RTS flag -hc.
OK, so I ran this as follows:

    time nice -n 19 ./postagger +RTS -p -hc -L50 &> tmp19.txt
    hp2ps -e8in -c postagger.hp

Now I see that "rsplit_" seems to be the culprit for the initial peaks in memory use. However, I also see in the profile that this function seems to be responsible for only a small amount of the memory generated. Why such a big discrepancy between the live heap and the profile's total memory?

Another question is, how can I change the code below to avoid such a peak? I already added ! to no avail.

    rsplit :: Eq a => a -> [a] -> ([a], [a])
    rsplit sep l =
      let (ps, xs, _) = rsplit_ sep l
      in (ps, xs)

    rsplit_ :: Eq a => a -> [a] -> ([a], [a], Bool)
    rsplit_ sep l = foldr (splitFun sep) ([], [], False) l
      where
        splitFun _ e !a@(!px, !xs, True) = (e:px, xs, True)
        splitFun sep e !a@(!px, !xs, False)
          | e == sep  = (px, xs, True)
          | otherwise = (px, e:xs, False)

    toTrainingInstance' :: String -> NGramTag
    toTrainingInstance' s =
      let (token, tag) = rsplit '/' s
      in (token, tag, "")

    toTrainingCorpus s =
      let (token, tag) = rsplit '/' s
      in (token, tag, "")

    evalTaggers' _ = do
      h <- IO.openFile "brown-pos-train.txt" IO.ReadMode
      c <- IO.hGetContents h
      let train = toTrainingInstances $ map toTrainingInstance' $ words c
      ....
      i <- IO.openFile "brown-pos-test.txt" IO.ReadMode
      d <- IO.hGetContents i
      let test = Z.fromList $ map toTrainingCorpus $ words d
      ...

Anyone see an obvious change that needs to be made?

TIA,
Hugo F.
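
For comparison, here is one possible variant of the split, as a minimal sketch rather than a confirmed fix. It assumes that NGramTag is a triple of Strings (which the code above suggests but does not show), and the names rsplit' and toTrainingInstanceStrict are hypothetical. It splits at the last separator using break on the reversed list, and walks the spine of both halves as soon as the tuple is demanded, so that the halves are no longer unevaluated thunks pointing back at the input word. Whether this removes the peak in the heap profile is untested.

```haskell
-- Split at the LAST occurrence of sep; the separator itself is dropped.
-- With no separator present, everything lands in the second component,
-- which matches the behaviour of the foldr-based rsplit.
rsplit' :: Eq a => a -> [a] -> ([a], [a])
rsplit' sep l =
  case break (== sep) (reverse l) of
    (revSuffix, [])            -> ([], reverse revSuffix)
    (revSuffix, _ : revPrefix) -> (reverse revPrefix, reverse revSuffix)

-- Assumption: NGramTag is (String, String, String).
-- `length` walks the spine of each half, so once the triple is evaluated
-- to weak head normal form both lists have been built in full.
toTrainingInstanceStrict :: String -> (String, String, String)
toTrainingInstanceStrict s =
  let (token, tag) = rsplit' '/' s
  in length token `seq` length tag `seq` (token, tag, "")
```

Loaded in GHCi, map toTrainingInstanceStrict (words "The/AT 1/2/CD") gives [("The","AT",""),("1/2","CD","")]. Note that the forcing only happens when each triple itself is demanded, so the consumer of the list still decides when the work is done.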