
Hello Andrea, Friday, October 20, 2006, 2:23:16 PM, you wrote:
well, you gave me a wonderfully clear introduction to Haskell GC, and now I have a better understanding of the output of the various profiling runs I'm doing. Thank you very much!
I had the same problems too :)
Still, I cannot understand my specific problem, that is to say, why the function that reads a file retains so much memory.
I did some tests and the results are puzzling: I tried reading the feed and directly converting it into the opml chunk to be inserted into the opml component of my StateT monad. The problem becomes far worse. Here is the output of a heap profile: http://gorgias.mine.nu/haskell/a.out.feed2opml.ps As you can see, after opening one feed (397868 bytes), closing it, opening another one (410052 bytes), closing it, and reopening the first one, memory consumption reaches 152 Mega.
First, GC doesn't occur automatically when you close a file. You can help GHC by calling performGC from System.Mem; I do this in my own program. Second, each Char in GHC occupies 12 bytes (!), so each of your files occupies about 5 MB of memory. If you add the previous problem, 2 or 3 files can be held in memory at the same time (just because they were not yet GC'd), so memory usage may become, say, 10 MB. Multiplying this by the factor of 2.5 or even 3 which I described in my previous letter means, say, 30 MB used. Then the only explanation I've found for why 150 MB are used is that multiple copies of the same data are held in different forms: the first copy is the original file contents as one large string, the second is the contents split into lines, the third is the internal feed format, and so on.
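To make the performGC advice concrete, here is a minimal sketch. processFeed is a hypothetical stand-in for the real feed-reading code: it forces the whole file contents, then asks the RTS to collect right away instead of waiting for the next automatic major GC.

```haskell
import System.Mem (performGC)

-- Hypothetical stand-in for the real feed reader: force the lazy
-- string completely, then trigger a GC so the dead data is reclaimed
-- immediately rather than at the next automatic collection.
processFeed :: FilePath -> IO Int
processFeed path = do
  contents <- readFile path
  let n = length contents   -- forces the whole lazy string into memory
  n `seq` performGC         -- the string is now dead and can be collected
  return n
```

Without the `seq`, performGC would run before the file was even read, and could reclaim nothing.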
Using the intermediate datatype (that is to say, reading the feed, transforming it into my datatype and then into the opml tree) reduces the problem: http://gorgias.mine.nu/haskell/a.out.feed2feedNotStrict.ps only 92 Mega of memory consumption for the very same operations.
Making the intermediate datatype strict gives almost the same results: http://gorgias.mine.nu/haskell/a.out.feed2feedStrict.ps 98 Mega.
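One possible reason the strict version barely helps: a strict field only forces its value to weak head normal form. A sketch, with purely illustrative type and field names (not Andrea's actual types):

```haskell
-- Hypothetical intermediate feed type; the names are illustrative.
data Item = Item !String !String   -- item title, item body
data Feed = Feed !String ![Item]   -- feed title, items

-- A '!' on a String field only forces the first (:) cell; the rest of
-- the list, and every Char in it, may still be an unevaluated thunk
-- holding on to the original file contents.  To really release the
-- source string you must walk the whole structure:
forceFeed :: Feed -> ()
forceFeed (Feed t is) =
  length t `seq` foldr (\i r -> forceItem i `seq` r) () is
  where
    forceItem (Item a b) = length a `seq` length b `seq` ()
```

This would explain why the strict and non-strict intermediate types show almost the same heap profile.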
Using the +RTS -c option, and calling performGC after building the opml tree and after closing a feed, should help you.
Now, I come to believe the file reading is indeed strict, and that my problem could be related to StateT laziness.
Does this make sense?
I'm not sure, but my guess is that all monad transformers are strict. I hope someone else will clear up this point.
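Whichever way that turns out, the state itself can still accumulate thunks: each update stores an unevaluated application of the update function on top of the previous state. A sketch of the usual workaround, forcing the new state before storing it (later mtl versions ship this under the name modify'):

```haskell
import Control.Monad.State

-- Force the new state to weak head normal form before putting it back,
-- so repeated updates cannot pile up as a chain of thunks.
strictModify :: MonadState s m => (s -> s) -> m ()
strictModify f = do
  s <- get
  let s' = f s
  s' `seq` put s'

-- Example: counting to 1000 without building 1000 nested (+1) thunks.
countUp :: State Int ()
countUp = mapM_ (\_ -> strictModify (+ 1)) [1 .. 1000 :: Int]
```

With plain `modify (+1)` the state after the loop would be a 1000-deep thunk; here each step is evaluated as it happens.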
I'm now going to try to implement my opml state as an IORef and use a ReaderT monad to see if something new happens.
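That refactoring might look roughly like this; Opml is a stand-in for the real HXT tree type and the function names are made up, so this is only a sketch of the shape, not Andrea's code:

```haskell
import Control.Monad.Reader
import Data.IORef

type Opml = String                  -- stand-in for the real tree type
type App a = ReaderT (IORef Opml) IO a

-- Store a new tree; ($!) forces it to WHNF as it is written, so the
-- IORef never holds an unevaluated rebuild of the tree.
putOpml :: Opml -> App ()
putOpml new = do
  ref <- ask
  liftIO (writeIORef ref $! new)

getOpml :: App Opml
getOpml = ask >>= liftIO . readIORef

runApp :: App a -> IO a
runApp act = do
  ref <- newIORef ""
  runReaderT act ref
</imports>
```

The environment (the IORef) is read-only, so ReaderT suffices; only the ref's contents are mutated, in IO.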
PS: if your program uses a lot of strings, FPS will be a great help. It doesn't change the GC behavior, it just makes everything 10 times smaller :)
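For illustration, using the Data.ByteString.Char8 interface (the library that FPS evolved into; a sketch, not code from this thread):

```haskell
import qualified Data.ByteString.Char8 as B

-- A packed ByteString stores roughly one byte per character, versus
-- about 12 bytes per list cell for String, and B.readFile reads the
-- whole file strictly into one buffer.
countFeedLines :: FilePath -> IO Int
countFeedLines path = do
  s <- B.readFile path
  return (length (B.lines s))
```

The GC still runs the same way; there is simply far less live data for it to copy.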
Yes, but I'm using HXT, and it uses normal strings to store XML text nodes. So I could get some improvement in IO, but not that much in memory consumption, unless I totally change my implementation.
Ask the library authors :)
Anyway, even if I could reduce the memory consumption for reading 2 feeds from 152 to 15 Mega, I'd still be running out of memory on my laptop in one day instead of in 5 minutes. So I should face the fact that it is not the string implementation in Haskell that is causing the problem. The problem is probably me!
It is the world we live in :) Yesterday I had to close Windows Task Manager because it had been running for 2 weeks and its memory usage had grown to 100 MB! :) But things are not so bad. With the +RTS -c switch your program will reach some maximum memory usage and will not grow further. Alternatively, you can use the +RTS -F2 switch; it will be faster than -c, will require more memory, and will not suffer from one GHC bug. Plus, running performGC at the right points (right when you have a lot of garbage) should substantially decrease this maximum. Try it and please write me about the results.

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com