
Wait, do ByteStrings show up on a heap profile if the space is allocated with malloc? Anyway, I think my tests still show that the memory used by the process doesn't grow simply by adding more data, as long as you are no longer adding keys to the map.

----- Original Message -----
From: Brandon Moore
To: Aleksandar Dimitrov; "haskell-cafe@haskell.org"
Cc:
Sent: Tuesday, May 31, 2011 1:43 PM
Subject: Re: [Haskell-cafe] How on Earth Do You Reason about Space?

I can't reproduce heap usage growing with the size of the input file.
I made a word list from Project Gutenberg's copy of "War and Peace" by
tr -sc '[[:alpha:]]' '\n' < pg2600.txt > words.txt
Using 1, 25, or 1000 repetitions of this ~3MB word list shows about 100MB of address space used according to top, and no more than 5MB or so of Haskell heap used according to the memory profile, which stays flat.
Is your memory usage growing with the size of the input file, or the size of the histogram?
I was worried data sharing might mean your keys retain entire 64K chunks of the input. However, it seems enumLines depends on the StringLike ByteString instance, which just converts to and from String. That can't be efficient, but I suppose it avoids excessive sharing.
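To illustrate the sharing concern, here is a minimal sketch. It is not the code from this thread: it assumes a strict ByteString read in one go and a Data.Map.Strict histogram, and the names histogram and step are made up for the example. Slices produced by B.lines point into the original buffer, so storing them as keys would keep that buffer alive. B.copy gives each key its own small buffer instead.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Count how often each line (one word per line) occurs.
-- Hypothetical helper, not taken from the original program.
histogram :: B.ByteString -> M.Map B.ByteString Int
histogram = foldl' step M.empty . B.lines
  where
    step m w
      | B.null w  = m  -- skip empty lines
      -- B.copy allocates a fresh buffer holding only this word, so the
      -- key does not retain the larger buffer it was sliced from.
      | otherwise = M.insertWith (+) (B.copy w) 1 m

main :: IO ()
main = do
  input <- B.getContents
  let h = histogram input
  putStrLn (show (M.size h) ++ " distinct words")
```

This copies every word, even ones already in the map, which costs some allocation but keeps the sketch simple. Running it on words.txt with +RTS -hT should show whether retained heap follows the number of distinct keys rather than the input size.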
The other thing that occurs to me is that the total size of your keys would also be approximately the size of the input file if you were using plain text without each word split onto a separate line.
Brandon