Re: [Haskell-cafe] Trying to reduce memory costs of String duplicates - Oh my god it works

Hi Luke, Brian,
oh my god it works!
I'm just saying it because that's a first!
Late last night I had already suspected that laziness might be one of the
reasons why the memoization technique showed no effect on memory
consumption and after Brian's email that's exactly where I tried again,
using Bang Patterns.
IT WORKED!
The memory consumption is now down by 20 fold!
Thank you guys very very much!
Günther
Am 06.09.2009, 03:24 Uhr, schrieb Brian Sniffen
Remember that each string will still be read, so you may still allocate a great deal of memory and generate a great deal of garbage. You will just release it before long, rather than hold it for the life of your program. Remember also that the memoization itself is lazy: it does no good if you read all the file in, then force the memoizations along with the computation of interest.
-Brian
2009/9/5 Günther Schmidt
: Hi Luke,
thanks, this is some very good advice as I find many duplicates in the data I have to iterate over.
However so far I'm unable to tell wether this actually works or not, I tried it a couple of times under different settings but it showed to difference in memory consumption. The same mem peeks as before.
Do you have some code that where you could see a before and after?
Günther
Am 05.09.2009, 19:38 Uhr, schrieb Luke Palmer
: 2009/9/5 Günther Schmidt
: Hi all,
I'm reading in a data of 216k records into a map of Key, Values Pairs, the values being strings.
As it happens out of 216k String values there really are only about 6.6k distinct string values, so I could save a lot of RAM if I was able to "insert" only actually *new* string values into the map and use references to (string) values that already are in memory instead.
Is there a container that would, if I wanted to insert an element, return a pair of either the previously inserted, equal value and the container unchanged, or the new, previously unknown value and the new container amended by that element?
I believe a memoization of the identity function will do what you want:
import qualified Data.MemoCombinators as Memo
share = Memo.list Memo.char id
Then pass any string through share to make/get a cached version.
You might want to limit the scope of share -- eg. put it in a where clause for the function where you're using it -- so that it doesn't eat memory for the lifetime of your program, only for when you need it.
Luke
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (1)
-
Günther Schmidt