
Günther Schmidt wrote:
Hi all,
I'm reading in a data of 216k records into a map of Key, Values Pairs, the values being strings.
As it happens out of 216k String values there really are only about 6.6k distinct string values, so I could save a lot of RAM if I was able to "insert" only actually *new* string values into the map and use references to (string) values that already are in memory instead.
Is there a container that would, if I wanted to insert an element, return a pair of either the previously inserted, equal value and the container unchanged, or the new, previously unknown value and the new container amended by that element?
If by "strings" you allow ByteStrings, then you could use the bytestring-trie package[1]. This will be more worthwhile than other approaches if your 6.6k strings have a lot of repeated prefixes, since repeated prefixes will be shared among the unique strings. Something like the following should work: intern :: ByteString -> Trie ByteString -> (ByteString, Trie ByteString) intern s t = (fromJust (lookup s t'), t') where t' = alterBy (\_ _ -> maybe (Just s) Just) s s t [1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie -- Live well, ~wren