
Don Stewart wrote:
tphyahoo:
I'm trying to run a HAppS web site with a large amount of data: stress testing happstutorial.com. Well, 20 million records doesn't sound that large by today's standards, but anyway that's my goal for now. I have a standard Data.Map.Map as the base structure for one of my macid data tables (jobs), but I noticed something that is probably causing problems for me. Even a simple 20 million record with int/int key values causes an out of memory error for me in ghci,
Int keys, Int values eh?
Does using IntMap help?
Interesting. Map Int Int, IntMap Int and [(Int, Int)] use pretty much the same amount of memory, assuming that no keys or values are shared: Map Int Int: data Map k a = Tip | Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a) - all 'Tip's are shared. - For each (key, value) pair, there is one 'Bin', needing tag[*] (1 word) size (unpacked, 1 word) key, value (pointer, tag, value = 3 words each) two more pointers (1 word each) for a total of 10 words per element. IntMap Int: data IntMap a = Nil | Tip {-# UNPACK #-} !Key a | Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) - one 'Tip' per element: tag (1 word) + key (1 word) + value (3 words) - one 'Bin' per element (minus 1): tag (1 word) + prefix, mask (1 word each) + 2 pointers (1 word each) for a total of 10 words per element. [(Int, Int)]: - one '(:)' per element: - tag (1 word) + 2 pointers (1 word each) - one '(Int, Int)' per element: - tag (1 word) + key (3 words) + value (3 words) for a total of 10 words per element, again. Now if we want to save memory, we can specialise Data.Map to Int keys and values, saving 4 words per element, and encode the external leaves (Tips) in the constructor, saving another word: data IntIntMap = Tip | BinBB !Size !Int !Int IntIntMap IntIntMap | BinBT !Size !Int !Int IntIntMap | BinTB !Size !Int !Int IntIntMap | BinTT !Size !Int !Int -- sprinkle {-# UNPACK #-} as needed That's 5 words per elements. Would that be worthwhile? Bertram [*] actually, the info pointer in the Spineless, Tagless G-machine.