
On Thu, Jun 24, 2010 at 09:28:15AM -0400, Edward Kmett wrote:
On Thu, Jun 24, 2010 at 8:27 AM, Johan Tibell wrote:
The space overhead per key/value pair is 6 words (48 bytes on a 64-bit architecture) when using lazy values but only 4 words (32 bytes) when using strict (unpacked) values, so lazy values cost 50% more. This really starts to matter with large enough data sets (as seen in the recent Twitter analysis thread). When working with Big Data it's often desirable to fit as much data in RAM as possible, as the results of many algorithms (think machine learning or search ranking) depend on how much data you can hold in memory.
Something to consider.
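To make the per-pair arithmetic concrete, here is a minimal sketch assuming 64-bit GHC and Int values. The node types LazyNode and StrictNode and the word-count helpers are hypothetical illustrations, not the layout of any particular library. They show one plausible accounting of a 2-word gap, matching the difference between 6 and 4 words: a lazy field holds a pointer to a separately allocated boxed Int (header plus payload), while a strict, unpacked field stores the payload inline.

```haskell
-- Hypothetical node types, assuming 64-bit GHC and Int keys/values.
-- Not the actual representation used by containers or any other library.

-- Lazy value field: the node stores a pointer to a separately allocated
-- value, which may also be an unevaluated thunk.
data LazyNode = LazyNode {-# UNPACK #-} !Int Int

-- Strict, unpacked value field: the raw Int payload lives inside the node.
data StrictNode = StrictNode {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- Rough heap words per key/value pair once the value is evaluated
-- (ignoring GHC's sharing of static closures for small Ints).
lazyNodeWords :: Int
lazyNodeWords = 1 + 1 + 1 + 2   -- header + key + pointer + boxed Int (header + payload)

strictNodeWords :: Int
strictNodeWords = 1 + 1 + 1     -- header + key + inline value payload

-- Both node types carry the same information; only the layout differs.
nodeValueL :: LazyNode -> Int
nodeValueL (LazyNode _ v) = v

nodeValueS :: StrictNode -> Int
nodeValueS (StrictNode _ v) = v
```

Loaded in GHCi, `lazyNodeWords - strictNodeWords` evaluates to 2. The strict layout gives up the ability to store an unevaluated value in the node.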
I definitely agree that unboxing can help a great deal with performance and space utilization.
However, as containers does not currently require any exotic extensions, I think a type-family-based generic map would perhaps belong in another package such as 'unboxed-containers' or 'adaptive-containers' (both of which currently exist on Hackage). Adding it to containers would dramatically extend the package's language-extension footprint, taking it from something that easily runs across a wide array of Haskell implementations to something very GHC-specific.
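For comparison, a minimal sketch of the type-family-based approach described above, in which each value type chooses its own unpacked cell through an associated data family. The names Adapt, Cell, AMap, insertA and lookupA are hypothetical and are not the API of unboxed-containers or adaptive-containers. An association list stands in for a real balanced tree. Note the TypeFamilies pragma on the first line: that GHC extension is exactly the footprint concern raised here.

```haskell
{-# LANGUAGE TypeFamilies #-}
-- Hypothetical sketch: the value type selects its own storage layout.
-- Not the API of unboxed-containers or adaptive-containers.

class Adapt v where
  -- Associated data family: each instance supplies its own cell type.
  data Cell v
  wrap   :: v -> Cell v
  unwrap :: Cell v -> v

-- Int values are stored unpacked in their cell.
instance Adapt Int where
  data Cell Int = CellInt {-# UNPACK #-} !Int
  wrap = CellInt
  unwrap (CellInt n) = n

-- Double values likewise.
instance Adapt Double where
  data Cell Double = CellDouble {-# UNPACK #-} !Double
  wrap = CellDouble
  unwrap (CellDouble d) = d

-- A toy association-list "map" whose values live in adaptive cells.
newtype AMap k v = AMap [(k, Cell v)]

emptyA :: AMap k v
emptyA = AMap []

-- Insert or replace the value for a key.
insertA :: (Eq k, Adapt v) => k -> v -> AMap k v -> AMap k v
insertA k v (AMap kvs) = AMap ((k, wrap v) : filter ((/= k) . fst) kvs)

-- Look up a key and unwrap its cell.
lookupA :: (Eq k, Adapt v) => k -> AMap k v -> Maybe v
lookupA k (AMap kvs) = fmap unwrap (lookup k kvs)
```

In GHCi, `lookupA "a" (insertA "a" (1 :: Int) emptyA)` gives `Just 1`. In this sketch, supporting another value type means writing another `Adapt` instance.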
If Map were strict in both keys and values, we could have data Unstrict a = U a to unstrictify the values. I don't know if this is a good solution, though :).

Cheers,
-- 
Felipe.
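Felipe's Unstrict idea can be sketched as follows, assuming a hypothetical map that is strict in its values. A toy association list stands in for the real structure, and SMap, insertS, lookupS, insertLazy and lookupLazy are illustrative names, not Data.Map functions. Because the map forces each stored value only to weak head normal form, storing `U x` forces the `U` constructor but leaves `x` itself unevaluated.

```haskell
-- Hypothetical value-strict map (a toy association list, not Data.Map).

-- Felipe's wrapper: a lazy box around the value.
data Unstrict a = U a

newtype SMap k v = SMap [(k, v)]

emptyS :: SMap k v
emptyS = SMap []

-- Strict insert: seq forces the value to WHNF before it is stored.
insertS :: Eq k => k -> v -> SMap k v -> SMap k v
insertS k v (SMap kvs) = v `seq` SMap ((k, v) : filter ((/= k) . fst) kvs)

lookupS :: Eq k => k -> SMap k v -> Maybe v
lookupS k (SMap kvs) = lookup k kvs

-- Recover lazy semantics: only the U constructor is forced, not its payload.
insertLazy :: Eq k => k -> v -> SMap k (Unstrict v) -> SMap k (Unstrict v)
insertLazy k v = insertS k (U v)

lookupLazy :: Eq k => k -> SMap k (Unstrict v) -> Maybe v
lookupLazy k m = fmap (\(U v) -> v) (lookupS k m)
```

For example, `insertLazy 1 undefined emptyS` can be stored and looked up without error as long as the payload is never inspected, whereas `insertS 1 undefined emptyS` throws as soon as the map is used. The cost is one extra `U` heap object per wrapped value, which works against the space savings discussed earlier in the thread.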