Proposal: add laziness to Data.Map / IntMap

5 Aug 2008

      ...
Adrian Hey wrote:
Using explicit seqs rather than strict data types is actually faster,
for reasons that are a bit of a mystery to me. I'm not sure what cost
Bertram is talking about, but AFAIK ghc uses the same info pointer
mechanism for all heap records, including unevaluated thunks (although
the info pointers will point to different things of course). But the
cost of pattern matching on *evaluated* AVL nodes should be independent
of strictness annotation AFAICS.
Thanks for chiming in Adrian. Just to get started I removed the strictness
annotations from the Data.Map Bin constructor, made no other changes, and
ran a
silly benchmark (at the end of this email). The version without bangs is
actually faster than the version currently shipping. I get about 10.5 sec
for
the lazy version and 11.5 sec for the strict version (2.1Ghz Intel Core)

I'll repeat that in bold for people just skimming this thread:

__Removing Strictness Annotations Makes It Go Faster__

The reason I think is that the helper functions bin, join and balance
already
provide just enough strictness, as they need to inspect the size field. The
strictness analyzer can then do its job. The case for IntMap is tricker,
as there is no implicit strictness in the code so removing the bangs causes
stack overflows. Still working on that one.

Here are the benchmarks. The lazy version also evaluates "keySum dmap"
slightly
faster (repeated inserts) and its a tie for "keySum smap" (sequential
inserts).
I admit this benchmark is goofy, if you have a better one please share.

Scott

import qualified Data.Map as Map
import Data.List as List

n = 1000000
rkeys = [ (i*122789) `mod` 1006471 | i<-[0..] ] :: [Int]
dkeys = map (`div`1000) rkeys :: [Int]
skeys = [0..] :: [Int]

shuffle (a:b:c:d:e:f:g:h:rest) = e:a:h:d:c:b:g:f: shuffle rest

keySum = List.foldl' (+) 0 . Map.keys

rmap = Map.fromList    . take n . shuffle $ rkeys `zip` [0..]
dmap = Map.fromList    . take n . shuffle $ dkeys `zip` [0..]
smap = Map.fromAscList . take n . shuffle $ skeys `zip` [0..]

mix (x:xs) (y:ys) = x : y : mix xs ys
mix _ [] = []
mix [] _ = []

rkeys2 = [ (i*122789) `mod` 1006471 | i<-[0..] ] :: [Int]
rlooks = [ (i*122819) `mod` 1006471 | i<-[0..] ] :: [Int]

rlook =
  List.foldl'
    (\k s -> case Map.lookup k rmap of Nothing -> s; Just x -> s+x)
    0 (take n $ rkeys2 `mix` drop 1000 rlooks)

main = print rlook  -- or print (keySum dmap) or whatever

Scott Dillard

Adrian Hey

tags

participants (2)