
Felipe Lessa wrote:
On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart
wrote: Looks like the Map reading/showing via association lists could do with further work.
Anyone want to dig around in the Map instance? (There's also some patches for an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
From binary-0.5:
instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where put m = put (Map.size m) >> mapM_ put (Map.toAscList m) get = liftM Map.fromDistinctAscList get
instance Binary a => Binary [a] where put l = put (length l) >> mapM_ put l get = do n <- get :: Get Int replicateM n get
Can't get better, I think.
We can improve it slightly (about 20% runtime in dons example [*]): instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where get = liftM (Map.fromDistinctAscList . map strictValue) get where strictValue (k,v) = (v `seq` k, v) The point is that Data.Map.Map is strict in the keys, but not in the values of the map. In the case of deserialisation this means the values will be thunks that hang on to the Daya.Binary buffer.
Now, from containers-0.2.0.0:
fromDistinctAscList :: [(k,a)] -> Map k a fromDistinctAscList xs = build const (length xs) xs where -- 1) use continutations so that we use heap space instead of stack space. -- 2) special case for n==5 to build bushier trees. build c 0 xs' = c Tip xs' build c 5 xs' = case xs' of ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx) -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3)) (singleton k5 x5)) xx _ -> error "fromDistinctAscList build" build c n xs' = seq nr $ build (buildR nr c) nl xs' where nl = n `div` 2 nr = n - nl - 1
buildR n c l ((k,x):ys) = build (buildB l k x c) n ys buildR _ _ _ [] = error "fromDistinctAscList buildR []" buildB l k x c r zs = c (bin k x l r) zs
The builds seem fine, but we spot a (length xs) on the beginning. Maybe this is the culprit? We already know the size of the map (it was serialized), so it is just a matter of exporting
fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
Eliminating the 'length' call helps, too, improving runtime by another about 5%. The result is still a factor of 1.7 slower than reading the list of key/value pairs. Bertram [*] Notes on timings: 1) I used `rnf` for all timings, as in my previous mail. 2) I noticed that in my previous measurements, the GC time for the Data.Map tests was excessively large (70% and more), so I used +RTS -H32M this time. This resulted in a significant runtime improvement of about 30%. 3) Do your own measurements! Some code to play with is available here: http://int-e.home.tlink.de/haskell/MapTest.hs http://int-e.home.tlink.de/haskell/Map.hs