
felipe.lessa:
On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart
wrote: Looks like the Map reading/showing via association lists could do with further work.
Anyone want to dig around in the Map instance? (There's also some patches for an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
From binary-0.5:
instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where put m = put (Map.size m) >> mapM_ put (Map.toAscList m) get = liftM Map.fromDistinctAscList get
instance Binary a => Binary [a] where put l = put (length l) >> mapM_ put l get = do n <- get :: Get Int replicateM n get
Can't get better, I think. Now, from containers-0.2.0.0:
fromDistinctAscList :: [(k,a)] -> Map k a fromDistinctAscList xs = build const (length xs) xs where -- 1) use continutations so that we use heap space instead of stack space. -- 2) special case for n==5 to build bushier trees. build c 0 xs' = c Tip xs' build c 5 xs' = case xs' of ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx) -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3)) (singleton k5 x5)) xx _ -> error "fromDistinctAscList build" build c n xs' = seq nr $ build (buildR nr c) nl xs' where nl = n `div` 2 nr = n - nl - 1
buildR n c l ((k,x):ys) = build (buildB l k x c) n ys buildR _ _ _ [] = error "fromDistinctAscList buildR []" buildB l k x c r zs = c (bin k x l r) zs
The builds seem fine, but we spot a (length xs) on the beginning. Maybe this is the culprit? We already know the size of the map (it was serialized), so it is just a matter of exporting
fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
Too bad 'Map' is exported as an abstract data type and it's not straighforward to test this conjecture. Any ideas?
This idea was the motivation for the new Seq instance, which uses internals to build quickly. Encoding to disk, the dictionary, $ time ./binary /usr/share/dict/cracklib-small "done" ./binary /usr/share/dict/cracklib-small 0.07s user 0.01s system 94% cpu 0.088 total Decoding, $ time ./binary dict.gz 52848 "done" ./binary dict.gz 0.07s user 0.01s system 97% cpu 0.079 total instance (Binary e) => Binary (Seq.Seq e) where put s = put (Seq.length s) >> Fold.mapM_ put s get = do n <- get :: Get Int rep Seq.empty n get where rep xs 0 _ = return $! xs rep xs n g = xs `seq` n `seq` do x <- g rep (xs Seq.|> x) (n-1) g Just a lot better. :) So ... Data.Map, we're looking at you! -- Don