Data.Binary poor read performance

Hi,

In an application I'm writing with Data.Binary I'm seeing very fast write performance (instant), but much slower read performance. Can you advise where I might be going wrong?

The data type I'm serialising is roughly:

    Map String [Either (String,[String]) [(String,Int)]]

A lot of the Strings are likely to be identical, and the end file size is 1Mb. Time taken with ghc -O2 is 0.4 seconds.

Various questions/thoughts I've had:

1) Is reading a lot slower than writing by necessity?

2) The storage for String seems to be raw strings, which is nice. Would I get a substantial speedup by moving to bytestrings instead of strings? If I hashed the strings and stored common ones in a hash table is it likely to be a big win?

3) How long might you expect 1Mb to take to read?

Thanks for the library, it's miles faster than the Read/Show I was using before - but I'm still hoping that reading 1Mb of data can be instant :-)

Thanks, Neil
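For reference, the round trip being measured looks roughly like this (a sketch: the type follows the rough signature above, and the sample data and file name are made up):

    import Data.Binary (encodeFile, decodeFile)
    import qualified Data.Map as M

    type DB = M.Map String [Either (String, [String]) [(String, Int)]]

    main :: IO ()
    main = do
        -- made-up sample data, just to exercise the instances
        let db = M.fromList
                   [ ("foo",  [Left ("bar", ["baz", "qux"])])
                   , ("spam", [Right [("eggs", 1), ("ham", 2)]])
                   ] :: DB
        encodeFile "db.bin" db          -- writing: the fast part
        db' <- decodeFile "db.bin"      -- reading: the slow part under discussion
        print (M.size (db' :: DB))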

ndmitchell:
Hi,
In an application I'm writing with Data.Binary I'm seeing very fast write performance (instant), but much slower read performance. Can you advise where I might be going wrong?
Can you try binary 0.5, just released 20 mins ago? There were definitely some slowdowns due to inlining that I've mostly fixed in this release.
The data type I'm serialising is roughly: Map String [Either (String,[String]) [(String,Int)]]
A lot of the String's are likely to be identical, and the end file size is 1Mb. Time taken with ghc -O2 is 0.4 seconds.
Map serialisation was sub-optimal. That's been improved in today's release.
Various questions/thoughts I've had:
1) Is reading a lot slower than writing by necessity?
Nope. Shouldn't be.
2) The storage for String seems to be raw strings, which is nice. Would I get a substantial speedup by moving to bytestrings instead of strings? If I hashed the strings and stored common ones in a hash table is it likely to be a big win?
Yep and maybe.
3) How long might you expect 1Mb to take to read?
Thanks for the library, its miles faster than the Read/Show I was using before - but I'm still hoping that reading 1Mb of data can be instant :-)
Tiny fractions of a second.

    $ cat A.hs
    import qualified Data.ByteString as B
    import System.Environment
    main = do
        [f] <- getArgs
        print . B.length =<< B.readFile f

    $ du -hs /usr/share/dict/cracklib-small
    472K    /usr/share/dict/cracklib-small

    $ time ./A /usr/share/dict/cracklib-small
    477023
    ./A /usr/share/dict/cracklib-small  0.00s user 0.01s system 122% cpu 0.005 total

If you're not seeing results like that, with binary 0.5, let's look deeper.

-- Don

Neil Mitchell wrote:
2) The storage for String seems to be raw strings, which is nice. Would I get a substantial speedup by moving to bytestrings instead of strings? If I hashed the strings and stored common ones in a hash table is it likely to be a big win?
Bytestrings should help. The big wins in this application are likely to be cache issues, though the improved memory/GC overhead is nice too.

If you have many identical strings then you will save lots by memoizing your strings into Integers, and then serializing that memo table and the integerized version of your data structure. The amount of savings decreases as the number of duplications decreases, though since you don't need the memo table itself you should be able to serialize it in a way that doesn't have much overhead.

A refinement on memoization for when you have many partial matches but few total matches is to chunk the strings into a series of partial matches, which are then integerized. The trick is deciding how big to make your chunks (which is to say, optimizing the reuse ratio). If you want an industrial solution like this, it may be worth looking into Haskell bindings for SRILM or IRSTLM.

Depending on how rough your type signature was, you may also consider using bytestring-trie to replace the Map String portion. For the keys of your map this will give the bytestring optimization as well as prefix sharing. You could also replace the [String] with Trie Pos where Pos gives the position in the list, or Trie () if order is irrelevant. And you could replace [(String,Int)] with Trie (Map Pos Int) or similar, depending on whether position is important (thus Trie (Set Int)) or whether you can avoid conflicting Ints for the same String (thus Trie Int).

Without knowing more I'm not sure how well Trie [Either (String, Trie Pos) (Trie (Map Pos Int))] will match your use case, but it's worth considering. The one thing of note is that Trie uses patricia trees like IntMap does, rather than balanced trees like Map does. Again, I'm not sure whether the differences will matter for your use case.

-- Live well, ~wren
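A rough sketch of the memoization idea for a flat list of strings (the names intern, resolve and roundTrip are made up for illustration; a nested structure like Neil's would integerize its String fields the same way):

    import Data.Binary (encode, decode)
    import qualified Data.Map as M
    import qualified Data.IntMap as IM

    -- Build a memo table: each distinct string gets a small Int.
    intern :: [String] -> (M.Map String Int, [Int])
    intern = go M.empty []
      where
        go tbl acc []     = (tbl, reverse acc)
        go tbl acc (s:ss) =
          case M.lookup s tbl of
            Just i  -> go tbl (i : acc) ss
            Nothing -> let i = M.size tbl
                       in  go (M.insert s i tbl) (i : acc) ss

    -- On the way back in, invert the table and look the strings up again.
    resolve :: M.Map String Int -> [Int] -> [String]
    resolve tbl = map (inv IM.!)
      where inv = IM.fromList [ (i, s) | (s, i) <- M.toList tbl ]

    -- What actually gets written out is the table plus the integerized data,
    -- so each distinct string is stored only once.
    roundTrip :: [String] -> [String]
    roundTrip ss =
        let (tbl, is)   = intern ss
            bytes       = encode (tbl, is)
            (tbl', is') = decode bytes
        in  resolve tbl' is'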

wren:
Neil Mitchell wrote:
2) The storage for String seems to be raw strings, which is nice. Would I get a substantial speedup by moving to bytestrings instead of strings? If I hashed the strings and stored common ones in a hash table is it likely to be a big win?
Bytestrings should help. The big wins in this application are likely to be cache issues, though the improved memory/GC overhead is nice too.
Here's a quick demo using Data.Binary directly.

First, let's read in the dictionary file, and build a big, worst-case finite map of words to their occurrence (always 1).

    import Data.Binary
    import Data.List
    import qualified Data.ByteString.Char8 as B
    import System.Environment
    import qualified Data.Map as M

    main = do
        [f] <- getArgs
        s   <- B.readFile f
        let m = M.fromList [ (head n, length n) | n <- (group . B.lines $ s) ]
        encodeFile "dict" m
        print "done"

So that writes a "dict" file with a binary encoded Map ByteString Int. Using ghc -O2 --make for everything.

    $ time ./A /usr/share/dict/cracklib-small
    "done"
    ./A /usr/share/dict/cracklib-small  0.28s user 0.03s system 94% cpu 0.331 total

Yields a dictionary file Map:

    $ du -hs dict
    1.3M    dict

Now, let's read back in and decode it back to a Map:

    main = do
        [f] <- getArgs
        m <- decodeFile f
        print (M.size (m :: M.Map B.ByteString Int))
        print "done"

Easy enough:

    $ time ./A dict +RTS -K20M
    52848
    "done"
    ./A dict +RTS -K20M  1.51s user 0.06s system 99% cpu 1.582 total

Ok. So 1.5s to decode a 1.3M Map. There may be better ways to build the Map since we know the input will be sorted, but the Data.Binary instance can't do that.

Since decode/encode are nice pure functions on lazy bytestrings, we can try a trick of compressing/decompressing the dictionary in memory. Compressing the dictionary:

    import Data.Binary
    import Data.List
    import qualified Data.ByteString.Char8 as B
    import qualified Data.ByteString.Lazy as L
    import System.Environment
    import qualified Data.Map as M
    import Codec.Compression.GZip

    main = do
        [f] <- getArgs
        s   <- B.readFile f
        let m = M.fromList [ (head n, length n) | n <- (group . B.lines $ s) ]
        L.writeFile "dict.gz" . compress . encode $ m
        print "done"

Pretty cool, imo, is "compress . encode":

    $ time ./A /usr/share/dict/cracklib-small
    "done"
    ./A /usr/share/dict/cracklib-small  0.38s user 0.02s system 85% cpu 0.470 total

Ok. So building a compressed dictionary takes only a bit longer than an uncompressed one (zlib is fast),

    $ du -hs dict.gz
    216K    dict.gz

Compressed dictionary is much smaller. Let's load it back in and unpickle it:

    main = do
        [f] <- getArgs
        m <- (decode . decompress) `fmap` L.readFile f
        print (M.size (m :: M.Map B.ByteString Int))
        print "done"

Also cute. But how does it run:

    $ time ./A dict.gz
    52848
    "done"
    ./A dict.gz  0.28s user 0.03s system 98% cpu 0.310 total

Interesting. So extracting the Map from a compressed bytestring in memory is a fair bit faster than loading it directly, uncompressed from disk.

Neil, does that give you some ballpark figures to work with?

-- Don

On Mon, 2009-02-23 at 17:03 -0800, Don Stewart wrote:
Here's a quick demo using Data.Binary directly.
[...]
    $ time ./A dict +RTS -K20M
    52848
    "done"
    ./A dict +RTS -K20M  1.51s user 0.06s system 99% cpu 1.582 total
Ok. So 1.5s to decode a 1.3M Map. There may be better ways to build the Map since we know the input will be sorted, but the Data.Binary instance can't do that.
[...]
    $ time ./A dict.gz
    52848
    "done"
    ./A dict.gz  0.28s user 0.03s system 98% cpu 0.310 total
Interesting. So extracting the Map from a compressed bytestring in memory is a fair bit faster than loading it directly, uncompressed from disk.
That's actually rather surprising. The system time is negligible, and the difference between total and user time does not leave much for time wasted doing I/O. So that's a real difference in user time. So what is going on?

We're doing the same amount of binary decoding in each, right? We're also allocating the same number of buffers, in fact slightly more in the case that uses compression. The time taken to cat a meg through a Handle using lazy bytestring is nothing. So where is all that time going?

Duncan
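One way to narrow this down (a suggestion rather than something measured in this thread) is to ask the GHC runtime for an allocation and GC summary for both programs and compare:

    $ ./A dict    +RTS -K20M -sstderr
    $ ./A dict.gz +RTS -sstderr

If most of the extra user time in the uncompressed run shows up as GC, the suspect is how the Map is built from the decoded input rather than the I/O itself; a larger allocation area (e.g. +RTS -H32M) is a cheap way to test that.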

Hello Duncan, Tuesday, February 24, 2009, 5:13:05 AM, you wrote:
That's actually rather surprising. The system time is negligible and the
yes, this looks highly suspicious. we use 1.5 seconds for a 1.3 mb file, and 0.3 seconds for a 0.3 mb file, so it looks like readFile monopolizes the cpu here, being able to read only 1 mb per second. decoding into a list of strings (instead of building a map) may help to check this assumption

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

dons:
wren:
Neil Mitchell wrote:
2) The storage for String seems to be raw strings, which is nice. Would I get a substantial speedup by moving to bytestrings instead of strings? If I hashed the strings and stored common ones in a hash table is it likely to be a big win?
Bytestrings should help. The big wins in this application are likely to be cache issues, though the improved memory/GC overhead is nice too.
Here's a quick demo using Data.Binary directly.
Now, let's read back in and decode it back to a Map
    main = do
        [f] <- getArgs
        m <- decodeFile f
        print (M.size (m :: M.Map B.ByteString Int))
        print "done"
Easy enough:
    $ time ./A dict +RTS -K20M
    52848
    "done"
    ./A dict +RTS -K20M  1.51s user 0.06s system 99% cpu 1.582 total
Compressed dictionary is much smaller. Let's load it back in and unpickle it:
    main = do
        [f] <- getArgs
        m <- (decode . decompress) `fmap` L.readFile f
        print (M.size (m :: M.Map B.ByteString Int))
        print "done"
Also cute. But how does it run:
    $ time ./A dict.gz
    52848
    "done"
    ./A dict.gz  0.28s user 0.03s system 98% cpu 0.310 total
Interesting. So extracting the Map from a compressed bytestring in memory is a fair bit faster than loading it directly, uncompressed from disk.
Note the difference, as Duncan and Bulat pointed out, is a bit surprising. Perhaps the Map instance is a bit weird? We already know that bytestring IO is fine.

Just serialising straight lists of pairs,

    import Data.Binary
    import Data.List
    import qualified Data.ByteString.Char8 as B
    import qualified Data.ByteString.Lazy as L
    import System.Environment
    import qualified Data.Map as M
    import Codec.Compression.GZip

    main = do
        [f] <- getArgs
        s   <- B.readFile f
        let m = [ (head n, length n) | n <- (group . B.lines $ s) ]
        L.writeFile "dict.gz" . encode $ m
        print "done"

    $ time ./binary /usr/share/dict/cracklib-small
    "done"
    ./binary /usr/share/dict/cracklib-small  0.13s user 0.04s system 99% cpu 0.173 total

    $ du -hs dict
    1.3M    dict

And reading them back in,

    main = do
        [f] <- getArgs
        m <- decode `fmap` L.readFile f
        print (length (m :: [(B.ByteString,Int)]))
        print "done"

    $ time ./binary dict
    52848
    "done"
    ./binary dict  0.04s user 0.01s system 99% cpu 0.047 total

Is fast. So there's some complication in the Map serialisation. Adding in zlib, to check,

    main = do
        [f] <- getArgs
        s   <- B.readFile f
        let m = [ (head n, length n) | n <- (group . B.lines $ s) ]
        L.writeFile "dict.gz" . compress . encode $ m
        print "done"

    $ time ./binary /usr/share/dict/cracklib-small
    "done"
    ./binary /usr/share/dict/cracklib-small  0.25s user 0.03s system 100% cpu 0.277 total

Compression takes longer, as expected, and reading it back in,

    main = do
        [f] <- getArgs
        m <- (decode . decompress) `fmap` L.readFile f
        print (length (m :: [(B.ByteString,Int)]))
        print "done"

    $ time ./binary dict.gz
    52848
    "done"
    ./binary dict.gz  0.03s user 0.01s system 98% cpu 0.040 total

About the same.

Looks like the Map reading/showing via association lists could do with further work.

Anyone want to dig around in the Map instance? (There's also some patches for an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).

-- Don

On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart
Looks like the Map reading/showing via association lists could do with further work.
Anyone want to dig around in the Map instance? (There's also some patches for an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
From binary-0.5:
From binary-0.5:

    instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
        put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
        get   = liftM Map.fromDistinctAscList get

    instance Binary a => Binary [a] where
        put l = put (length l) >> mapM_ put l
        get   = do n <- get :: Get Int
                   replicateM n get

Can't get better, I think. Now, from containers-0.2.0.0:

    fromDistinctAscList :: [(k,a)] -> Map k a
    fromDistinctAscList xs
      = build const (length xs) xs
      where
        -- 1) use continutations so that we use heap space instead of stack space.
        -- 2) special case for n==5 to build bushier trees.
        build c 0 xs' = c Tip xs'
        build c 5 xs' = case xs' of
            ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
              -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3))
                              (singleton k5 x5)) xx
            _ -> error "fromDistinctAscList build"
        build c n xs' = seq nr $ build (buildR nr c) nl xs'
          where nl = n `div` 2
                nr = n - nl - 1

        buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
        buildR _ _ _ []         = error "fromDistinctAscList buildR []"
        buildB l k x c r zs     = c (bin k x l r) zs

The builds seem fine, but we spot a (length xs) at the beginning. Maybe this is the culprit? We already know the size of the map (it was serialized), so it is just a matter of exporting

    fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a

Too bad 'Map' is exported as an abstract data type and it's not straightforward to test this conjecture. Any ideas?

-- Felipe.
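Concretely, such a function would just thread the already-known size through the same build. A sketch, not an actual containers patch; it uses the internal Tip, bin and singleton constructors, so it could only live inside Data.Map:

    -- Hypothetical addition to Data.Map: fromDistinctAscList's build,
    -- seeded with the caller-supplied size instead of (length xs).
    fromDistinctAscSizedList :: Int -> [(k,a)] -> Map k a
    fromDistinctAscSizedList n0 xs0 = build const n0 xs0
      where
        build c 0 xs' = c Tip xs'
        build c 5 xs' = case xs' of
            ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
              -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3))
                              (singleton k5 x5)) xx
            _ -> error "fromDistinctAscSizedList build"
        build c n xs' = seq nr $ build (buildR nr c) nl xs'
          where nl = n `div` 2
                nr = n - nl - 1
        buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
        buildR _ _ _ []         = error "fromDistinctAscSizedList buildR []"
        buildB l k x c r zs     = c (bin k x l r) zs

The Binary instance's get could then read the size once and use it both for the element count and for the build:

    get = do n <- get :: Get Int
             liftM (Map.fromDistinctAscSizedList n) (replicateM n get)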

felipe.lessa:
On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart wrote:
Looks like the Map reading/showing via association lists could do with further work.
Anyone want to dig around in the Map instance? (There's also some patches for an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
From binary-0.5:
    instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
        put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
        get   = liftM Map.fromDistinctAscList get

    instance Binary a => Binary [a] where
        put l = put (length l) >> mapM_ put l
        get   = do n <- get :: Get Int
                   replicateM n get
Can't get better, I think. Now, from containers-0.2.0.0:
    fromDistinctAscList :: [(k,a)] -> Map k a
    fromDistinctAscList xs
      = build const (length xs) xs
      where
        -- 1) use continutations so that we use heap space instead of stack space.
        -- 2) special case for n==5 to build bushier trees.
        build c 0 xs' = c Tip xs'
        build c 5 xs' = case xs' of
            ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
              -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3))
                              (singleton k5 x5)) xx
            _ -> error "fromDistinctAscList build"
        build c n xs' = seq nr $ build (buildR nr c) nl xs'
          where nl = n `div` 2
                nr = n - nl - 1

        buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
        buildR _ _ _ []         = error "fromDistinctAscList buildR []"
        buildB l k x c r zs     = c (bin k x l r) zs
The builds seem fine, but we spot a (length xs) on the beginning. Maybe this is the culprit? We already know the size of the map (it was serialized), so it is just a matter of exporting
fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
Too bad 'Map' is exported as an abstract data type and it's not straighforward to test this conjecture. Any ideas?
This idea was the motivation for the new Seq instance, which uses internals to build quickly.

Encoding to disk, the dictionary,

    $ time ./binary /usr/share/dict/cracklib-small
    "done"
    ./binary /usr/share/dict/cracklib-small  0.07s user 0.01s system 94% cpu 0.088 total

Decoding,

    $ time ./binary dict.gz
    52848
    "done"
    ./binary dict.gz  0.07s user 0.01s system 97% cpu 0.079 total

    instance (Binary e) => Binary (Seq.Seq e) where
        put s = put (Seq.length s) >> Fold.mapM_ put s
        get = do n <- get :: Get Int
                 rep Seq.empty n get
          where rep xs 0 _ = return $! xs
                rep xs n g = xs `seq` n `seq` do
                                 x <- g
                                 rep (xs Seq.|> x) (n-1) g

Just a lot better. :)

So ... Data.Map, we're looking at you!

-- Don

On Tue, Feb 24, 2009 at 5:36 AM, Don Stewart
This idea was the motivation for the new Seq instance, which uses internals to build quickly.
The problem is that fromDistinctAscList is the best we can do right now. We don't have something like (|>) that runs in O(1) time, and trying to insert each element would give O(n log n) instead of O(n). In fact, we don't even know if length is the culprit, although I'm highly suspicious of it. Maybe there should be Data.Map.Internal like Data.ByteString.Internal so we can mess with the datatypes directly but without strong API compatibility guarantees? -- Felipe.

Hello,
On Tue, Feb 24, 2009 at 2:36 AM, Don Stewart
This idea was the motivation for the new Seq instance, which uses internals to build quickly.
Encoding to disk, the dictionary,
    $ time ./binary /usr/share/dict/cracklib-small
    "done"
    ./binary /usr/share/dict/cracklib-small  0.07s user 0.01s system 94% cpu 0.088 total

Decoding,

    $ time ./binary dict.gz
    52848
    "done"
    ./binary dict.gz  0.07s user 0.01s system 97% cpu 0.079 total

    instance (Binary e) => Binary (Seq.Seq e) where
        put s = put (Seq.length s) >> Fold.mapM_ put s
        get = do n <- get :: Get Int
                 rep Seq.empty n get
          where rep xs 0 _ = return $! xs
                rep xs n g = xs `seq` n `seq` do
                                 x <- g
                                 rep (xs Seq.|> x) (n-1) g
Just a lot better. :)
So ... Data.Map, we're looking at you!
Indeed, that was the motivation for writing the patch for Seq. [Ross, thank you again for the help.] I had performance issues with lists, but noticed that switching to Sequence wasn't helping at all. This new definition takes advantage of the features that Seq has and List doesn't. Regarding Map, I like Felipe's idea of having a separate Internal. I know that Lemmih has a few other implementations of Map (compactMap, BerkeleyDB). If I remember correctly, he made BerkeleyDB an instance of Binary. That can probably give us some insight too. Paulo

Felipe Lessa wrote:
The builds seem fine, but we spot a (length xs) on the beginning. Maybe this is the culprit? We already know the size of the map (it was serialized), so it is just a matter of exporting
fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
Excellent idea, what's stopping you? fromDistinctAscList is already a function with a precondition that is not checked.
Too bad 'Map' is exported as an abstract data type and it's not straighforward to test this conjecture. Any ideas?
One really doesn't want to see the actual implementation (except for debugging or tuning purposes), but maybe conversions to and from a fully (and uniquely) balanced binary tree are desirable. (A basic binary tree data type is still missing on hackage.)

I think equal maps should not yield different serialization results just because the internal tree structure happens to be different, but that's of course debatable.

Cheers, Christian

Maybe the discussion should be continued on libraries@haskell.org?
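For what it's worth, the kind of bare binary tree being referred to is small enough to sketch (hypothetical names, not an existing package). Serialising it structurally records the exact node layout, which is exactly why a canonical, uniquely balanced form would be needed for equal maps to encode identically:

    import Control.Monad (liftM4)
    import Data.Binary (Binary(..))
    import Data.Binary.Put (putWord8)
    import Data.Binary.Get (getWord8)

    -- A plain binary tree; the encoding records the structure node by node,
    -- so two trees holding the same keys but balanced differently serialise
    -- differently.
    data BinTree k v = Leaf | Node (BinTree k v) k v (BinTree k v)

    instance (Binary k, Binary v) => Binary (BinTree k v) where
        put Leaf           = putWord8 0
        put (Node l k v r) = putWord8 1 >> put l >> put k >> put v >> put r
        get = do
            tag <- getWord8
            case tag of
              0 -> return Leaf
              _ -> liftM4 Node get get get get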

Hello Felipe, Tuesday, February 24, 2009, 11:24:19 AM, you wrote:
Too bad 'Map' is exported as an abstract data type and it's not straighforward to test this conjecture. Any ideas?
just make a copy of its implementation to test

btw, i always thought there should be a way to overcome any export lists and go directly to module internals. limiting export is a way to protect the programmer from errors, not a security feature, and it should be left to the programmer to decide when he doesn't need it. compilers should just be able to check whether some module/library is imported abstractly or with internals too. this would render useless all those .Internals modules that we now need to add everywhere

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

2009/2/24 Bulat Ziganshin
Hello Felipe,
Tuesday, February 24, 2009, 11:24:19 AM, you wrote:
Too bad 'Map' is exported as an abstract data type and it's not straighforward to test this conjecture. Any ideas?
just make a copy of its implementation to test
btw, i always thought that it should be a way to overcome any export lists and go directly to module internals. limiting export is the way to protect programmer from errors, not security feature, and it should be left to programmer to decide when he don't need it. compilers should just be able to check whether some module/library imported abstractly or with internals too. this will render useless all those .Internals modules that now we need to add everywhere
I agree in principle, but GHC also uses that knowledge to optimize the code better: if a function is exported it has to produce the most polymorphic possible code for its type; if it isn't, it can specialize better... that sort of thing. So it's not for security purposes, it's for technical reasons; you can't override the export list externally because the information you'd need to use the functions simply doesn't exist.

Hello Svein, Tuesday, February 24, 2009, 3:47:44 PM, you wrote:
btw, i always thought that it should be a way to overcome any export lists and go directly to module internals. limiting export is the way to protect programmer from errors, not security feature, and it should be left to programmer to decide when he don't need it. compilers should just be able to check whether some module/library imported abstractly or with internals too. this will render useless all those .Internals modules that now we need to add everywhere
I agree in principle, but GHC also uses that knowledge to optimize the code better - if a function is exported it has to produce the most polymorphic possible code for its type, if it isn't it can specialize better... that sort of thing.
So it's not for security purposes, it's for technical reasons; you can't override the export list externally because the information you'd need to use the functions simply doesn't exist.
well, obvious answer is that ghc should optimize according to export specs AND add original function definition to the .o file -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

btw, i always thought that it should be a way to overcome any export lists and go directly to module internals. limiting export is the way to protect programmer from errors, not security feature, and it should be left to programmer to decide when he don't need it. compilers should just be able to check whether some module/library imported abstractly or with internals too. this will render useless all those .Internals modules that now we need to add everywhere
You're not alone!-) This has been called "Open Implementation" (OI, a pre-cursor of aspect-oriented programming): http://www2.parc.com/csl/groups/sda/projects/oi/

They argue for an explicit auxiliary interface instead of full access to module internals. Since these same folks worked on meta-object protocols as well (at the meta-level, the boundaries can be bypassed entirely), that suggestion probably comes from experience. They do allow for the auxiliary interface to use meta-programming style features, though that depends on the language in question (in Haskell, type classes or type functions might be used instead, but rewrite rules and Template Haskell are also available).

Open Implementation Design Guidelines http://www2.parc.com/csl/groups/sda/publications/papers/Kiczales-ICSE97/ is a short paper discussing a Set API/Open Implementation example.
I agree in principle, but GHC also uses that knowledge to optimize the code better - if a function is exported it has to produce the most polymorphic possible code for its type, if it isn't it can specialize better... that sort of thing.
That refers to optimization in the provider module. As the OI people argued, optimization in the client modules also needs to be taken into account. If the default one-size-fits-all-uses implementation behind the default API doesn't work well enough, there'll be a proliferation of library variants. If there is a way to fine-tune the implementation via an auxiliary API, much of that can be avoided.

In other words, instead of half a dozen Maps and a dozen Array variants, there'd just be one of each, but with auxiliary interfaces that would allow client code to choose and tune the most suitable implementations behind the main interfaces.

It isn't magic, though: if, say, even the auxiliary API doesn't allow you to say "thanks, but I know the length already", you're still stuck. But it does help to think about designing 2 APIs (the default functionality one and the auxiliary fine-tuning one) instead of offering only the choice of 1 API or full access to internals.

Claus

Felipe Lessa wrote:
On Tue, Feb 24, 2009 at 4:59 AM, Don Stewart wrote:
Looks like the Map reading/showing via association lists could do with further work.
Anyone want to dig around in the Map instance? (There's also some patches for an alternative lazy Map serialisation, if people are keen to load maps -- happstack devs?).
From binary-0.5:
    instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
        put m = put (Map.size m) >> mapM_ put (Map.toAscList m)
        get   = liftM Map.fromDistinctAscList get

    instance Binary a => Binary [a] where
        put l = put (length l) >> mapM_ put l
        get   = do n <- get :: Get Int
                   replicateM n get
Can't get better, I think.
We can improve it slightly (about 20% runtime in dons' example [*]):

    instance (Ord k, Binary k, Binary e) => Binary (Map.Map k e) where
        get = liftM (Map.fromDistinctAscList . map strictValue) get
          where strictValue (k,v) = (v `seq` k, v)

The point is that Data.Map.Map is strict in the keys, but not in the values of the map. In the case of deserialisation this means the values will be thunks that hang on to the Data.Binary buffer.
Now, from containers-0.2.0.0:
    fromDistinctAscList :: [(k,a)] -> Map k a
    fromDistinctAscList xs
      = build const (length xs) xs
      where
        -- 1) use continutations so that we use heap space instead of stack space.
        -- 2) special case for n==5 to build bushier trees.
        build c 0 xs' = c Tip xs'
        build c 5 xs' = case xs' of
            ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
              -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3))
                              (singleton k5 x5)) xx
            _ -> error "fromDistinctAscList build"
        build c n xs' = seq nr $ build (buildR nr c) nl xs'
          where nl = n `div` 2
                nr = n - nl - 1

        buildR n c l ((k,x):ys) = build (buildB l k x c) n ys
        buildR _ _ _ []         = error "fromDistinctAscList buildR []"
        buildB l k x c r zs     = c (bin k x l r) zs
The builds seem fine, but we spot a (length xs) on the beginning. Maybe this is the culprit? We already know the size of the map (it was serialized), so it is just a matter of exporting
fromDistinctAscSizedList :: Int -> [(k, a)] -> Map k a
Eliminating the 'length' call helps, too, improving runtime by about another 5%. The result is still a factor of 1.7 slower than reading the list of key/value pairs.

Bertram

[*] Notes on timings:
1) I used `rnf` for all timings, as in my previous mail.
2) I noticed that in my previous measurements, the GC time for the Data.Map tests was excessively large (70% and more), so I used +RTS -H32M this time. This resulted in a significant runtime improvement of about 30%.
3) Do your own measurements! Some code to play with is available here:
   http://int-e.home.tlink.de/haskell/MapTest.hs
   http://int-e.home.tlink.de/haskell/Map.hs

    fromDistinctAscList :: [(k,a)] -> Map k a
    fromDistinctAscList xs
      = build const (length xs) xs
      where
        -- 1) use continutations so that we use heap space instead of stack space.
        -- 2) special case for n==5 to build bushier trees.
        build c 0 xs' = c Tip xs'
        build c 5 xs' = case xs' of
            ((k1,x1):(k2,x2):(k3,x3):(k4,x4):(k5,x5):xx)
              -> c (bin k4 x4 (bin k2 x2 (singleton k1 x1) (singleton k3 x3))
                              (singleton k5 x5)) xx
By the way, did anyone test whether (or when) this n==5 "bushier trees" special case gains anything? Thanks, Christian

Don Stewart wrote:
dons: [...] Just serialising straight lists of pairs, [...] And reading them back in,
    main = do
        [f] <- getArgs
        m <- decode `fmap` L.readFile f
        print (length (m :: [(B.ByteString,Int)]))
        print "done"
Well, you don't actually read the whole list here, just its length:

    instance Binary a => Binary [a] where
        put l = put (length l) >> mapM_ put l
        get   = do n <- get :: Get Int
                   replicateM n get

To demonstrate, this works:

    main = do
        L.writeFile "v" (encode (42 :: Int))
        m <- decode `fmap` L.readFile "v"
        print (length (m :: [Int]))

So instead, we should try something like this:

    import Control.Parallel.Strategies

    instance NFData B.ByteString where
        rnf bs = bs `seq` ()

    main = do
        [f] <- getArgs
        m <- decode `fmap` L.readFile f
        print (rnf m `seq` length (m :: [(B.ByteString,Int)]))

My timings:

    reading list, without rnf: 0.04s
    with rnf:                  0.16s
    reading a Data.Map:        0.52s
    with rnf:                  0.62s

Bertram

wren ng thornton wrote:
If you have many identical strings then you will save lots by memoizing your strings into Integers, and then serializing that memo table and the integerized version of your data structure. The amount of savings decreases as the number of duplications decrease, though since you don't need the memo table itself you should be able to serialize it in a way that doesn't have much overhead.
I had problems with the size of the allocated heap space after serializing and loading data with the binary package. The reason was that binary does not support sharing of identical elements. I considered a restricted solution for strings and certain other data types first, but came up with a generic solution in the end. (I did it just last weekend.)

I put the Binary monad in a state transformer with maps for memoization:

    type PutShared = St.StateT (Map Object Int, Int) PutM ()
    type GetShared = St.StateT (IntMap Object) Bin.Get

In addition to standard get and put methods:

    class (Typeable α, Ord α, Eq α) ⇒ BinaryShared α where
        put :: α → PutShared
        get :: GetShared α

I added putShared and getShared methods with memoization:

    putShared :: (α → PutShared) → α → PutShared
    getShared :: GetShared α → GetShared α

For types that I don't want memoization I can either refer to the underlying binary monad for primitive types, e.g.:

    instance BinaryShared Int where
        put = lift∘Bin.put
        get = lift Bin.get

or stay in the BinaryShared monad for types of which I may memoize components, e.g.:

    instance (BinaryShared a, BinaryShared b) ⇒ BinaryShared (a,b) where
        put (a,b) = put a ≫ put b
        get = liftM2 (,) get get

And for types for which I want memoization, I wrap it with putShared and getShared, e.g.:

    instance BinaryShared a ⇒ BinaryShared [a] where
        put = putShared (λl → lift (Bin.put (length l)) ≫ mapM_ put l)
        get = getShared (do n ← lift (Bin.get :: Bin.Get Int)
                            replicateM n get)

This saves 1/3 of heap space in my application. I didn't measure time. Maybe it would be useful to have something like this in the binary module.

Jürgen

PS: And here is the dirty implementation, in case someone finds it useful:

    putShared :: (α → PutShared) → α → PutShared
    putShared fput v = do
        (dict, unique) ← St.get
        case (ObjC v) `Map.lookup` dict of
            Just i  → lift (Bin.putWord8 0 ≫ putWord64be (fromIntegral i))
            Nothing → do
                St.put (dict, unique + 1)
                lift (Bin.putWord8 1)
                lift (putWord64be (fromIntegral unique))
                fput v
                (dict2, unique2) ← St.get
                let newDict = Map.insert (ObjC v) unique dict2
                St.put (newDict, unique2)

    getShared :: GetShared α → GetShared α
    getShared f = do
        dict ← St.get
        w ← lift Bin.getWord8
        case w of
            0 → do
                i ← lift (liftM fromIntegral (getWord64be))
                case IMap.lookup i dict of
                    Just (ObjC obj) → return (forceJust (cast obj) "Shared≫getShared: Cast failed")
                    Nothing → error ◊ "Shared≫getShared : Dont find in Map " ⊕ show i
            1 → do
                i ← lift (liftM fromIntegral (getWord64be))
                obj ← f
                dict2 ← St.get
                St.put (IMap.insert i (ObjC obj) dict2)
                return obj
            _ → error ◊ "Shared≫getShared : Encoding error"

    data Object = ∀ α. (Typeable α, Ord α, Eq α) ⇒ ObjC {unObj :: α}

    instance Eq Object where
        (ObjC a) ≡ (ObjC b) = if typeOf a ≠ typeOf b
                                  then False
                                  else (Just a) ≡ cast b
        -- can someone explain to me why this works?

    instance Ord Object where
        compare (ObjC a) (ObjC b) =
            if typeOf a ≠ typeOf b
                then compare ((unsafePerformIO∘typeRepKey∘typeOf) a)
                             ((unsafePerformIO∘typeRepKey∘typeOf) b)
                else compare (Just a) (cast b)

    encodeSer :: BinaryShared a ⇒ a → L.ByteString
    encodeSer v = runPut (evalStateT (put v) (Map.empty, 0))

    decodeSer :: BinaryShared α ⇒ L.ByteString → α
    decodeSer = runGet (evalStateT get IMap.empty)

jnf:
wren ng thornton wrote:
If you have many identical strings then you will save lots by memoizing your strings into Integers, and then serializing that memo table and the integerized version of your data structure. The amount of savings decreases as the number of duplications decrease, though since you don't need the memo table itself you should be able to serialize it in a way that doesn't have much overhead.
I had problems with the size of the allocated heap space after serializing and loading data with the binary package. The reason was that binary does not support sharing of identical elements and considered a restricted solution for strings and certain other data types first, but came up with a generic solution in the end. (I did it just last weekend).
And this is exactly the intended path -- that people will release their own special instances for doing more elaborate parsing/printing tricks!
I put the Binary monad in a state transformer with maps for memoization:

    type PutShared = St.StateT (Map Object Int, Int) PutM ()
    type GetShared = St.StateT (IntMap Object) Bin.Get

In addition to standard get and put methods:

    class (Typeable α, Ord α, Eq α) ⇒ BinaryShared α where
        put :: α → PutShared
        get :: GetShared α

I added putShared and getShared methods with memoization:

    putShared :: (α → PutShared) → α → PutShared
    getShared :: GetShared α → GetShared α

For types that I don't want memoization I can either refer to the underlying binary monad for primitive types, e.g.:

    instance BinaryShared Int where
        put = lift∘Bin.put
        get = lift Bin.get

or stay in the BinaryShared monad for types of which I may memoize components, e.g.:

    instance (BinaryShared a, BinaryShared b) ⇒ BinaryShared (a,b) where
        put (a,b) = put a ≫ put b
        get = liftM2 (,) get get

And for types for which I want memoization, I wrap it with putShared and getShared, e.g.:

    instance BinaryShared a ⇒ BinaryShared [a] where
        put = putShared (λl → lift (Bin.put (length l)) ≫ mapM_ put l)
        get = getShared (do n ← lift (Bin.get :: Bin.Get Int)
                            replicateM n get)
This save 1/3 of heap space to my application. I didn't measure time. Maybe it would be useful to have something like this in the binary module.
Very nice. Maybe even upload these useful instances in a little binary-extras package?

On Tue, Feb 24, 2009 at 7:42 PM, jutaro
    instance Eq Object where
        (ObjC a) ≡ (ObjC b) = if typeOf a ≠ typeOf b
                                  then False
                                  else (Just a) ≡ cast b
        -- can someone explain to me why this works?
In fact, can't you just say

    instance Eq Object where
        ObjC a == ObjC b = Just a == cast b

? -- Felipe.
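The two are indeed equivalent: cast from Data.Typeable returns Nothing whenever the runtime types differ, and Just a == Nothing is False, so the explicit typeOf guard is redundant. A small standalone illustration (the name sameAs is made up):

    import Data.Typeable (Typeable, cast)

    -- If the types differ, cast yields Nothing, and Just a == Nothing is False,
    -- which is exactly what the explicit typeOf comparison was checking by hand.
    sameAs :: (Typeable a, Eq a, Typeable b) => a -> b -> Bool
    sameAs a b = Just a == cast b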
participants (12)

- Bertram Felgenhauer
- Bulat Ziganshin
- Christian Maeder
- Claus Reinke
- Don Stewart
- Duncan Coutts
- Felipe Lessa
- jutaro
- Neil Mitchell
- Paulo Tanimoto
- Svein Ove Aas
- wren ng thornton