
jnf:
wren ng thornton wrote:
If you have many identical strings then you will save lots by memoizing your strings into Integers, and then serializing that memo table and the integerized version of your data structure. The amount of savings decreases as the number of duplications decrease, though since you don't need the memo table itself you should be able to serialize it in a way that doesn't have much overhead.
I had problems with the size of the allocated heap space after serializing and loading data with the binary package. The reason was that binary does not support sharing of identical elements and considered a restricted solution for strings and certain other data types first, but came up with a generic solution in the end. (I did it just last weekend).
And this is exactly the intended path -- that people will release their own special instances for doing more elaborate parsing/printing tricks!
I put the Binary monad in a state transformer with maps for memoization: type PutShared = St.StateT (Map Object Int, Int) PutM () type GetShared = St.StateT (IntMap Object) Bin.Get
In addition to standard get ant put methods: class (Typeable α, Ord α, Eq α) ⇒ BinaryShared α where put :: α → PutShared get :: GetShared α I added putShared and getShared methods with memoization: putShared :: (α → PutShared) → α → PutShared getShared :: GetShared α → GetShared α
For types that I don't want memoization I can either refer to the underlying binary monad for primitive types, e.g.: instance BinaryShared Int where put = lift∘Bin.put get = lift Bin.get or stay in the BinaryShared monad for types of which I may memoize components, e.g.: instance (BinaryShared a, BinaryShared b) ⇒ BinaryShared (a,b) where put (a,b) = put a ≫ put b get = liftM2 (,) get get
And for types for which I want memoization, I wrap it with putShared and getShared ,e.g: instance BinaryShared a ⇒ BinaryShared [a] where put = putShared (λl → lift (Bin.put (length l)) ≫ mapM_ put l) get = getShared (do n ← lift (Bin.get :: Bin.Get Int) replicateM n get)
This save 1/3 of heap space to my application. I didn't measure time. Maybe it would be useful to have something like this in the binary module.
Very nice. Maybe even upload these useful instances in a little binary-extras package?