
I was writing a Haskell program which builds a large labeled binary tree and then does some processing of it, which is fold-like. In the actual application that I have in mind the tree will be *huge*. If the whole tree is kept in memory it would probably take up 100's of gigabytes. Because of the pattern of processing the tree, it occurred to me that it might be better (cause much less paging) if some large subtrees could be replaced by thunks which can either recalculate the subtree as needed, or write out the subtree, get rid of the references to it (so it can be garbage collected) and then read back in (perhaps in pieces) as needed. This could be fairly cleanly expressed monadically. So does anyone know if someone has created something like this? Victor

On Mar 22, 2012 2:56 AM, "Victor Miller"
I was writing a Haskell program which builds a large labeled binary tree
and then does some processing of it, which is fold-like. In the actual application that I have in mind the tree will be *huge*. If the whole tree is kept in memory it would probably take up 100's of gigabytes. Because of the pattern of processing the tree, it occurred to me that it might be better (cause much less paging) if some large subtrees could be replaced by thunks which can either recalculate the subtree as needed, This sounds like weak references (warning this is tricky stuff), http://www.haskell.org/ghc/docs/7.0.2/html/libraries/base-4.3.1.0/System-Mem... or write out the subtree, get rid of the references to it (so it can be garbage collected) and then read back in (perhaps in pieces) as needed. This could be fairly cleanly expressed monadically. So does anyone know if someone has created something like this?
Victor
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi Victor There was a paper at one of the early PADL conferences describing out-out-core data structures in Ocaml. I've never seen anyone following up this work, possibly because RAM has got so cheap in the last decade. If you have such large trees you may find the paper interesting. Although the authors used Caml they did use monads. Tyng-Ruey Chuang and Shin-Cheng Mu, "Out-of-core functional programming with type-based primitives," The paper seems to be on Citeseer.

It depends on how you are building the tree.
If you are building up the tree from repeated substitution at the leaves
and never reference its body before you do the final fold, you may be able
to exploit the fact that trees form a free monad, and that there is a nice
construction for increasing the efficiency of substitution into free monads
that has the side-effect of making them 'memoryless'.
I blogged about it here:
http://comonad.com/reader/2011/free-monads-for-less/
http://comonad.com/reader/2011/free-monads-for-less-2/
and used it here:
http://comonad.com/reader/2011/free-monads-for-less-3/
Janis Voightländer has a paper on the simpler version from the first post
above as well
http://www.iai.uni-bonn.de/~jv/mpc08.pdf
However, to page out or recalculate, you are probably looking at a more
complicated construction.
-Edward
On Wed, Mar 21, 2012 at 9:55 PM, Victor Miller
I was writing a Haskell program which builds a large labeled binary tree and then does some processing of it, which is fold-like. In the actual application that I have in mind the tree will be *huge*. If the whole tree is kept in memory it would probably take up 100's of gigabytes. Because of the pattern of processing the tree, it occurred to me that it might be better (cause much less paging) if some large subtrees could be replaced by thunks which can either recalculate the subtree as needed, or write out the subtree, get rid of the references to it (so it can be garbage collected) and then read back in (perhaps in pieces) as needed. This could be fairly cleanly expressed monadically. So does anyone know if someone has created something like this?
Victor
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (4)
-
Edward Kmett
-
L Corbijn
-
Stephen Tetley
-
Victor Miller