Re: [Haskell-cafe] Very large data structures in the heap

On Tue, Feb 17, 2015 at 8:26 PM, Dr. Olaf Klinke <o.klinke@dkfz-heidelberg.de> wrote:
On Tue, 17 Feb 2015, Kim-Ee Yeoh wrote:
On Tue, Feb 17, 2015 at 5:05 PM, Dr. Olaf Klinke <o.klinke@dkfz-heidelberg.de> wrote: I'd like to separate serialization (IO) from traversal of the structure (pure), but how can one avoid the entire heap object from being built? Can laziness be exploited?
Something I didn't quite understand from your email is whether
1. the data structure is being constructed and you'd like it simultaneously serialized to disk
or
2. it's already serialized and you'd like to perform pure traversals on it.
Problem 1 is one of compute then write, problem 2 is one of read then compute.
Lazy I/O, as in unsafeInterleaveIO, excels at 2.
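[To illustrate Kim-Ee's point: unsafeInterleaveIO defers an IO action until its result is actually demanded. A minimal sketch, not from the thread, of lazily reading a file line by line, so that only the lines a consumer forces are ever fetched from disk:]

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- Lazily read all lines from a handle. Each cons cell of the result
-- is produced only when the consumer demands it, so an expression like
-- `take 3 <$> lazyLines h` reads just three lines from disk.
lazyLines :: Handle -> IO [String]
lazyLines h = unsafeInterleaveIO $ do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      l  <- hGetLine h
      ls <- lazyLines h
      return (l : ls)
```

This is the same trick the lazy ByteString reader uses internally: the read is interleaved with pure consumption, and unconsumed suffixes never enter the heap.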
-- Kim-Ee
Both.
The data structure is a search tree, which is intended to be used multiple times, so 1. applies. I guess that the only way to circumvent the memory bottleneck would be to serialize the already-constructed part of the structure and then use that serialized part while constructing the rest, somehow without de-serializing it again.
To an extent, the virtual memory of a modern operating system solves the memory bottleneck. I'm curious what you were hoping for when thinking of exploiting laziness. If the search tree is truly huge, you'd probably resort to fancy (or fuzzy or both) algorithms to break up the work.
2. also applies. Queries to the search tree traverse different paths of it, but one can not rule out that multiple queries will eventually have traversed the entire tree. Thus lazily de-serializing it may not be enough unless the garbage collector disposes of parts of the tree after each query. Which it won't, if we say something like this:
    search :: FilePath -> IO (Query -> Result)
    search treeOnDisk = do
        searchtree <- decodeFile treeOnDisk
        return (\query -> searchFor query searchtree)
No, gc _will_ reclaim memory if decodeFile uses unsafeInterleaveIO. Take a look at the lazy bytestring package.
P.S. Your search function looks odd. Surely you mean something with the type signature: FilePath -> Query -> IO Result ? -- Kim-Ee
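[The signature Kim-Ee suggests could be sketched as below. This is an illustrative reading, not code from the thread: searchFor is passed in as a parameter since its definition is never shown, and decodeFile is the one from the binary package. Decoding inside IO on every call means the tree is reachable only for the duration of one query, so the garbage collector can reclaim it afterwards:]

```haskell
import Data.Binary (Binary, decodeFile)

-- Kim-Ee's suggested shape: FilePath -> Query -> IO Result.
-- The decoded tree is local to each call, so nothing pins it
-- in the heap between queries.
search :: Binary tree => (query -> tree -> result)
       -> FilePath -> query -> IO result
search searchFor treeOnDisk query = do
  searchtree <- decodeFile treeOnDisk
  return (searchFor query searchtree)
```

The trade-off, as the thread discusses, is that without a lazy decodeFile each query pays for a full de-serialization.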

Here is an update: I considered comments made by Kim-Ee and Joachim Breitner and decided to represent my search tree as an unboxed IOVector that stores indices in its elements. The code now looks very imperative, but it is a bit simpler than the pure tying-the-knot code that builds the cyclic structure. I tried several approaches for lookups:

(i) Reconstruct the search tree (lazily) from the vector each time a query is made. Only a relatively small part should actually be constructed and garbage-collected. This turned out to be not overwhelmingly fast.

(ii) Keep the search tree structure implicit. Instead, descend the tree by doing reads on the vector and following the embedded indices around. This was faster.

(iii) Combine both: after a pre-specified number of lookup/insert cycles, reconstruct part of the search tree in the heap and traverse it, keeping the same tree for subsequent lookups. When reaching a leaf in the heap, continue hopping around the IOVector.

Since the size of the vector is not known beforehand, inserts use Data.Vector.Unboxed.Mutable.grow from time to time. As I read here (https://github.com/haskell/vector/issues/36) and tested on my machine, growing the IOVector makes a copy of it, so after my IOVector has grown to considerable size, additional inserts slow down considerably. However, when constructing a tree of final size 1.2 million nodes, approach (iii) already outperforms my pure algorithm. A tree with 18 million nodes is no problem to construct. -- Olaf
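[The pointer-hopping lookup of approach (ii) could look something like the sketch below. The node layout is an assumption, since Olaf's encoding is not shown: here each node occupies three Int slots (key, left-child index, right-child index), with -1 meaning "no child". Descending the tree is then just a loop of vector reads:]

```haskell
import qualified Data.Vector.Unboxed.Mutable as M

-- Approach (ii): the tree stays implicit in the flat vector.
-- Node i lives at slots 3*i (key), 3*i+1 (left), 3*i+2 (right);
-- a child index of -1 means the child is absent. This three-slot
-- layout is an assumption for illustration.
member :: M.IOVector Int -> Int -> IO Bool
member vec key = go 0
  where
    go (-1) = return False          -- fell off the tree: not present
    go i = do
      k <- M.read vec (3 * i)
      if key == k
        then return True
        else do
          -- hop to the left or right child index stored in the vector
          child <- M.read vec (if key < k then 3 * i + 1 else 3 * i + 2)
          go child
```

No tree nodes are ever allocated in the heap; each query costs one vector read per level, which matches the thread's observation that (ii) beats lazily reconstructing heap nodes per query.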
participants (2)
-
Dr. Olaf Klinke
-
Kim-Ee Yeoh