Both.On Tue, 17 Feb 2015, Kim-Ee Yeoh wrote:
On Tue, Feb 17, 2015 at 5:05 PM, Dr. Olaf Klinke <o.klinke@dkfz-heidelberg.de> wrote:
I'd like to separate serialization (IO) from traversal of the stucture (pure), but how can one avoid the entire heap object from being built? Can lazyness be exploited?
Something I didn't quite understand from your email is whether
1. the data structure is being constructed and you'd like it simultaneously serialized to disk
or
2. it's already serialized and you'd like to perform pure traversals on it.
Problem 1 is one of compute then write, problem 2 is one of read then compute.
Lazy I/O, as in unsafeInterleaveIO, excels at 2.
-- Kim-Ee
The data structure is a search tree, which is intended to be used multiple times, so 1. applies. I guess that the only way to circumvent the memory bottleneck would be to serialize the part of the structure already constructed and use the already serialized part for constructing the rest, somehow without de-serializing it again.
2. also applies. Queries to the search tree traverse different paths of it, but one can not rule out that multiple queries will eventually have traversed the entire tree. Thus lazily de-serializing it may not be enough unless the garbage collector disposes of parts of the tree after each query. Which it won't, if we say something like this:
search :: FilePath -> IO (Query -> Result)
search treeOnDisk = do
searchtree <- decodeFile treeOnDisk
return (\query -> searchFor query searchtree)