
Hi, I need some pointers and suggestion in the following problem: I have a very large time-series data. I need to perform some operation on that. At any instant of time, I only need a window-full of data, size of window being known. Starting from the first point, I slide the window towards right and operate on the data that reside in the current window. Currently, I read all of the data to arrays which takes a lot of space and hence GC takes up lot of time. After reading "Streching Storage Manager: Weak pointers, Stable names" paper, I had a feeling that I can do something better for my problem, especially weak pointers seem to be useful. But I do not really get the thread. What I would ideally like to do is: read all the data lazily into some data structure and as I slide the window towards right, I will invalidate the data that fall before the start point of current window. By invalidating, I mean I will somehow tell the GC to reclaim that space if necessary. Is there any way in which I can do this in Haskell, or there is a better(simpler) solution for this. Thanks, Saswat

On Fri, Sep 28, 2001 at 05:31:20PM +0800, Saswat Anand wrote:
I have a very large time-series data. I need to perform some operation on that. At any instant of time, I only need a window-full of data, size of window being known. Starting from the first point, I slide the window towards right and operate on the data that reside in the current window.
This sounds perfectly suited to Haskell's standard lazy lists. If you only keep a pointer to the beginning of the data you need to work on, then Haskell will automatically read in exactly as much data as you use and GC it away after you are done with it. The downside is that accessing elements within the window will take time O(window size). Are the windows large enough that this is a concern? Best, Dylan Thurston

The downside is that accessing elements within the window will take time O(window size). Are the windows large enough that this is a concern?
Yes windows are quite large and elements need to be accessed very frequently. As regard to Marcin's suggestion of using a list of compact arrays, although elements can be accessed faster, there will be a lot if redundancy since windows are overlapping. So consecutive arrays will contain almost same data. Can we create something like standard lazy list but with better access time. Thanks, Saswat

Saswat Anand
As regard to Marcin's suggestion of using a list of compact arrays, although elements can be accessed faster, there will be a lot if redundancy since windows are overlapping. So consecutive arrays will contain almost same data.
Hmmm - a circular buffer array, fed from a lazy read? Of course, if this is to be more efficient than a lazy list of arrays, you need to destructively update the buffer. Perhaps this is a too Cistic solution. -kzm -- If I haven't seen further, it is by standing in the footprints of giants
participants (3)
-
Dylan Thurston
-
Ketil Malde
-
Saswat Anand