
I had a couple ideas, and I was wondering if they were good ones. One is the idea of a slice function on an IArray: a' = slice a (i,j) giving an array with the elements i..j of array a. The nifty idea would be that this could be done without any memory copy. The catch, of course, would be that you might end up having to retain the entire array a just because you need a little slice of it. However, I think this risk (which should be documented) would be overwhelmed by the increased efficiency (both in space and time) gains that would be possible. The most obvious (to me at least, since I've been working a lot with PackedStrings recently) application would be in substrPS. Using slice, this would be an O(1) operation, rather than an O(n) operation, and would use essentially no memory (at the cost of retaining the mother string). I'm thinking that it may even be possible to make the GC smart enough to get rid of mother strings which are needed by very few children (at least if it's desperate), if array slices were supported at a low enough level... I imagine they'd have to be supported at a very low level. The second (and related) idea is that it would be nice to be able to use mmap on a readonly file to read it into an immutable array. Of course, this works really nicely with array slices, since with the slices the mmapped array wouldn't need to be copied, and could remain backed by the file itself. I imagine being able to read a 100M file into a PackedString, then run linesPS (or possibly a special slice version of linesPS) on it, and have a [PackedString], all of which is backed by the file, so when memory gets tight it can be just thrown away (by the VM system) and then get read in again when I use it. The mmap idea seems even tougher than the slice idea, as it would require (preferably) keeping track of all mmapped regions so when the file is closed they could all be copied over into normal memory and unmapped. Technically they could remain mmapped after the file is closed, but I think that would cause problems if the program then wants to reopen the file for writing. Anyhow, those are just a couple of ideas I had, and I have nowhere near the knowledge if haskell internals to tell if they are workable or not. -- David Roundy http://www.abridgegame.org

On Sat, 26 Apr 2003 12:21:52 -0400
David Roundy
I had a couple ideas, and I was wondering if they were good ones.
One is the idea of a slice function on an IArray:
a' = slice a (i,j) giving an array with the elements i..j of array a.
[..]
I'm thinking that it may even be possible to make the GC smart enough to get rid of mother strings which are needed by very few children (at least if it's desperate), if array slices were supported at a low enough level... I imagine they'd have to be supported at a very low level.
Perhaps it could be done with cunning use of finalisers. Instead of having the substrings keep references to the 'mother' string, you could reverse it, so the mother keeps (weak?) references to the substrings. That way, when the mother string is garbage collected it can make copies of the substrings in new storage. As you suggest, a further optimisation might be to not collect the mother string if enough of it is still in use. Also when the substrings are copied, you could try and do it in a way that maintains as much sharing as possile if substrings overlap. It porbably still requires significant low-level support by maybe you don't have to teach the garbage collecter specially. Duncan

On Sat, Apr 26, 2003 at 05:40:23PM +0100, Duncan Coutts wrote:
On Sat, 26 Apr 2003 12:21:52 -0400 David Roundy
wrote: I had a couple ideas, and I was wondering if they were good ones.
One is the idea of a slice function on an IArray:
a' = slice a (i,j) giving an array with the elements i..j of array a.
[..]
I'm thinking that it may even be possible to make the GC smart enough to get rid of mother strings which are needed by very few children (at least if it's desperate), if array slices were supported at a low enough level... I imagine they'd have to be supported at a very low level.
Perhaps it could be done with cunning use of finalisers. Instead of having the substrings keep references to the 'mother' string, you could reverse it, so the mother keeps (weak?) references to the substrings. That way, when the mother string is garbage collected it can make copies of the substrings in new storage.
Oooo, that sounds like a great way to do things. I had no idea you could do that! :) The same sort of trick could also be used to deal with the mmaped file situation where the file needs to be closed and you want to unmap the file (but need to know who is referencing that memory, since it's about to be made invalid). -- David Roundy http://www.abridgegame.org
participants (2)
-
David Roundy
-
Duncan Coutts