[Haskell-cafe] idea for avoiding temporaries

8 Mar 2007

      Hi all,

I was just teaching a class on minimization methods, with a focus on
conjugate gradient minimization in particular, and one of my main points
was that with conjugate gradient minimization we only need three or four
arrays of size N (depending on whether we use the Fletcher-Reeves or
Polak-Ribiere variant), that this was a huge win, and I was explaining that
in my code we use Fletcher-Reeves even though it's generally got less
stable convergence behavior, because avoiding one extra copy is well worth
it, when each copy is on the order of a gigabyte (which is not unusual in
my line of work).  This got me thinking about one of the largest problems
with writing serious numerical code in Haskell, which is that of memory
consumption and avoiding temporary variables.

The count of three arrays requires assumes that we do in-place updates,
which presupposes an imperative model.  But the main reason I'd like to
switch to Haskell is so that I can write pure functional code, which is
easier to read, easier to reason with, and easier to get right.  So I
started wondering whether there's a solution that would allow us to write
pretty high-level pure functional code, while the RTS can realize at
run-time that we have the only reference to the input argument and that it
is therefore safe to consume it destructively.  My thought is to move some
of the amazing rewriting streams stuff that's going on with Data.ByteStream
and Manuel's array fusion work onto the memory management level.  If we can
rewrite high-level array manipulations into a form in which the compiler
can fuse the loops and write good low-level code, why not also allow
automatic in-place updates (when possible)?

My thought is to create a function like the following:

copy :: ForeignPtr a -> IO (ForeignPtr a)

This function ordinarily copies its input into a new chunk of memory, and
then calls the initialization function to update that memory.  So we could
call it with something like

a `plus` b = do a' <- copy a
                a' `plusEqual` b
                return a'

Of course, in practice I imagine this whole business being encapsulated in
a module where it's called with unsafePerformIO (but it's really perfectly
safe, a la Data.ByteString).

The key is that copy will ask the GC to check whether there is more than
one reference to its input, and if there isn't, then it'll return its input
without making a copy.

Obviously, we'd want to be careful in making calls to copy (since the GC
overhead could overwhelm the potential (transient) memory savings.  And
I've got no idea how hard it would be to find out if there are no other
references, or how effective this would be... that is, whether the compiler
might generate "false" references to the input that would prevent the
savings from ever being realized.

There are also other questions.  My copy above is simple, but probably we'd
be better off with a more general:

copy :: a -> (a -> IO a) -> IO a

which could be used in the base libraries for other sorts of objects, such
as Array, which might not have a ForeignPtr at their back end.

We might also want a more general copy that would handle the case where we
would like to copy either one variable or another, and we don't care
which.  i.e. we in my `plus` example above, if "a" can't be discarded, but
"b" can, we'll be out of luck.  Perhaps we want something like

reuseOrNot :: a -> (a -> IO b) -> (a -> IO b) -> IO b

which would call one function or the other, depending on whether the input
is deemed expendible by the GC, so we could write

a `plus` b = reuseOrNot a
             (\a' -> do { a' `plusEqual` b; return a' })
             (\a' -> reuseOrNot b
                     (\b' -> do { b' `plusEqual` a'; return b' })
                     (\b' -> a' `boringAdd` b'))

where boringAdd is an addition that doesn't reuse either array.

Anyhow, all this would be predicated on the possibility of asking the GC
whether anyone else has a reference to a given object, and I've no idea how
tough (or slow) this would be.  It'd also be effectively predicated on
someone like dons having the time and interest to write good fusion code
which would allow one to make use of this in referentially transparent
high-level pure functional code.  I think it'd be pretty cool, and would
certainly (not immediately, but as my research group ramps up) be
interested in being a guinea pig user of this stuff.

Any thoughts? Am I crazy? Is this infeasible for some reason? Should I have
patented this idea before mentioning it on the list? Has someone else
already patented it?
-- 
David Roundy
Department of Physics
Oregon State University