
* The algorithm as written already tries to express minimal storage. The only question is whether +=, -=, and := update their left-hand sides in place, or whether we think of them (in the Haskell model of the universe) as producing fresh arrays, leaving the previous values as newly created garbage? My challenge to fellow Haskell hackers: find a way to express this so that it doesn't look so imperative.
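For concreteness, here is a minimal sketch (invented names, not the actual code under discussion) of what an update like "x += a*p" looks like when += really does reuse its left-hand side's storage, using mutable unboxed arrays:

```haskell
-- hypothetical axpy kernel: x[i] += a*p[i], updating x in place
-- on an IOUArray, so no fresh array (and no garbage) is created
import Data.Array.IO
import Control.Monad (forM_)

axpy :: Double -> IOUArray Int Double -> IOUArray Int Double -> IO ()
axpy a p x = do
  (lo, hi) <- getBounds x
  forM_ [lo .. hi] $ \i -> do
    xi  <- readArray x i
    pi' <- readArray p i
    writeArray x i (xi + a * pi')   -- overwrite the old slot

main :: IO ()
main = do
  x <- newListArray (0, 2) [1, 2, 3] :: IO (IOUArray Int Double)
  p <- newListArray (0, 2) [1, 1, 1]
  axpy 2 p x
  getElems x >>= print              -- [3.0,4.0,5.0]
```

This is exactly the imperative-looking style the challenge asks us to improve on.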
* Even if we do a good job with += and -=, which is what David seems to be looking for, we still have those irritating := assignments: we'd like to throw out the old p and reuse its space in the last line, and we'd like one piece of storage to hold the q on each successive iteration. So even if we solve David's problem, we still haven't matched the space requirements of the imperative code.
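One manual way to get that := reuse is double buffering: keep two arrays and swap their roles each iteration, so the new value overwrites the dead storage instead of allocating afresh. A sketch (the doubling update is a stand-in for the real computation, and all names are invented):

```haskell
-- two buffers, swapped each iteration: the array holding the old p
-- becomes the scratch space for the next iteration's p
import Data.Array.IO
import Control.Monad (forM_, foldM)

step :: (IOUArray Int Double, IOUArray Int Double)
     -> Int
     -> IO (IOUArray Int Double, IOUArray Int Double)
step (p, scratch) _iter = do
  (lo, hi) <- getBounds p
  forM_ [lo .. hi] $ \i -> do
    pi' <- readArray p i
    writeArray scratch i (2 * pi')  -- stand-in for "p := z + beta*p"
  return (scratch, p)               -- swap: old p is reused next time

main :: IO ()
main = do
  p0 <- newListArray (0, 2) [1, 2, 3] :: IO (IOUArray Int Double)
  s0 <- newArray (0, 2) 0            :: IO (IOUArray Int Double)
  (p, _) <- foldM step (p0, s0) [1 .. 4 :: Int]
  getElems p >>= print               -- [16.0,32.0,48.0]
```

Only two arrays are ever live, matching the imperative code's space use for p, at the cost of making the buffer management explicit.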
i find it difficult to discuss performance issues without concrete code examples, so i decided to convert Jan-Willem's loop code into Haskell. at first, i just naively translated the loop to recursion over lists; then i tried to produce an update-in-place variant based on some form of arrays; and finally i added a variant based on strict lists (it would be nice to have standard libraries for head-/spine-strict lists, btw). both the array and strict-list versions avoid some intermediate structures; for the arbitrarily invented, relatively small inputs i've tried, strict lists are the clear winner, thanks to lower memory traffic. but i'd like some feedback from the experts:

- are there any obvious inefficiencies in the array code?
- what does your real code look like, in view of scaling to much larger inputs?
- i tried to keep the computations equal, but seem to have introduced a small variance in the strict-list version, which i can't spot by staring at it. any ideas?
- i assume the memory overhead of binary strict lists is unacceptable for large inputs, but i'm still waiting for those polymorphic bytestrings.. ;-)

while playing with this code, it occurred to me that situations such as the one David describes (small numbers of huge structures of constant size, with no nested pointers) are less suited to compacting garbage collection than to slot-reusing reference counting. GC wins in the common situation of many variable-sized, frequently created and destroyed structures, by touching only those objects that are still live when space runs out, while RC pays a higher continuous overhead. as i mentioned earlier, i'm not sure whether update-in-place vs recreate is so bad for whole-array updates, but the real win could be not having to copy the huge live arrays around at GC time (either from old space to new space, or within one space, to create contiguous free slots while compacting the occupied slots).
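(for reference, the strict-list representation mentioned above is just a cons cell with strict head and tail; a minimal sketch, with invented names:

```haskell
{-# LANGUAGE BangPatterns #-}

-- head- and spine-strict list: both fields of Cons are forced when
-- the cell is built, so no thunks accumulate in the structure
data SList a = Nil | Cons !a !(SList a)

fromListS :: [a] -> SList a
fromListS = foldr Cons Nil

mapS :: (a -> b) -> SList a -> SList b
mapS _ Nil         = Nil
mapS f (Cons x xs) = Cons (f x) (mapS f xs)

-- strict accumulator: runs in constant stack space
sumS :: Num a => SList a -> a
sumS = go 0
  where
    go !acc Nil         = acc
    go !acc (Cons x xs) = go (acc + x) xs

main :: IO ()
main = print (sumS (mapS (* 2) (fromListS [1 .. 10 :: Int])))  -- 110
```

each element still pays for a boxed cons cell, which is the memory-overhead worry raised in the last question above.)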
so instead of update-in-place in virtual memory space, which will still be copied at GC time, one would want to pin the virtual region in which the arrays are allocated to a real memory region, and tell the GC to keep its hands off it (none of that space will be released until the loop ends, and all of it can be released after the loop ends and a copy of z has been made). does this make sense to the number crunchers and memory managers out there? and is there a way to allocate Haskell arrays in such pinned memory regions?

claus
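p.s. a partial answer to my own last question, if i read the docs right: GHC allocates Storable-based foreign memory as pinned, so an array obtained via mallocForeignPtrArray is never moved by the collector and is freed only when its ForeignPtr dies. a sketch (error handling and a nicer array interface omitted):

```haskell
-- pinned storage for unboxed Doubles via a ForeignPtr: the GC will
-- not copy this region, it only reclaims it when the pointer dies
import Foreign.ForeignPtr
import Foreign.Storable

main :: IO ()
main = do
  fp <- mallocForeignPtrArray 3 :: IO (ForeignPtr Double)
  withForeignPtr fp $ \p -> do
    mapM_ (\i -> pokeElemOff p i (fromIntegral i * 1.5)) [0 .. 2]
    xs <- mapM (peekElemOff p) [0 .. 2]
    print xs                      -- [0.0,1.5,3.0]
```

wrapping such a ForeignPtr in an array-like interface is left as an exercise; the point is only that pinned, non-GC-copied regions are already reachable from Haskell.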