Re: [Haskell-cafe] Mutable arrays

5 Feb 2008

At least for file processing, I don't think the lazy solution is as
bad as some people on this list indicate.  My solution was to define a
function, similar to interact:

    processAudioFile :: (Handle, Handle) -> (ASig -> ASig) -> IO ()

The function reads from the first handle and writes to the second (the
problem domain requires two separate files).  Contents of the first are
read into a lazy bytestring with hGetContents (from
Data.ByteString.Lazy), decoded into an ASig (which in the current
version is actually a tuple of a list and an Int of the total length,
but I'm reworking this into a monad), and processed.  The processed
list is then encoded back into a bytestring and written with hPut.  I
then stick the whole thing in a bracket to open and close the file
handles, and call the bracketed function when I'm ready to do
processing.
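
A minimal sketch of the shape described above, under loud assumptions: the ASig here is a hypothetical stand-in (a list of raw bytes plus a length taken from the file size), decodeSig, encodeSig and withAudioFiles are made-up names, and no real wave decoding is attempted. It only illustrates lazy read, pure processing, write, and bracketed handles.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import System.IO (Handle, IOMode (..), hClose, hFileSize, openBinaryFile)

-- Hypothetical stand-in for ASig: a lazy list of samples plus the
-- total length. Real audio would need a proper sample decoder.
type ASig = ([Word8], Int)

-- Assumed helper: treat every byte as one sample.
decodeSig :: Int -> BL.ByteString -> ASig
decodeSig n bytes = (BL.unpack bytes, n)

-- Assumed helper: write the samples back out as raw bytes.
encodeSig :: ASig -> BL.ByteString
encodeSig (samples, _) = BL.pack samples

-- Read lazily from the first handle, apply the pure transformation,
-- and write the result to the second handle.
processAudioFile :: (Handle, Handle) -> (ASig -> ASig) -> IO ()
processAudioFile (hIn, hOut) f = do
  n <- hFileSize hIn               -- length without forcing the contents
  bytes <- BL.hGetContents hIn     -- lazy read
  BL.hPut hOut (encodeSig (f (decodeSig (fromIntegral n) bytes)))

-- Open both files, run the processing, and close the handles even if
-- an exception is thrown. All reading and writing finishes inside the
-- brackets, so nothing lazy escapes past hClose.
withAudioFiles :: FilePath -> FilePath -> (ASig -> ASig) -> IO ()
withAudioFiles inPath outPath f =
  bracket (openBinaryFile inPath ReadMode) hClose $ \hIn ->
    bracket (openBinaryFile outPath WriteMode) hClose $ \hOut ->
      processAudioFile (hIn, hOut) f
```

Something like withAudioFiles "in.raw" "out.raw" (\(xs, n) -> (map (`div` 2) xs, n)) would then halve every sample; the file names are placeholders.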

I'm pretty happy with this solution, for several reasons:
1. The actual processing code remains purely functional.
2. I didn't have to write imperative-style looping constructs.
3. Handles get closed after use (even with exceptions, thanks to bracket).
4. Because all IO, processing, and writing is encapsulated in one
   function, everything happens sequentially as it's supposed to, so I
   don't get exceptions about lazy filehandles being closed.
5. Performance has been good.  Memory usage is lower than expected,
   and it's fairly fast (at least when I remember to use a non-profiled
   version).  I've tested this approach with wave files into the
   hundreds of MB so far.  Perhaps not quite as fast as optimized C,
   but good enough for me.

I'm not quite sure how to get around the problem of getElems being
strict, though.  I do have one idea, but I don't know how it would
work in practice:

-- frozen <- unsafeFreeze myArray
-- let ary_max = foldl1' max $ elems frozen

If you use a boxed array type (IOArray or STArray) for myArray and
compile with GHC, no copying is necessary (you may need to use type
annotations to guarantee this).  Then use foldl1' to get ary_max, and
map it onto the original mutable array.  I think it would be safe
provided that ary_max is fully evaluated before you start to modify
the array, which is true for normalization.
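
A hedged sketch of how that idea might look for normalization. It deliberately swaps in an unboxed IOUArray rather than the boxed types mentioned above: the array package documentation lists IOUArray to UArray as a no-copy case for unsafeFreeze under GHC as well, and it sidesteps any question about writing to a boxed array after it has been frozen. The documentation only promises safety when the mutable array is not modified after the freeze, so treat the write-back loop as an experiment. normalizeInPlace is a made-up name.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM_)
import Data.Array.IO
import Data.Array.Unboxed (UArray, elems)
import Data.Array.Unsafe (unsafeFreeze)
import Data.List (foldl1')

-- Divide every element by the maximum element, in place.
-- Assumes a non-empty array whose maximum is non-zero.
normalizeInPlace :: IOUArray Int Double -> IO ()
normalizeInPlace arr = do
  -- The type annotation selects the immutable type that can share
  -- storage with the mutable array.
  frozen <- unsafeFreeze arr :: IO (UArray Int Double)
  -- Force the maximum now, before any element is overwritten.
  aryMax <- evaluate (foldl1' max (elems frozen))
  -- 'frozen' is not used past this point.
  (lo, hi) <- getBounds arr
  forM_ [lo .. hi] $ \i -> do
    x <- readArray arr i
    writeArray arr i (x / aryMax)
```

The evaluate call is the important part: a plain let binding would leave ary_max as an unevaluated thunk that reads the array only after the loop has started changing it.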

It's worth a try, anyway.
John Lato
...
...
Changing the subject slightly, I once wrote code in Concurrent Clean that
filtered a file that was larger than the available memory on my PC.  I did
this by creating a function that returned the contents of the original file
as a lazy list.  Then, I created functions to process the list and write the
processed list to a results file.  The code was not imperative at all.  The
function that wrote the results file forced the evaluation of the lazy list.
As the lazy list was consumed, the contents of the original file were read.
Is this possible with monads in Haskell?

Yes, using hGetContents, which is considered bad practice by many
people here.  The problem is that hGetContents breaks referential
transparency, and I suspect that whatever Clean does to lazily read
files does too (though I can't be sure; I haven't looked in any
detail at uniqueness types).  That is, the contents of the returned
list depend on when you read it, which is not allowed in a
referentially transparent language.
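
For concreteness, a minimal sketch of the lazy-list style of file filtering using hGetContents from System.IO. The function name filterFile and the line-based predicate are assumptions chosen for illustration; the original Clean program is not shown in this thread.

```haskell
import System.IO (IOMode (..), hGetContents, hPutStr, withFile)

-- Copy the lines that satisfy 'keep' from one file to another.
-- The input is read lazily, on demand, as hPutStr consumes the list.
filterFile :: (String -> Bool) -> FilePath -> FilePath -> IO ()
filterFile keep inPath outPath =
  withFile inPath ReadMode $ \hIn ->
    withFile outPath WriteMode $ \hOut -> do
      contents <- hGetContents hIn   -- returns at once; reads happen later
      -- Writing forces the list, which drives the reads. It completes
      -- before withFile closes either handle.
      hPutStr hOut (unlines (filter keep (lines contents)))
```

The write has to finish inside the withFile blocks; if the lazy contents escaped and were consumed after the handle was closed, the result would depend on when it was forced, which is exactly the objection raised above.
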
The same applies to your problem.  getElems cannot return a lazy list
of elements*, because the array might be changed between the point
where you call getElems and the point where you demand an element.  So
it seems that actually specifying the order of evaluation using an
imperative-style loop is the only pure way to do this.

* Well, it could, but it would require some cleverness like
copy-on-write logic under the hood.
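
As an illustration of the imperative-style loop mentioned above, here is a small sketch that finds the maximum of a mutable array by reading one element at a time, without building an intermediate list. The name arrayMax and the choice of IOUArray Int Double are assumptions for the example.

```haskell
import Control.Monad (foldM)
import Data.Array.IO

-- Maximum element of a non-empty mutable array, computed by an
-- explicit index loop in IO.
arrayMax :: IOUArray Int Double -> IO Double
arrayMax arr = do
  (lo, hi) <- getBounds arr
  first <- readArray arr lo
  foldM step first [lo + 1 .. hi]
  where
    -- Force the running maximum at each step so no thunks pile up.
    step m i = do
      x <- readArray arr i
      return $! max m x
```

Because every read is sequenced explicitly in IO, the order of reads relative to any later writes is fixed by the program text.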