
At least for file processing, I don't think the lazy solution is as bad as some people on this list indicate. My solution was to define a function processAudioFile :: (Handle, Handle) -> (ASig -> ASig) -> IO (), similar to interact. The function reads from the first handle and writes to the second (the problem domain requires two separate files). The contents of the first are read into a lazy bytestring with hGetContents (from Data.ByteString.Lazy), decoded into an ASig (which in the current version is actually a tuple of a list and an Int giving the total length, but I'm reworking this into a monad), and processed. The processed list is then encoded back into a bytestring and written with hPut. I then wrap the whole thing in a bracket to open and close the file handles, and call the bracketed function when I'm ready to do processing.
I'm pretty happy with this solution, for several reasons:
1. The actual processing code remains purely functional.
2. I didn't have to write imperative-style looping constructs.
3. Handles get closed after use (even with exceptions, thanks to bracket).
4. Because all IO, processing, and writing is encapsulated in one function, everything happens sequentially as it's supposed to, so I don't get exceptions about lazy file handles being closed.
5. Performance has been good. Memory usage is lower than expected, and it's fairly fast (at least when I remember to use a non-profiled version). I've tested this approach with wave files into the hundreds of MB so far. Perhaps not quite as fast as optimized C, but good enough for me.
I'm not quite sure how to get around the problem of getElems being strict, though. I do have one idea, but I don't know how it would work in practice:
-- let ary_max = foldl1' max $ elems $ unsafeFreeze myArray
If you use a boxed array type (IOArray or STArray) for myArray and compile with GHC, no copying is necessary (you may need to use type annotations to guarantee this). Then use the foldl' function to get ary_max, and map it onto the original mutable array. I think it would be safe provided that you calculate ary_max before you start to modify the array, which is true for normalization. It's worth a try, anyway.
John Lato
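For readers who want to see the shape of the processAudioFile approach described above, here is a minimal sketch. It is not the code from the post: ASig is simplified to a plain list of samples, the 8-bit codec (decodeSig, encodeSig) and the wrapper withAudioFiles are hypothetical names invented for illustration, and a real wave file would need header handling.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Lazy as BL
import System.IO

-- Assumption: a simplified stand-in for the post's ASig (there, a list plus a length).
type ASig = [Double]

-- Hypothetical codec: one unsigned byte per sample, scaled to [0, 1].
decodeSig :: BL.ByteString -> ASig
decodeSig = map (\w -> fromIntegral w / 255) . BL.unpack

encodeSig :: ASig -> BL.ByteString
encodeSig = BL.pack . map (\x -> round (max 0 (min 1 x) * 255))

-- Read lazily from the first handle, apply the pure function, and write to
-- the second. hPut drives the evaluation, so input is consumed as output
-- is produced.
processAudioFile :: (Handle, Handle) -> (ASig -> ASig) -> IO ()
processAudioFile (hIn, hOut) f = do
  contents <- BL.hGetContents hIn
  BL.hPut hOut (encodeSig (f (decodeSig contents)))

-- Hypothetical wrapper: bracket closes both handles even if an exception
-- is thrown, and all reading and writing happens inside the bracket.
withAudioFiles :: FilePath -> FilePath -> (ASig -> ASig) -> IO ()
withAudioFiles inPath outPath f =
  bracket (openBinaryFile inPath ReadMode) hClose $ \hIn ->
    bracket (openBinaryFile outPath WriteMode) hClose $ \hOut ->
      processAudioFile (hIn, hOut) f
```

A call such as withAudioFiles "in.raw" "out.raw" (map (* 0.5)) would halve every sample. Whether memory stays flat depends on the processing function consuming the list incrementally; a function like reverse would force the whole input.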
Changing the subject slightly, I once wrote code in Concurrent Clean that filtered a file that was larger than the available memory on my PC. I did this by creating a function that returned the contents of the original file as a lazy list. Then, I created functions to process the list and write the processed list to a results file. The code was not imperative at all. The function that wrote the results file forced the evaluation of the lazy list. As the lazy list was consumed, the contents of the original file were read. Is this possible with Monads in Haskell?
Yes, using hGetContents, which is considered bad practice by many people here. The problem is that hGetContents breaks referential transparency, and I suspect that whatever Clean does to lazily read files also does (though I can't be sure; I haven't looked in any detail at uniqueness types). That is, the contents of the returned list depend on when you read it, which is not allowed in a referentially transparent language.
The same applies to your problem. getElems cannot return a lazy list of elements*, because what if the array were changed between the point where you called getElems and the point where you required the element? So it seems that actually specifying the order of evaluation using an imperative-style loop is the only pure way to do this.
* Well, it could, but it would require some cleverness like copy-on-write logic under the hood.
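As an illustration of the imperative-style loop mentioned above, the following sketch finds the maximum of a mutable array with explicit reads and then normalizes it in place, without getElems. The names arrayMax and normalize are hypothetical, the element type is fixed to Double for simplicity, and the code assumes a non-empty array whose maximum is non-zero.

```haskell
import Control.Monad (foldM, forM_)
import Data.Array.IO
import Data.Ix (range)

-- Strict left-to-right scan for the largest element.
-- Assumption: the array is non-empty.
arrayMax :: IOUArray Int Double -> IO Double
arrayMax arr = do
  (lo, hi) <- getBounds arr
  first <- readArray arr lo
  foldM step first (range (lo + 1, hi))
  where
    step acc ix = do
      x <- readArray arr ix
      return $! max acc x  -- force the accumulator at each step

-- Divide every element by the maximum, in place.
-- Assumption: the maximum is non-zero.
normalize :: IOUArray Int Double -> IO ()
normalize arr = do
  peak <- arrayMax arr
  bnds <- getBounds arr
  forM_ (range bnds) $ \ix -> do
    x <- readArray arr ix
    writeArray arr ix (x / peak)
```

Because the order of reads and writes is spelled out in IO, the maximum is guaranteed to be computed before any element is modified.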

On Tue, Feb 05, 2008 at 06:00:38PM -0600, John Lato wrote:
-- let ary_max = foldl1' max $ elems $ unsafeFreeze myArray
If you use a boxed array type (IOArray or STArray) for myArray and compile with GHC, no copying is necessary (you may need to use type annotations to guarantee this). Then use the foldl' function to get ary_max, and map it onto the original mutable array. I think it would be safe provided that you calculate ary_max before you start to modify the array, which is true for normalization.
Eek! unsafeFreeze isn't a type cast; it actually modifies flags in the heap object which are used by the generational garbage collector. It's quite conceivable that you could get segfaults by modifying a boxed array after passing it to unsafeFreeze.
This, I believe, would work:
let ary_max = foldl1' max [ inlinePerformIO (readArray myArray ix) | ix <- range (inlinePerformIO (getBounds myArray)) ]
But it's equally untested.
Stefan

Hmm. It looks like I forgot a step, and it probably would segfault as
given. That's what I get for mucking about with unsafe* functions.
How about this?
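frozen_ary <- unsafeFreeze myArray  -- monadic; needs an immutable-array type annotation in real code
let ary_max = foldl1' max $ elems frozen_ary
thawed <- ary_max `seq` unsafeThaw frozen_ary  -- force the maximum before thawing
mapArray (/ ary_max) thawed  -- builds a new, normalized mutable array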
This sequence doesn't modify the array object after freezing, except
to call unsafeThaw on it, and there is no need to access the frozen
array after unsafeThaw. Even so, piling another unsafe function on to
clean up the mess from the first unsafe function strikes me as an
anti-pattern (even though the docs seem to indicate that an "unsafeThaw,
write, unsafeFreeze" sequence could work). Furthermore, I can't say what would
happen to any of the original references to myArray, other than that
it's better not to use them.
I'm still mostly a noob, but assuming it works, your version with
inlinePerformIO looks better to me, even with the caveats of
inlinePerformIO.
John
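For comparison, here is a sketch of the same normalization using the copying freeze in place of unsafeFreeze. This is a swapped-in technique, not what either poster proposed: it pays for one copy of the array but avoids the garbage-collector concerns raised above. The name normalizeViaFreeze is hypothetical, and the code assumes a non-empty array with a non-zero maximum.

```haskell
import Data.Array.IO
import Data.Array.Unboxed (UArray, elems)
import Data.List (foldl1')

-- Take an immutable snapshot, fold over it purely, then build a
-- normalized mutable array. The original array is left untouched.
normalizeViaFreeze :: IOUArray Int Double -> IO (IOUArray Int Double)
normalizeViaFreeze arr = do
  frozen <- freeze arr :: IO (UArray Int Double)  -- copies the contents
  let peak = foldl1' max (elems frozen)
  peak `seq` mapArray (/ peak) arr
```

Since the snapshot is a real copy, later writes to the mutable array cannot change the computed maximum, so no ordering argument is needed for safety.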
On Feb 5, 2008 7:26 PM, Stefan O'Rear wrote:
This, I believe, would work:
let ary_max = foldl1' max [ inlinePerformIO (readArray myArray ix) | ix <- range (inlinePerformIO (getBounds myArray)) ]
But it's equally untested.
Stefan
participants (2): John Lato, Stefan O'Rear