
At least for file processing, I don't think the lazy solution is as bad as some people on this list indicate. My solution was to define a function processAudioFile :: (Handle, Handle) -> (ASig -> ASig) -> IO (), similar to interact. The function reads from the first handle and writes to the second (the problem domain requires two separate files).

Contents of the first handle are read into a lazy bytestring with hGetContents (from Data.ByteString.Lazy), decoded into an ASig (which in the current version is actually a tuple of a list and an Int giving the total length, but I'm reworking this into a monad), and processed. The processed list is then encoded back into a bytestring and written with hPut. I then wrap the whole thing in a bracket to open and close the file handles, and call the bracketed function when I'm ready to do processing.

I'm pretty happy with this solution, for several reasons:

1. The actual processing code remains purely functional.
2. I didn't have to write imperative-style looping constructs.
3. Handles get closed after use (even with exceptions, thanks to bracket).
4. Because all reading, processing, and writing is encapsulated in one function, everything happens sequentially as it's supposed to, so I don't get exceptions about lazy file handles being closed.
5. Performance has been good. Memory usage is lower than expected, and it's fairly fast (at least when I remember to use a non-profiled build). I've tested this approach with wave files of several hundred MB so far. Perhaps not quite as fast as optimized C, but good enough for me.

I'm not quite sure how to get around the problem of getElems being strict, though. I do have one idea, but I don't know how it would work in practice:

    frozen <- unsafeFreeze myArray
    let ary_max = foldl1' max (elems frozen)

If you use a boxed array type (IOArray or STArray) for myArray and compile with GHC, no copying is necessary (you may need to use type annotations to guarantee this). Then use the strict fold (foldl1' above) to get ary_max, and apply it across the original mutable array. I think it would be safe provided that you calculate ary_max before you start to modify the array, which is true for normalization. It's worth a try, anyway.

John Lato
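To make the unsafeFreeze idea concrete, here is a minimal sketch, not tested against real audio data. The name normalizeInPlace, the Int index and Double sample types, and the use of abs (so the peak is the largest magnitude rather than the largest value) are my assumptions for illustration. The peak is forced with evaluate before any write happens, which is the safety condition described above.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM_, unless, when)
import Data.Array (Array, elems)
import Data.Array.IO (IOArray, getBounds, getElems, newListArray, readArray, writeArray)
import Data.Array.Unsafe (unsafeFreeze)
import Data.List (foldl1')

-- Hypothetical helper: scale every sample so the largest magnitude becomes 1.0.
normalizeInPlace :: IOArray Int Double -> IO ()
normalizeInPlace arr = do
  -- The type annotation selects the immutable boxed Array, the pairing
  -- for which GHC can freeze an IOArray without copying.
  frozen <- unsafeFreeze arr :: IO (Array Int Double)
  let samples = map abs (elems frozen)
  unless (null samples) $ do
    -- Fully evaluate the peak BEFORE the array is modified; after this
    -- point the frozen view is never read again.
    peak <- evaluate (foldl1' max samples)
    when (peak > 0) $ do
      (lo, hi) <- getBounds arr
      forM_ [lo .. hi] $ \i -> do
        x <- readArray arr i
        writeArray arr i (x / peak)
```

The frozen array shares storage with the mutable one, so reading it after the writes begin would observe modified values. Forcing the fold first is what keeps this safe.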
Changing the subject slightly, I once wrote code in Concurrent Clean that filtered a file that was larger than the available memory on my PC. I did this by creating a function that returned the contents of the original file as a lazy list. Then, I created functions to process the list and write the processed list to a results file. The code was not imperative at all. The function that wrote the results file forced the evaluation of the lazy list. As the lazy list was consumed, the contents of the original file were read. Is this possible with Monads in Haskell?
Yes, using hGetContents, which is considered bad practice by many people here. The problem is that hGetContents breaks referential transparency, and I suspect that whatever Clean does to lazily read files does too (though I can't be sure, as I haven't looked in any detail at uniqueness types). That is, the contents of the returned list depend on when you read it, which is not allowed in a referentially transparent language.
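For reference, the lazy approach might look roughly like the sketch below. The name processFile is hypothetical, and the transformation works directly on lazy bytestrings rather than on a decoded signal type; withBinaryFile is used because it is built on bracket and so closes both handles even if an exception is thrown.

```haskell
import qualified Data.ByteString.Lazy as BL
import System.IO (IOMode (ReadMode, WriteMode), withBinaryFile)

-- Hypothetical helper: stream inPath through a pure function into outPath.
processFile :: FilePath -> FilePath -> (BL.ByteString -> BL.ByteString) -> IO ()
processFile inPath outPath f =
  withBinaryFile inPath ReadMode $ \hIn ->
    withBinaryFile outPath WriteMode $ \hOut -> do
      -- Nothing is read yet; chunks are pulled from hIn on demand.
      contents <- BL.hGetContents hIn
      -- hPut consumes the result chunk by chunk, which drives the reads.
      -- It finishes before either handle is closed.
      BL.hPut hOut (f contents)
```

As long as f itself is incremental (BL.map or BL.filter, say), only a few chunks should be resident at a time, so the input can be larger than memory. The caveat is the one just described: what the lazy string contains depends on when it is forced, which is why the read and the write are kept inside the same scope.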
The same applies to your problem. getElems cannot return a lazy list of elements*, because the array might be changed between the point where you call getElems and the point where you actually demand an element. So it seems that explicitly specifying the order of evaluation with an imperative-style loop is the only pure way to do this.
* Well, it could, but it would require some cleverness like copy-on-write logic under the hood.
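An explicit loop of the kind described above could be sketched as follows. The names peakOf and normalizeLoop, the unboxed IOUArray, and the choice to normalize by the largest magnitude are assumptions for illustration. Each read is sequenced in IO, so no lazy view of the array ever exists.

```haskell
import Control.Monad (foldM, forM_, when)
import Data.Array.IO (IOUArray, getBounds, getElems, newListArray, readArray, writeArray)

-- Hypothetical helper: first pass, find the largest magnitude with a strict fold.
peakOf :: IOUArray Int Double -> IO Double
peakOf arr = do
  (lo, hi) <- getBounds arr
  foldM step 0 [lo .. hi]
  where
    step acc i = do
      x <- readArray arr i
      -- ($!) forces the running maximum so no chain of thunks builds up.
      return $! max acc (abs x)

-- Hypothetical helper: second pass, rescale every element in place.
normalizeLoop :: IOUArray Int Double -> IO ()
normalizeLoop arr = do
  peak <- peakOf arr
  when (peak > 0) $ do
    (lo, hi) <- getBounds arr
    forM_ [lo .. hi] $ \i -> do
      x <- readArray arr i
      writeArray arr i (x / peak)
```

This never builds an intermediate list of elements, at the cost of writing the index loops by hand.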