Difficult memory leak in array processing

Hi everyone,

I have the following code whose purpose is to add dither (noise) to a given array. The code looks very straightforward but apparently it has a memory leak somewhere. Here I try to run the algorithm on an array of 10,000,000 integers. Ten million unboxed strict integers should amount to 40 MB, which should pose no problems to any modern system. However, the program fails with a stack overflow error. I'm using GHC 6.6 on Windows with 1 GB of RAM.

I've tried applying seq and some other strictness tricks (such as x == x) pretty much everywhere in the code, with no results. Could you please help me understand what is going on here? Have I misunderstood something critical in how Haskell works? Here is the relevant portion of the code:

module Main where

import Data.Array.IO
import System.Random

type Buffer = IOUArray Int Int

-- | Triangular Probability Density Function, equivalent to a roll of two dice.
-- The number sums have different probabilities of surfacing.
tpdf :: (Int, Int) -> IO Int
tpdf (low, high) = do
    first <- getStdRandom (randomR (low, high))
    second <- getStdRandom (randomR (low, high))
    return ((first + second) `div` 2)

-- | Fills an array with dither generated by the specified function.
genSeries :: Buffer -> ((Int, Int) -> IO Int) -> (Int, Int) -> IO ()
genSeries buf denfun lims =
    let worker low i
            | i >= low = do
                r <- denfun lims
                writeArray buf i r
                worker low (i - 1)
            | otherwise = return ()
    in do
        (lo, hi) <- getBounds buf
        worker lo hi

main = do
    -- This should allocate a 40 MB array
    buf <- newArray_ (0, 10000000) :: IO Buffer
    -- Fill the array with dither
    genSeries buf tpdf (2, 12)

-- niko.korhonen@gmail.com

Hi Niko,

On Thu, 2006-11-23 at 12:11 +0200, Niko Korhonen wrote:
I've tried applying seq and some other strictness tricks (such as x == x) pretty much everywhere in the code, with no results. Could you please help me understand what is going on here? Have I misunderstood something critical in how Haskell works? Here is the relevant portion of the code:
main = do
    -- This should allocate a 40 MB array
    buf <- newArray_ (0, 10000000) :: IO Buffer
    -- Fill the array with dither
    genSeries buf tpdf (2, 12)
main = do
    -- This should allocate a 400 MB array
    buf <- newArray_ (0, 100000000) :: IO Buffer
    -- Fill the array with dither
    genSeries buf tpdf (2, 12)
    a <- readArray buf 100000000
    putStrLn $ "a is " ++ show a

By adding the compile flags -O3 -optc-O3 -funfolding-use-threshold=16, the above code worked with 100,000,000 elements. By further adding the -ddump-simpl > core.txt flag and looking at the generated Core, the worker loop seemed to use primitives. I cannot say if this was the helping part here.

Have you tried profiling (-prof -auto-all, then running with +RTS -p -RTS)? Running with +RTS -sstderr gives:

  14,257,786,344 bytes allocated in the heap
       4,282,040 bytes copied during GC (scavenged)
       1,646,936 bytes copied during GC (not scavenged)
      80,733,232 bytes maximum residency (2 sample(s))

           27045 collections in generation 0 (  0.31s)
               2 collections in generation 1 (  0.00s)
              78 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time   22.61s  ( 24.07s elapsed)
    GC    time    0.31s  (  0.32s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time   22.92s  ( 24.39s elapsed)

    %GC time       1.3%  (1.3% elapsed)

    Alloc rate    630,612,876 bytes per MUT second

    Productivity  98.6% of total user, 92.7% of total elapsed

It seems that the garbage collector has not used very much time here.

There is more information on the Haskell wiki:

http://www.haskell.org/haskellwiki/Performance
http://www.haskell.org/haskellwiki/Performance/GHC

The GHC-specific part does not mention the -O3 -optc-O3 -funfolding-use-threshold=nn flags. They were hinted at here on this list; I found them very helpful a couple of weeks ago - thanks again :) By the way, could the GHC-specific wiki page be updated to contain and explain these flags?

Hopefully this helped you a bit! And hopefully someone who knows how these things go has time to give you a detailed answer!

br, Isto

Niko Korhonen wrote:
I have the following code whose purpose is to add dither (noise) to a given array. The code looks very straightforward but apparently it has a memory leak somewhere.
No, it doesn't. It can't, because it doesn't even compile. After correcting the obvious
(lo, hi) <- getBounds buf
to

let (lo, hi) = bounds buf

it just works and needs 40 MB plus epsilon. Your problem has to be somewhere else.

-Udo

--
fork(2): New processes are created by other processes, just like new humans. New humans are created by other humans, of course, not by processes.
        -- Unix System Administration Handbook

Udo Stenzel wrote:
Niko Korhonen wrote:
I have the following code whose purpose is to add dither (noise) to a given array. The code looks very straightforward but apparently it has a memory leak somewhere.
No, it doesn't. It can't, because it doesn't even compile. After correcting the obvious
(lo, hi) <- getBounds buf
to
let (lo,hi) = bounds buf
The interface changed between GHC 6.4.2 and 6.6. But no honorable Haskell paladin would ever dare to use UndeadArrays.
it just works and needs 40MB plus epsilon. Your problem has to be somewhere else.
The strictness analyzer likes Udo more than Niko, does it? Regards, apfelmus

apfelmus@quantentunnel.de wrote:
(lo, hi) <- getBounds buf
to
let (lo,hi) = bounds buf
The interface changed between GHC 6.4.2 and 6.6. But no honorable Haskell paladin would ever dare to use UndeadArrays.
Hm, and 'bounds' is simply gone? Hope that doesn't bite in an unexpected way.
The strictness analyzer likes Udo more than Niko, does it?
So it seems. I just tried it with GHC 6.6 and there still is no leak and no stack overflow. And frankly, I can't even see where too much laziness could creep into this code. Everything is sequenced by IO, the array is unboxed, there's hardly any room for an unexpected thunk to hide in.

-Udo

--
"Science is like sex - sometimes something useful comes out of it, but that's not what we are doing it for." -- Richard Feynman

On Thu, Nov 23, 2006 at 10:13:25PM +0100, Udo Stenzel wrote:
apfelmus@quantentunnel.de wrote:
(lo, hi) <- getBounds buf
to
let (lo,hi) = bounds buf
The interface changed between GHC 6.4.2 and 6.6. But no honorable Haskell paladin would ever dare to use UndeadArrays.
Hm, and 'bounds' is simply gone? Hope that doesn't bite in an unexpected way.
bounds is only gone for mutable arrays. Since they might change size you need to use the monad to get at their bounds. 'bounds' still exists with the same interface for immutable arrays. -- John Meacham - ⑆repetae.net⑆john⑈
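To see the two interfaces side by side, a minimal sketch (the bounds, sizes and element types here are arbitrary; getBounds and newArray_ come from Data.Array.IO, bounds and listArray from Data.Array):

import Data.Array (Array, bounds, listArray)
import Data.Array.IO (IOUArray, getBounds, newArray_)

main :: IO ()
main = do
    -- mutable array: the bounds are fetched inside the monad
    buf <- newArray_ (0, 9) :: IO (IOUArray Int Int)
    (lo, hi) <- getBounds buf
    -- immutable array: bounds stays a pure function
    let arr = listArray (0, 9) [0 .. 9] :: Array Int Int
        (lo', hi') = bounds arr
    print ((lo, hi), (lo', hi'))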

Ah, yet another UndeadArray necromancer exhausting his stack of bones. May the forces of light suggest a way to structure the incantation of darkness?

modifyArray arr i f = readArray arr i >>= \y -> writeArray arr i (f y)

accumM :: (MArray a e m, Ix i) => (e -> e' -> e) -> a i e -> [(i, e')] -> m ()
accumM f arr xs = mapM_ chg xs
    where chg (i, x) = modifyArray arr i (flip f x)

twodice (x:x':xs) = (x + x') `div` 2 : twodice xs
noise rng gen = twodice $ randomRs rng gen

main = do
    let bnds = (0, 10000000)
    buf <- newArray_ bnds :: IO Buffer
    gen <- getStdGen
    accumM (curry snd) buf $ zip (range bnds) $ noise (2, 12) gen

I absolutely don't know why there is no (accumM) function in the standard libraries. And having the ByteString API (maybe even the fusion) for general arrays would be very nice. Maybe (modifyArray) is missing, too.

Regards,
apfelmus

PS: did you try worker low (i `seq` i-1)?
PPS: The strictness analyzer is likely to insert that automatically if you compile with -O or -O2.

Niko Korhonen wrote:
Hi everyone,
I have the following code whose purpose is to add dither (noise) to a given array. The code looks very straightforward but apparently it has a memory leak somewhere. Here I try to run the algorithm on an array of 10,000,000 integers. Ten million unboxed strict integers should amount to 40 MB, which should pose no problems to any modern system. However, the program fails with a stack overflow error. I'm using GHC 6.6 on Windows with 1 GB of RAM.
I've tried applying seq and some other strictness tricks (such as x == x) pretty much everywhere in the code, with no results. Could you please help me understand what is going on here? Have I misunderstood something critical in how Haskell works? Here is the relevant portion of the code:
module Main where
import Data.Array.IO
import System.Random

type Buffer = IOUArray Int Int

-- | Triangular Probability Density Function, equivalent to a roll of two dice.
-- The number sums have different probabilities of surfacing.
tpdf :: (Int, Int) -> IO Int
tpdf (low, high) = do
    first <- getStdRandom (randomR (low, high))
    second <- getStdRandom (randomR (low, high))
    return ((first + second) `div` 2)

-- | Fills an array with dither generated by the specified function.
genSeries :: Buffer -> ((Int, Int) -> IO Int) -> (Int, Int) -> IO ()
genSeries buf denfun lims =
    let worker low i
            | i >= low = do
                r <- denfun lims
                writeArray buf i r
                worker low (i - 1)
            | otherwise = return ()
    in do
        (lo, hi) <- getBounds buf
        worker lo hi

main = do
    -- This should allocate a 40 MB array
    buf <- newArray_ (0, 10000000) :: IO Buffer
    -- Fill the array with dither
    genSeries buf tpdf (2, 12)

Hi Niko,

On Thu, Nov 23, 2006 at 12:11:43PM +0200, Niko Korhonen wrote:
I have the following code whose purpose is to add dither (noise) to a given array. The code looks very straightforward but apparently it has a memory leak somewhere. Here I try to run the algorithm on an array of 10,000,000 integers. Ten million unboxed strict integers should amount to 40 MB, which should pose no problems to any modern system. However, the program fails with a stack overflow error. I'm using GHC 6.6 on Windows with 1 GB of RAM.
I'm also unable to reproduce this. Can you tell us exactly what command line you are using to compile and run the program, please?

Thanks
Ian

I can say neither that I have any idea what an 'undead array' is, nor that I would really understand the code you've posted. On the positive side of things, you've given me a lot to think about. Maybe in the fullness of time I shall return and say 'Lo! I can write leakless Haskell code!'. But alas, that time seems so distant now.

However, this necromancy business really does sound like an exciting new career prospect. Interesting job opportunities, respect of the community, flexible hours, and extremely loyal peers and other commandlings that will literally work just for the Brain Food.

Regards,
Nik The Blak, Necromancer of the Glorious Forces of Evil

PS: No.
PPS: And I shall.

apfelmus wrote:
Ah, yet another UndeadArray necromancer exhausting his stack of bones. May the forces of light suggest a way to structure the incantation of darkness?

modifyArray arr i f = readArray arr i >>= \y -> writeArray arr i (f y)

accumM :: (MArray a e m, Ix i) => (e -> e' -> e) -> a i e -> [(i, e')] -> m ()
accumM f arr xs = mapM_ chg xs
    where chg (i, x) = modifyArray arr i (flip f x)

twodice (x:x':xs) = (x + x') `div` 2 : twodice xs
noise rng gen = twodice $ randomRs rng gen

main = do
    let bnds = (0, 10000000)
    buf <- newArray_ bnds :: IO Buffer
    gen <- getStdGen
    accumM (curry snd) buf $ zip (range bnds) $ noise (2, 12) gen
I absolutely don't know why there is no (accumM) function in the standard libraries. And having the ByteString API (maybe even the fusion) for general arrays would be very nice. Maybe (modifyArray) is missing, too.
Regards, apfelmus
PS: did you try worker low (i `seq` i-1)?
PPS: The strictness analyzer is likely to insert that automatically if you compile with -O or -O2.

Ian Lynagh wrote:
I'm also unable to reproduce this. Can you tell us exactly what command line you are using to compile and run the program, please?
In fact it turned out that the example code I posted did not exhibit the memory leak at all. It just took a /very long time/ to complete (compared to a Java version), but it did complete. My complete code, which also counts the instances of a given number in the array, does however exhibit the leak. It is here:

module Main where

import Data.Array.IO
import System.Random

type Buffer = IOUArray Int Int

-- | Triangular Probability Density Function, equivalent to a roll of two dice.
-- The number sums have different probabilities of surfacing.
tpdf :: (Int, Int) -> IO Int
tpdf (low, high) = do
    first <- getStdRandom (randomR (low, high))
    second <- getStdRandom (randomR (low, high))
    return ((first + second) `div` 2)

-- | Fills an array with dither generated by the specified function.
genSeries :: Buffer -> ((Int, Int) -> IO Int) -> (Int, Int) -> IO ()
genSeries buf denfun lims =
    let worker low i
            | i >= low = do
                r <- denfun lims
                writeArray buf i r
                worker low (i - 1)
            | otherwise = return ()
    in do
        (lo, hi) <- getBounds buf
        worker lo hi

countNumbers :: Buffer -> Int -> IO Int
countNumbers buf x =
    let worker lo i acc
            | i >= lo = do
                n <- readArray buf i
                if n == x
                    then worker lo (i - 1) (acc + 1)
                    else worker lo (i - 1) acc
            | otherwise = return acc
    in do
        (lo, hi) <- getBounds buf
        worker lo hi 0

main = do
    buf <- newArray_ (0, 10000000) :: IO Buffer
    genSeries buf tpdf (2, 12)
    sevens <- countNumbers buf 7
    putStrLn ("Magic number sevens: " ++ show sevens)
    return 0

In fact it turned out that the example code I posted did not exhibit the memory leak at all. It just took a /very long time/ to complete (compared to a Java version), but it did complete. My complete code, which also counts the instances of a given number in the array, does however exhibit the leak. It is here:
Quick guess, and a useful pattern to avoid: tail-recursive functions with non-strict accumulators may be tail-recursive, but they build up unevaluated expressions representing the accumulations; when those are forced by inspection, the evaluator descends non-tail-recursively into those possibly deep accumulations (..(0+1)..+1), possibly resulting in stack overflows.

The worker in genSeries inspects its parameters at each call, keeping them evaluated; the worker in countNumbers inspects only its first two parameters, possibly (depending on optimizations) leaving acc unevaluated. Try:

worker lo (i - 1) $! acc

hth,
claus
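Spelled out against countNumbers from the code below, the suggestion looks like this (a sketch assuming Niko's Buffer type; only the accumulator position changes):

import Data.Array.IO

type Buffer = IOUArray Int Int

-- countNumbers with a strict accumulator: ($!) forces acc on every
-- recursive call, so no (..((0+1)+1)..) thunk chain can build up.
countNumbers :: Buffer -> Int -> IO Int
countNumbers buf x =
    let worker lo i acc
            | i >= lo = do
                n <- readArray buf i
                worker lo (i - 1) $! (if n == x then acc + 1 else acc)
            | otherwise = return acc
    in do
        (lo, hi) <- getBounds buf
        worker lo hi 0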
module Main where
import Data.Array.IO
import System.Random

type Buffer = IOUArray Int Int

-- | Triangular Probability Density Function, equivalent to a roll of two dice.
-- The number sums have different probabilities of surfacing.
tpdf :: (Int, Int) -> IO Int
tpdf (low, high) = do
    first <- getStdRandom (randomR (low, high))
    second <- getStdRandom (randomR (low, high))
    return ((first + second) `div` 2)

-- | Fills an array with dither generated by the specified function.
genSeries :: Buffer -> ((Int, Int) -> IO Int) -> (Int, Int) -> IO ()
genSeries buf denfun lims =
    let worker low i
            | i >= low = do
                r <- denfun lims
                writeArray buf i r
                worker low (i - 1)
            | otherwise = return ()
    in do
        (lo, hi) <- getBounds buf
        worker lo hi

countNumbers :: Buffer -> Int -> IO Int
countNumbers buf x =
    let worker lo i acc
            | i >= lo = do
                n <- readArray buf i
                if n == x
                    then worker lo (i - 1) (acc + 1)
                    else worker lo (i - 1) acc
            | otherwise = return acc
    in do
        (lo, hi) <- getBounds buf
        worker lo hi 0

main = do
    buf <- newArray_ (0, 10000000) :: IO Buffer
    genSeries buf tpdf (2, 12)
    sevens <- countNumbers buf 7
    putStrLn ("Magic number sevens: " ++ show sevens)
    return 0

I can say neither that I have any idea what an 'undead array' is, nor that I would really understand the code you've posted. On the positive side of things, you've given me a lot to think about. Maybe in the fullness of time I shall return and say 'Lo! I can write leakless Haskell code!'. But alas, that time seems so distant now.
"UndeadArray" is a bowdlerization of "Unboxed Array", which is the type you're using as Buffer. They are generally to be considered "evil" as well, hence the renaming. Their only use is to store huge amounts of memory, like their most prominent example (ByteString). If you want to add noise to an actual audio channel, then they can be adequate. But if you only want statistics about random numbers, they're completely out of place.
I tried to show how the code can be rewritten in a more declarative manner by the use of the higher order function (accumM), which is akin to the accumulation function (Data.Array.accum) for immutable arrays. This removes the need for traversing the array by hand with (worker).

As Claus Reinke already pointed out, the accumulating parameter (acc) in your full example has to be evaluated strictly or it will overflow the stack. This is the same situation as in the famous example

length' [] n = n
length' (_:xs) n = length' xs (n + 1)

By avoiding (worker) and using higher order functions, you can avoid this kind of accumulating parameter altogether:

length xs = foldr (\_ n -> n + 1) 0 xs

Besides, writing things explicitly tail recursive does not help much in Haskell.

In the following, I'm going to explain the restructured code I posted. First of all, reducing the amount of IO is always good for code sanity. Formulated with an emotional undertone, IO is "evil". Why do you need IO? There are two reasons: the mutable array and the random numbers. Random numbers clearly are a side effect, but there is a wonderful trick: you simply fetch a lazy infinite list of random numbers and work with that. With System.Random, you can say

do
    -- get the default pseudo-random number generator,
    -- it already has a good random seed
    gen <- getStdGen
    let rs = randomRs (2, 12) gen
        result = dostuffwith rs
    return result

Here (rs) is an infinite list of random numbers and (dostuffwith) is a pure function, no IO involved. In our case,

twodice (x:x':xs) = (x + x') `div` 2 : twodice xs
noise rng gen = twodice $ randomRs rng gen

is a combination of (rs) and (dostuffwith). (noise) simply supplies an infinite list of random numbers to (twodice). (twodice) processes this list by taking pairs and averaging them. Together they cover the functionality your (tpdf) offers.

Concerning mutable arrays: as said above, for mere statistics about random numbers they're out of place. In that case,

countNumber k xs = length $ filter (k ==) xs

main = do
    gen <- getStdGen
    return $ countNumber 7 (noise (2, 12) gen)

will do everything you need.

If mutable arrays are really unavoidable (hence this is only for "necromancers"), the use of higher order functions is mandatory. One useful higher order function is (accum) from Data.Array. The adaptation to the mutable case uses the helper function

modifyArray arr i f = readArray arr i >>= \y -> writeArray arr i (f y)

which applies f to the array element at position i. Then

accumM f arr xs = mapM_ chg xs
    where chg (i, x) = modifyArray arr i (flip f x)

"takes an array and an association list and accumulates pairs from the list into the array with the accumulating function f" (documentation from Data.Array.accum). For example, if

arr[0] == 0, arr[1] == 1, arr[2] == 2, arr[3] == 3

then

brr = accum (+) arr [(1,2),(1,3),(2,3)]

yields

brr[0] == 0, brr[1] == 6, brr[2] == 5, brr[3] == 3

As another example, (accum (curry snd) arr xs) replaces the array entries by those listed in xs.

Finally, countNumber can be expressed as a fold over the array. In general, every higher order function for lists can be translated to one for arrays.
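For completeness, one way that fold could look, using foldM from Control.Monad (a sketch, not a standard array function; the seq keeps the running count evaluated at every step):

import Control.Monad (foldM)
import Data.Array.IO
import Data.Ix (range)

type Buffer = IOUArray Int Int

-- countNumber as a monadic fold over the index range of the array.
countNumberM :: Buffer -> Int -> IO Int
countNumberM buf x = do
    bnds <- getBounds buf
    foldM step 0 (range bnds)
  where
    step acc i = do
        n <- readArray buf i
        let acc' = if n == x then acc + 1 else acc
        acc' `seq` return acc'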
However, this necromancy business really does sound like an exciting new career prospect. Interesting job opportunities, respect of the community, flexible hours, and extremely loyal peers and other commandlings that will literally work just for the Brain Food.
Regards, Nik The Blak, Necromancer of the Glorious Forces of Evil
This is indeed very tempting :) Though I suspect that the glory of forces built on (IO a) will be very limited.

Regards,
apfelmus, Golden Delicious of the Shining Bulbs of Light

apfelmus@quantentunnel.de wrote:
I tried to show how the code can be rewritten in a more declarative manner by the use of the higher order function (accumM) which is akin to the accumulation function (Data.Array.accum) for immutable arrays. This
Ok, given this explanation and your detailed annotations about the code, the whole thing now makes much more sense to me. This is indeed a much more Haskellish, more beautiful and certainly much less evil way to perform the task that I wanted to perform. Thank you for taking the time to explain the whole thing in such detail; if nothing else, I've gained some pretty valuable insights on How To Do Things The Functional Way.

Later in your reply you suggest that IOUArray would be overkill for statistical analysis of random numbers, but possibly feasible for an actual audio buffer. I actually had both of these scenarios in mind. I have a version of the code that processes plain ol' Haskell lists instead of evil IOUArrays. Needless to say, this code is much more elegant and heavier on the folds and other higher order functions than the IOUArray code.

I personally find doing higher order functions with IO extremely difficult, and the resulting compiler errors are often downright scary. But this is probably a direct consequence of my rather limited understanding of the monadic operators that the do notation hides. It seems that one has to be /very/ good in Haskell before one can do IO effectively. Hmm. In imperative languages IO is the first thing one learns, before understanding how to combine a series of operations into a complete algorithm. Hence the 'void doIt()' functions that are so ubiquitous in the C and Java code out there. The folks may say that real-life Haskell code is littered with strictness annotations, but real-life C/C++/C#/Java code is littered with void doIt() functions, which are an order of magnitude worse from a code clarity point of view...

But back to the point, using Haskell lists for representing a largish buffer of, say, 16-bit samples that is constantly being refreshed from hard disk and sent into the sound system for playback would probably be inefficient beyond imagination. The overhead of one list element is probably larger than the size of the element itself (16 to 32 bit integers) and a new list would have to be generated every time more data is read from the hard disk. So the garbage collector would be working really hard all the time. And this list would have to be converted into an IO buffer for passing it to the C sound API anyway. These were the reasons why I went for the IOUArray way to begin with.

For this particular problem, probably the most natural and efficient solution is to have a static buffer in memory to which data is read from the hard drive and which is then passed to the sound API for playback. For me, this seems the best way to do the job. But even this is not so obvious since I don't really know the performance implications of different Haskell constructs and the strong/weak points of the GHC compiler.

The initial code I posted (without number counting) seems to take an awfully long time to perform the task of filling the array with random numbers, especially compared to a Java (or actually Scala) solution which does more or less the same thing in 1/30 of the time. I will probably run some comparisons of my original code to the code you posted to see if there is a significant difference in not only elegance but performance too.

Regards,
Niko

Hello Niko,

Tuesday, November 28, 2006, 1:42:10 PM, you wrote:
I personally find doing higher order functions with IO extremely difficult, and the resulting compiler errors are often downright scary. But this is probably a direct consequence of my rather limited understanding of the monadic operators that the do notation hides. It seems that one has to be /very/ good in Haskell before one can do IO effectively.
I will say that one should be very good in general program optimization and know a few Haskell-specific tricks, such as avoiding laziness and boxing.

Optimization is my favorite problem, so I can help you somewhat. First, look at the http://haskell.org/haskellwiki/Performance page. Second, look at the sources of the ByteString (FPS) library - because it both shows an efficient programming style and can be used for your sound processing.
inefficient beyond imagination. The overhead of one list element is
8 bytes on a 32-bit system
probably larger than the size of the element itself (16 to 32 bit integers) and a new list would have to be generated every time more data is read from the hard disk. So the garbage collector would be working really hard all the time. And this list would have to be converted into an IO buffer for passing it to the C sound API anyway. These were the reasons why I went for the IOUArray way to begin with. For this particular problem, probably the most natural and efficient solution is to have a static buffer in memory to which data is read from the hard drive and which is then passed to the sound API for playback. For me, this seems the best way to do the job.
It seems that you are an ideal consumer for unboxed parallel arrays or extended ByteStrings (WordStrings :). We can ask Donald and Duncan whether they are interested in implementing such an extension?
But even this is not so obvious since I don't really know the performance implications of different Haskell constructs and the strong/weak points of the GHC compiler.
I can give you pointers to some longer threads which discussed these topics.

--
Best regards,
 Bulat                          mailto:Bulat.Ziganshin@gmail.com

On Tue, 2006-11-28 at 15:10 +0300, Bulat Ziganshin wrote:
Hello Niko,
Tuesday, November 28, 2006, 1:42:10 PM, you wrote:
I personally find doing higher order functions with IO extremely difficult, and the resulting compiler errors are often downright scary. But this is probably a direct consequence of my rather limited understanding of the monadic operators that the do notation hides. It seems that one has to be /very/ good in Haskell before one can do IO effectively.
I will say that one should be very good in general program optimization and know a few Haskell-specific tricks, such as avoiding laziness and boxing.
Optimization is my favorite problem, so I can help you somewhat. First, look at the http://haskell.org/haskellwiki/Performance page. Second, look at the sources of the ByteString (FPS) library - because it both shows an efficient programming style and can be used for your sound processing.
inefficient beyond imagination. The overhead of one list element is
8 bytes on a 32-bit system
Actually it's 12 bytes: 4 for the cell header, 4 for the head pointer and 4 for the tail pointer. On a 64-bit machine it is double that, of course.
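In concrete terms, for the ten-million-element buffer from this thread (back-of-envelope arithmetic only; the 8 bytes for a boxed Int is an assumption of this estimate, not a figure from the thread):

-- Rough cost of [Int] versus IOUArray Int Int for 10^7 elements on a
-- 32-bit GHC. Cons cell size as per Duncan's figures; the boxed Int
-- size is an assumption of this estimate.
main :: IO ()
main = do
    let n        = 10000000 :: Integer
        consCell = 12   -- header word + head pointer + tail pointer
        boxedInt = 8    -- header word + payload
        unboxed  = 4    -- raw Int in an unboxed array
    putStrLn $ "[Int]:    " ++ show (n * (consCell + boxedInt)) ++ " bytes"   -- 200,000,000
    putStrLn $ "IOUArray: " ++ show (n * unboxed) ++ " bytes"                 -- 40,000,000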
probably larger than the size of the element itself (16 to 32 bit integers) and a new list would have to be generated every time more data is read from the hard disk. So the garbage collector would be working really hard all the time. And this list would have to be converted into an IO buffer for passing it to the C sound API anyway. These were the reasons why I went for the IOUArray way to begin with. For this particular problem, probably the most natural and efficient solution is to have a static buffer in memory to which data is read from the hard drive and which is then passed to the sound API for playback. For me, this seems the best way to do the job.
It seems that you are an ideal consumer for unboxed parallel arrays or extended ByteStrings (WordStrings :).
We can ask Donald and Duncan whether they are interested in implementing such an extension?
Spencer Janssen implemented a variant on ByteString for arbitrary instances of the Storable class.

Duncan

I personally find doing higher order functions with IO extremely difficult, and the resulting compiler errors are often downright scary. But this is probably a direct consequence of my rather limited understanding of the monadic operators that the do notation hides. It seems that one has to be /very/ good in Haskell before one can do IO effectively. Hmm.
Well, as good as a necromancer has to be :) But seriously, I think the most mind-boggling point about IO actions is that they are higher order, too. I for myself have taken the point of view that values of type (IO a) are just "plans" that will be executed when the main Haskell program has done its work. The program's task is to assemble the "main plan" by sequencing smaller "plans" with (>>=).
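A tiny illustration of that reading (standard Prelude only, nothing from the thread's code): an (IO a) value can be named and combined like any other value, and nothing runs until the assembled plan ends up inside main.

-- Building plans runs nothing; sequencing them into main does.
greet :: IO ()
greet = putStrLn "hello"

main :: IO ()
main = do
    let plan = greet >> greet   -- a bigger plan, assembled purely
    plan                        -- executed here: prints "hello" twice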
But back to the point, using Haskell lists for representing a largish buffer of, say, 16-bit samples that is constantly being refreshed from hard disk and sent into the sound system for playback would probably be inefficient beyond imagination.
Lists are indeed not suited for massive byte crunching. If they fit into a black box, you can of course outsource things to C. In case you were writing a track scheduler à la iTunes, the black box would most likely be a single function (playSoundFile) which does the job of handling data from disk to the sound card.

Actual sound processing with Haskell functions is more involved. As already said, specifying loops over the samples as higher order functions will save you a lot of headaches. The point is that one just cannot start writing Haskell code and hope that it will run as fast as a tight loop in C. Instead, one should do aggressive optimizations at a few critical points only. And these are exactly the higher order loops. I have to admit that (accumM) is not very fast, because it is able to change data at arbitrary indexes, which therefore have to be kept around. Most often you want to process each index exactly once, which is better expressed as a (map) or a (fold). As Bulat and Duncan already said, Data.ByteString does exactly this for arrays of bytes. Using it, or the generalization promised by Duncan, is likely to be the best way to go.

On the implementation level, lazy evaluation is in the way when crunching bytes. So be sure to tell GHC to make things strict and to unbox and inline everything it can put its hands on by giving appropriate command line options. As others already pointed out, the details are on

http://haskell.org/haskellwiki/Performance

As a first test, you may want to compile your original code with -O3 -optc-O3 -funfolding-use-threshold=16 and explore what happens. GHC does a good job with strictness analysis and it's of no use to drown your code in strictness annotations. Of course, some well placed ones may hint GHC at things it overlooked.

To mention yet another approach, the image synthesis library Pan

http://conal.net/pan/

pretends that the programmer writes ordinary Haskell functions, but under the hood it's a tiny programming language that gets compiled to C++. Of course, the main point is that the image synthesis combinators are strictly less powerful than full Haskell. But as the full power is unneeded in that context, this doesn't hurt.

While there are audio related projects,

http://haskell.org/haskellwiki/Libraries_and_tools/Music_and_sound

I don't know whether they focus on speed.

Regards,
apfelmus
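To make the "optimize the higher order loops" point concrete, here is what an in-place map over the thread's Buffer type might look like (a sketch; mapArray from Data.Array.MArray exists but returns a fresh array, whereas this hypothetical helper overwrites in place):

import Data.Array.IO

type Buffer = IOUArray Int Int

-- One left-to-right pass that overwrites each element with f applied
-- to it; because the array is unboxed, each write forces its value.
mapInPlace :: (Int -> Int) -> Buffer -> IO ()
mapInPlace f buf = do
    (lo, hi) <- getBounds buf
    mapM_ (\i -> readArray buf i >>= writeArray buf i . f) [lo .. hi]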

On Wed, 2006-11-29 at 20:27 +0100, apfelmus@quantentunnel.de wrote:
On the implementation level, lazy evaluation is in the way when crunching bytes.
Something I rather enjoyed when hacking on the ByteString lib was finding that lazy evaluation is actually great when crunching bytes, though you do need to know exactly when to use it.

Lazy ByteStrings rely on lazy evaluation of course. Demanding a lazy ByteString alternates between strictly filling in big chunks of data in memory and lazily suspending before producing the next chunk.

As many people have observed before, FP optimisation is to a great extent about thinking more carefully about a better evaluation order for a computation, and making some bits stricter and some bits lazier to get that better evaluation order.

Duncan
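A small sketch of what that chunking buys in practice (the file name is hypothetical; foldl' here is the strict fold from the lazy ByteString module): counting zero bytes of a large file runs in roughly constant space, because only one strict chunk needs to be alive at a time.

import qualified Data.ByteString.Lazy as L

-- Each chunk is a strict byte array filled eagerly; the spine between
-- chunks is lazy, so the file is streamed rather than loaded whole.
main :: IO ()
main = do
    contents <- L.readFile "input.dat"   -- hypothetical input file
    print $ L.foldl' (\n b -> if b == 0 then n + 1 else n) (0 :: Int) contents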

Duncan Coutts wrote:
On Wed, 2006-11-29 at 20:27 +0100, apfelmus@quantentunnel.de wrote:
On the implementation level, lazy evaluation is in the way when crunching bytes.
Something I rather enjoyed when hacking on the ByteString lib was finding that lazy evaluation is actually great when crunching bytes, though you do need to know exactly when to use it.
Lazy ByteStrings rely on lazy evaluation of course. Demanding a lazy ByteString alternates between strictly filling in big chunks of data in memory and lazily suspending before producing the next chunk.
As many people have observed before, FP optimisation is to a great extent about thinking more carefully about a better evaluation order for a computation and making some bits stricter and some bits lazier to get that better evaluation order.
I completely agree. My statement was not well formulated; I actually meant that the overhead implied by lazy evaluation occurring at every single byte to be crunched is in the way. In this case, the cost is too high to pay off, as the bytes are most likely consumed anyway. The detailed account-keeping about every byte ("is it _|_ or not?") is unnecessary for a (map), which invariably does look at every byte. The situation is already different for a (fold), though:

any p = foldr (\x b -> p x || b) False

Here, the computation may stop at any position in the list.

In a sense, lazy ByteStrings just reduce the "cost of lazy evaluation" / byte ratio by grouping bytes strictly. Bookkeeping becomes cheaper because one doesn't look up so often. Of course, with a stricter fold, (any) gets more costly. The aim is to make the former ratio smaller while not raising the latter too much. One may say that ByteString makes explicit what the "Optimistic Haskell Compiler" aimed to make implicit.

IMHO, lazy evaluation is always the better choice (in theory). In practice, the only problem with lazy evaluation is the overhead (which hurts mostly at (large -> small)), which is *not* a consequence of "no free lunch" but stems from the fact that current machine architecture is not very friendly to purely functional things. In a sense, the natural complexity measure in Haskell is the number of reductions in "hugs +s", whereas the natural complexity measure on RAM machines is the number of operations in 0xDEADBEEF-arithmetic. Unfortunately, it's the latter which is inside Intel.

Regards,
apfelmus
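The early exit is easy to see with plain lists (a self-contained sketch, standard Prelude only):

-- A lazy right fold stops at the first element satisfying p,
-- so this terminates even though the input list is infinite.
anyLazy :: (a -> Bool) -> [a] -> Bool
anyLazy p = foldr (\x b -> p x || b) False

main :: IO ()
main = print (anyLazy (> 5) [1 ..])   -- True, after inspecting six elements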

apfelmus:
Duncan Coutts wrote:
On Wed, 2006-11-29 at 20:27 +0100, apfelmus@quantentunnel.de wrote:
On the implementation level, lazy evaluation is in the way when crunching bytes.
Something I rather enjoyed when hacking on the ByteString lib was finding that lazy evaluation is actually great when crunching bytes, though you do need to know exactly when to use it.
Lazy ByteStrings rely on lazy evaluation of course. Demanding a lazy ByteString alternates between strictly filling in big chunks of data in memory and lazily suspending before producing the next chunk.
As many people have observed before, FP optimisation is to a great extent about thinking more carefully about a better evaluation order for a computation and making some bits stricter and some bits lazier to get that better evaluation order.
I completely agree. My statement was not well formulated; I actually meant that the overhead implied by lazy evaluation occurring at every single byte to be crunched is in the way. In this case, the cost is too high to pay off, as the bytes are most likely consumed anyway. The detailed account-keeping about every byte ("is it _|_ or not?") is unnecessary for a (map), which invariably does look at every byte. The situation is already different for a (fold), though:
any p = foldr (\x b -> p x || b) False
Here, the computation may stop at any position in the list.
In a sense, lazy ByteStrings just reduce the "cost of lazy evaluation" / byte ratio by grouping bytes strictly. Bookkeeping becomes cheaper because one doesn't look up so often. Of course, with a stricter fold, (any) gets more costly. The aim is to make the former ratio smaller while not raising the latter too much. One may say that ByteString makes explicit what the "Optimistic Haskell Compiler" aimed to make implicit.
This is a very interesting insight. Indeed, it does act much this way.

-- Don
participants (10)

- apfelmus@quantentunnel.de
- Bulat Ziganshin
- Claus Reinke
- dons@cse.unsw.edu.au
- Duncan Coutts
- Ian Lynagh
- isto
- John Meacham
- Niko Korhonen
- Udo Stenzel