
I'm writing a program that will use functions to generate audio. The Haskell code will write the audio samples to disk---no need for real time playback. I see some useful libraries for writing audio files.

My question concerns efficiency when generating several million to 20 million samples (or even many times more than that if I use high-resolution sampling rates). They can be generated one at a time in sequence, so there's no need to occupy a lot of memory or postpone thunk evaluation. I'm going to need efficient disk writing. Note that I may need some pseudorandom numbers in my calculations, so I might want to calculate samples by state monadic computations to carry the generator state.

What is my general strategy going to be for memory and time efficiency? I am pretty confused by Haskell "strictness" and weak head normal form and all that, which often doesn't seem to be very strict. Or bang patterns, etc. Is it going to be simple to understand what I need?

Dennis

Interesting question! I don't know but I'm excited to read the responses.
If you don't find an answer here, this question seems to me easily
difficult enough to be appropriate for haskell-cafe.
On Fri, Apr 29, 2016 at 8:58 PM, Dennis Raddle wrote:
[...]
_______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
-- Jeffrey Benjamin Brown

Oh a related question is---do any of the linear algebra packages for Haskell use hardware acceleration for vector arithmetic? I have a Windows PC with a gamer-class video card, so maybe I could speed things up manyfold.

AFAIK, graphic card programming is usually done with accelerate. Packages that depend on accelerate can be found here: http://packdeps.haskellers.com/reverse/accelerate

The entire topic of space use in Haskell is not simple, but the part you
need here may be. As long as GHC can tell that values already written
to disk may be garbage collected, memory use is quite reasonable.
For example, here's a short program that prints a long-ish list:
    import Data.Foldable (traverse_)

    xs :: [Double]
    xs = map cos [1..1e7]

    main :: IO ()
    main = traverse_ print $ map sin xs
It runs in constant space, of less than 1 MB. (I ran it on a few
smaller cases to confirm that max residency stays the same order of
magnitude.) Note the difference between "bytes allocated" and "total
memory in use".
    $ ./laziness +RTS -sstderr > /dev/null
    181,493,398,808 bytes allocated in the heap
        414,623,400 bytes copied during GC
            131,736 bytes maximum residency (2 sample(s))
             23,520 bytes maximum slop
                  1 MB total memory in use (0 MB lost due to fragmentation)
This next program generates random numbers. You could use the State
monad; here I've just used the infinite list generator in System.Random.
    import Data.Foldable (traverse_)
    import System.Random (newStdGen, randoms)

    main :: IO ()
    main = do
      g <- newStdGen
      let xs = take 100000 (randoms g) :: [Int]
      traverse_ print xs
This one also runs in constant space:
    $ ./.cabal-sandbox/bin/lazyRandom +RTS -sstderr > /dev/null
    380,128,240 bytes allocated in the heap
        238,472 bytes copied during GC
         44,312 bytes maximum residency (2 sample(s))
         21,224 bytes maximum slop
              1 MB total memory in use (0 MB lost due to fragmentation)
Based on these tests, I'd recommend trying to structure your program as
a map or fold over a (lazy) list. If that structure makes sense for your
problem, I'd expect managing memory usage to be as simple as the cases
above. I expect that memory usage will be constant in the number of
samples, although higher than my examples because each sample is bigger
than the Int or Double I used.
Let me know if you want me to elaborate on any of this.
bergey
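
A minimal sketch of the "map over a lazy list" structure applied to audio, assuming raw headerless 16-bit little-endian PCM as the output format and using only base and bytestring. The names nextNoise, noise, sample, toPCM, samples and the file name out.raw are hypothetical, and the small linear congruential generator is only a stand-in for a real generator such as the one in System.Random. The generator state is threaded with unfoldr rather than a State monad.

```haskell
import qualified Data.ByteString.Builder as B
import Data.Int (Int16)
import Data.List (unfoldr)
import Data.Word (Word32)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- Hypothetical stand-in for a real generator: one step of a 32-bit
-- linear congruential generator, returning a value in [-1, 1) and the
-- next state.
nextNoise :: Word32 -> (Double, Word32)
nextNoise s = (x, s')
  where
    s' = s * 1664525 + 1013904223
    x  = fromIntegral s' / 2147483648 - 1

-- Lazy, conceptually infinite stream of noise values; unfoldr carries
-- the generator state from one element to the next.
noise :: Word32 -> [Double]
noise = unfoldr (Just . nextNoise)

-- Sample n of a 440 Hz sine at 44100 Hz, mixed with a little noise.
sample :: Int -> Double -> Double
sample n r = 0.8 * sin (2 * pi * 440 * fromIntegral n / 44100) + 0.1 * r

-- Clamp to [-1, 1] and scale to signed 16-bit PCM.
toPCM :: Double -> Int16
toPCM x = round (32767 * max (-1) (min 1 x))

-- The first count samples as a lazy list.
samples :: Int -> [Int16]
samples count =
  take count (zipWith (\n r -> toPCM (sample n r)) [0 ..] (noise 42))

main :: IO ()
main =
  withBinaryFile "out.raw" WriteMode $ \h ->
    -- The list is consumed incrementally as the builder runs, so
    -- samples that have been written can be garbage collected.
    B.hPutBuilder h (foldMap B.int16LE (samples 1000000))
```

Running the result with +RTS -sstderr, as in the examples above, is the way to check that maximum residency really stays flat as the sample count grows.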

On Sat, Apr 30, 2016 at 11:00 AM, Daniel Bergey wrote:

    main :: IO ()
    main = traverse_ print $ map sin xs

Thanks. I'll see if this works for me. My question right now is, what is traverse_print? Is that the same as

    main = traverse print . map sin $ xs

? I'm guessing IO is traversable and for some reason you don't want to use mapM.

D

On 2016-04-30 at 20:16, Dennis Raddle wrote:

    My question right now is, what is traverse_print? Is that the same as
    main = traverse print . map sin $ xs ?
    I'm guessing IO is traversable and for some reason you don't want to use mapM.

traverse_ is in Data.Foldable [1]. You're right that it's closely related to `traverse` and `mapM`. I generally prefer `traverse` and `traverse_` to `mapM` and `mapM_` because they only require Applicative, not Monad. So they work in more cases, and generic code can be more generic.

The versions with the _ give back `f ()` instead of `f (t b)`; in this case, we get `IO ()` instead of `IO [()]`. If you try with `traverse`, the program won't typecheck, because main is declared with type `IO ()`.

bergey

Footnotes:
[1] http://hackage.haskell.org/package/base-4.8.2.0/docs/Data-Foldable.html#v:tr...
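
The type difference described above can be seen in a small sketch. The names printAll, printAll_ and allPositive are hypothetical, chosen only for illustration; traverse and traverse_ are the functions from base.

```haskell
import Data.Foldable (traverse_)

-- traverse keeps every result, so printing a list yields IO [()].
printAll :: [Double] -> IO [()]
printAll = traverse print

-- traverse_ discards the results, which matches main :: IO ().
printAll_ :: [Double] -> IO ()
printAll_ = traverse_ print

-- traverse_ works in any Applicative, not only IO. Here it runs in
-- Maybe and succeeds only if every element is positive.
allPositive :: [Int] -> Maybe ()
allPositive = traverse_ (\x -> if x > 0 then Just () else Nothing)

main :: IO ()
main = do
  units <- printAll [1, 2, 3]      -- units :: [()]
  print (length units)
  printAll_ (map sin [1, 2, 3])
  print (allPositive [1, 2, 3], allPositive [1, -2, 3])
```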

You might consider using the pipes library:
http://hackage.haskell.org/package/pipes-4.1.8/docs/Pipes-Tutorial.html
The pipes library will allow you to generate audio and write it to
disk without having to worry about accidentally sucking up all the
RAM.
It should also help you decompose your pipeline into smaller pieces.
For example, you would like to be able to decompose your code into
things like:
1. the code that generates the audio
2. the code that converts the audio into a format like wav/aiff/etc
3. the code that writes binary data to disk
And then simply glue those pieces together, while feeling secure that
you don't create a space leak in the process.
It will also allow you to use StateT or IO (or any other type
with a Monad instance) if you need to.
The pipes library is not trivial to learn. But it is well designed.
Without using the pipes library you have two options:
1. understand how laziness works and then be very careful to make
sure you never accidentally hold onto data too long and cause a space
leak.
2. use strict IO functions and write code that generates the output
in little chunks at a time. The concept is, perhaps, easy to
understand. But as your code base grows, the code will become harder
to understand and to modify.
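
Option 2 could be sketched roughly as follows, assuming raw headerless 16-bit little-endian PCM output and using only base and bytestring. The names chunkSize, sampleAt, renderChunk and writeSamples, the chunk size of 4096 and the file name chunks.raw are all hypothetical.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int16)
import System.IO (Handle, IOMode (WriteMode), withBinaryFile)

-- Hypothetical chunk size, in samples.
chunkSize :: Int
chunkSize = 4096

-- Sample n of a 440 Hz sine at 44100 Hz as signed 16-bit PCM.
sampleAt :: Int -> Int16
sampleAt n =
  round (32767 * 0.8 * sin (2 * pi * 440 * fromIntegral n / 44100 :: Double))

-- Strict bytes for the samples in [start, start + len).
renderChunk :: Int -> Int -> BS.ByteString
renderChunk start len =
  BL.toStrict . B.toLazyByteString $
    foldMap (B.int16LE . sampleAt) [start .. start + len - 1]

-- Write total samples one chunk at a time. The bang forces the counter
-- on every iteration (the guard would force it here anyway; the bang
-- makes the intent explicit).
writeSamples :: Handle -> Int -> IO ()
writeSamples h total = go 0
  where
    go !n
      | n >= total = pure ()
      | otherwise = do
          let len = min chunkSize (total - n)
          BS.hPut h (renderChunk n len)   -- strict write of one chunk
          go (n + len)

main :: IO ()
main = withBinaryFile "chunks.raw" WriteMode $ \h -> writeSamples h 1000000
```

Only one chunk is alive at a time, so memory use is bounded by the chunk size rather than by the total number of samples.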
Assuming you choose to use the pipes library, you then also need to
decide what size chunks you want to work with. You could write all
your code to work with one sample at a time. That is to say, you could
always yield/await exactly one sample. But for better performance you
might decide to use buffers and work with 32, 64, 512, etc. samples at
a time. However, that could also make it harder to deal with
modulating parameters. For example, if you are doing a filter sweep,
you might only be able to change the cutoff frequency at the beginning
of each buffer. So the sweep would not be as smooth as it would be if
you yielded/awaited single samples. Though, in practice, I think you
are pretty unlikely to notice the difference for reasonable buffer
sizes.
- jeremy
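
The buffer-size tradeoff for a filter sweep can be illustrated without pipes. This is a sketch under assumptions, not a recommended filter design: the one-pole low-pass filter, the linear coefficient sweep from 0.01 to 0.5, and the names coeffAt, step and sweep are all hypothetical. The coefficient is recomputed only at buffer boundaries, so a buffer size of 1 gives the per-sample sweep.

```haskell
import Data.List (mapAccumL)

-- Hypothetical sweep: filter coefficient rising linearly from 0.01
-- towards 0.5 over total samples.
coeffAt :: Int -> Int -> Double
coeffAt total n = 0.01 + 0.49 * fromIntegral n / fromIntegral total

-- One step of a one-pole low-pass filter: y' = y + a * (x - y).
step :: Double -> Double -> Double -> Double
step a y x = y + a * (x - y)

-- Filter the input, updating the coefficient only at the start of each
-- buffer of bufSize samples.
sweep :: Int -> Int -> [Double] -> [Double]
sweep total bufSize xs = snd (mapAccumL f 0 (zip [0 ..] xs))
  where
    f y (n, x) =
      let a  = coeffAt total (n - n `mod` bufSize)  -- start of this buffer
          y' = step a y x
      in (y', y')

main :: IO ()
main = do
  let total = 44100
      input = [ sin (2 * pi * 440 * fromIntegral n / 44100)
              | n <- [0 .. total - 1 :: Int] ]
      perSample = sweep total 1 input
      perBuffer = sweep total 512 input
  -- Largest difference between the per-sample and per-buffer sweeps.
  print (maximum (zipWith (\a b -> abs (a - b)) perSample perBuffer))
```

Trying a few buffer sizes in place of 512 gives a feel for how quickly the two outputs diverge.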

On Sun, May 1, 2016 at 9:59 PM, Jeremy Shaw wrote:
Without using the pipes library you have two options:
1. understand how laziness works and then be very careful to make sure you never accidentally hold onto data too long and cause a space leak.
2. use strict IO functions and write code that generates the output in little chunks at a time. The concept is, perhaps, easy to understand. But as your code base grows, the code will become harder to understand and to modify.

Thanks, Jeremy. At this time the code is experimental and may never go anywhere after the initial experiments, so if your option #2 is the easiest to get going, that would be my choice.

Mike

participants (5)
- Anton Felix Lorenzen
- Daniel Bergey
- Dennis Raddle
- Jeffrey Brown
- Jeremy Shaw