3GB allocation, 75% CPU usage from a callback function

I recently got the jack package from hackage working again. For those unfamiliar, jack is a callback-based audio server. Writing a client entails importing the C library or its bindings (the Sound.JACK module in my case), creating a client and some ports (using provided library functions), and then registering callbacks for audio processing. I've written a simple program that outputs a sine wave through JACK. The server's sample rate is 44100, which means that this function must be called 44100 times/second (it is buffered, so generally that would be in chunks of 64, 128, or 256). It is an IO function, which gives the only opportunity to keep track of time: (Note that the function produced by newSinWave is the one actually registered as a callback: newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat) newSinWave sampleRate freq = do ioref <- newIORef (0::Integer) let multiplier = 2 * pi * freq / (fromIntegral sampleRate) return (\_ -> {-# SCC "sinWave" #-} do t <- readIORef ioref modifyIORef ioref (+1) return $ fromRational $ toRational $ sin (fromIntegral t * multiplier)) I profiled this since when my program registered with the jack server and started taking callbacks, it was using about 75% cpu (in contrast, the echo program included with the jack package uses less than 2%). The following two lines are of concern to me: "total alloc = 3,040,397,164 bytes (excludes profiling overheads)" My program uses 3GB of virtual memory over a 15 second run? " sinWave Main 341 1173295 100.0 100.0 0.0 0.0" and ~100% of that 75% cpu time is being spent in my callback. Is there something I'm doing wrong? At the very least, it shouldn't be using 3GB of memory. The only thing that needs to be saved between callbacks is the IORef, which is storing an Int. I assume that evaluating that whole construct in haskell may be too much timewise to put in a sound callback (or perhaps not), but 3GB of memory is ridiculous. Thoughts/hints/"you're doing it wrong" anyone? -- Edward Amsden Undergraduate Computer Science Rochester Institute of Technology www.edwardamsden.com

On Saturday 29 January 2011 17:29:02, Edward Amsden wrote:
(Note that the function produced by newSinWave is the one actually registered as a callback:
newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat) newSinWave sampleRate freq = do ioref <- newIORef (0::Integer) let multiplier = 2 * pi * freq / (fromIntegral sampleRate)
return (\_ -> {-# SCC "sinWave" #-} do t <- readIORef ioref modifyIORef ioref (+1) return $ fromRational $ toRational $
fromRational . toRational is a big no-no here. It will still be slow when the new implementation makes it into base, but currently it's slooooooooooooow. If you replace it with realToFrac and compile with optimisations, it should be rewritten to a no-op.
sin (fromIntegral t * multiplier))
I profiled this since when my program registered with the jack server and started taking callbacks, it was using about 75% cpu (in contrast, the echo program included with the jack package uses less than 2%). The following two lines are of concern to me:
"total alloc = 3,040,397,164 bytes (excludes profiling overheads)" My program uses 3GB of virtual memory over a 15 second run?
Sequentially. With 44100 calls per second, you'd have 661500 calls. Each call to toRational allocates several Integers, the calls to fromRational even more. I wouldn't expect it to be 4k per wave call, but see what figures you get with realToFrac instead of fromRational . toRational.
" sinWave Main 341 1173295 100.0 100.0 0.0 0.0" and ~100% of that 75% cpu time is being spent in my callback.
Is there something I'm doing wrong? At the very least, it shouldn't be using 3GB of memory. The only thing that needs to be saved between callbacks is the IORef, which is storing an Int. I assume that evaluating that whole construct in haskell may be too much timewise to put in a sound callback (or perhaps not), but 3GB of memory is ridiculous.
Thoughts/hints/"you're doing it wrong" anyone?

Thanks much!
On Sat, Jan 29, 2011 at 12:01 PM, Daniel Fischer
On Saturday 29 January 2011 17:29:02, Edward Amsden wrote:
(Note that the function produced by newSinWave is the one actually registered as a callback:
newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat) newSinWave sampleRate freq = do ioref <- newIORef (0::Integer) let multiplier = 2 * pi * freq / (fromIntegral sampleRate)
return (\_ -> {-# SCC "sinWave" #-} do t <- readIORef ioref modifyIORef ioref (+1) return $ fromRational $ toRational $
fromRational . toRational is a big no-no here. It will still be slow when the new implementation makes it into base, but currently it's slooooooooooooow. If you replace it with realToFrac and compile with optimisations, it should be rewritten to a no-op.
And suddenly I get a very reasonable 3% cpu load when I use realToFrac. Thanks!
I profiled this since when my program registered with the jack server and started taking callbacks, it was using about 75% cpu (in contrast, the echo program included with the jack package uses less than 2%). The following two lines are of concern to me:
"total alloc = 3,040,397,164 bytes (excludes profiling overheads)" My program uses 3GB of virtual memory over a 15 second run?
Sequentially. With 44100 calls per second, you'd have 661500 calls. Each call to toRational allocates several Integers, the calls to fromRational even more. I wouldn't expect it to be 4k per wave call, but see what figures you get with realToFrac instead of fromRational . toRational. And with realToFrac and -O3, 58MB total allocation instead.
Thanks for clearing that up. -- Edward Amsden Undergraduate Computer Science Rochester Institute of Technology www.edwardamsden.com

Hi, First of all, don't be fooled by the alloc statistic. That is not 3GB memory residency, that's 3GB allocation, which was interspersed with lots of garbage collections, in the same way that measuring how many times malloc was called in a C program doesn't necessarily indicate memory residency. Using +RTS -s, it looks like your program uses around 10MB at any one time. As for the speed, your program is doing a lot of conversions that aren't necessary. CFloat has the Num and Floating instances necessary to use sin, so you're better off making everything a CFloat, rather than converting to and from Float. I took your program and ironed it out a bit (you were also using an extra readIORef as part of the modifyIORef that you didn't need), and used Criterion to test the speed. Here's the program in its entirety (you'll need to "cabal install criterion"): === import Data.IORef import Foreign.C.Types import Criterion.Main newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat) newSinWave sampleRate freq = do ioref <- newIORef (0::Integer) let multiplier = 2 * pi * freq / (fromIntegral sampleRate) return (\ _ -> {-# SCC "sinWave" #-} do t <- readIORef ioref modifyIORef ioref (+1) return $ fromRational $ toRational $ sin (fromIntegral t * multiplier)) newSinWave' :: Int -> Float -> IO (CFloat -> IO CFloat) newSinWave' sampleRate freq = do ioref <- newIORef 0 let multiplier = 2 * pi * (realToFrac freq) / (fromIntegral sampleRate) return (\ _ -> {-# SCC "sinWave'" #-} do t <- readIORef ioref writeIORef ioref (t+1) return $ sin (t * multiplier)) runLots :: (a -> IO a) -> a -> IO a runLots f = go 10000 where go 0 !x = return x go n !x = f x >>= go (n - 1) main :: IO () main = do f <- newSinWave 44100 100 g <- newSinWave' 44100 100 defaultMain [bench "old" $ runLots f 0, bench "new" $ runLots g 0] === And here's the output from Criterion on my machine, compiled with -XBangPatterns -O1 -rtsopts: === benchmarking old collecting 100 samples, 1 iterations each, in estimated 10.54111 s bootstrapping with 100000 resamples mean: 116.4734 ms, lb 116.2565 ms, ub 117.1492 ms, ci 0.950 std dev: 1.794715 ms, lb 626.6683 us, ub 3.992824 ms, ci 0.950 found 5 outliers among 100 samples (5.0%) 1 (1.0%) low severe 3 (3.0%) high mild 1 (1.0%) high severe variance introduced by outliers: 0.993% variance is unaffected by outliers benchmarking new collecting 100 samples, 2 iterations each, in estimated 1.417208 s bootstrapping with 100000 resamples mean: 10.33277 ms, lb 10.15559 ms, ub 10.50883 ms, ci 0.950 std dev: 904.9297 us, lb 845.3293 us, ub 973.6881 us, ci 0.950 variance introduced by outliers: 1.000% variance is unaffected by outliers === So unless I've done something wrong in the methodology (always possible), that's made it ten times faster. And here's the output from +RTS -s: === 6,458,290,512 bytes allocated in the heap 10,855,744 bytes copied during GC 5,522,696 bytes maximum residency (5 sample(s)) 3,194,696 bytes maximum slop 13 MB total memory in use (0 MB lost due to fragmentation) === Hope that helps, Neil. On 29/01/2011 16:29, Edward Amsden wrote:
I recently got the jack package from hackage working again. For those unfamiliar, jack is a callback-based audio server. Writing a client entails importing the C library or its bindings (the Sound.JACK module in my case), creating a client and some ports (using provided library functions), and then registering callbacks for audio processing.
I've written a simple program that outputs a sine wave through JACK. The server's sample rate is 44100, which means that this function must be called 44100 times/second (it is buffered, so generally that would be in chunks of 64, 128, or 256). It is an IO function, which gives the only opportunity to keep track of time:
(Note that the function produced by newSinWave is the one actually registered as a callback:
newSinWave :: Int -> Float -> IO (CFloat -> IO CFloat) newSinWave sampleRate freq = do ioref<- newIORef (0::Integer) let multiplier = 2 * pi * freq / (fromIntegral sampleRate)
return (\_ -> {-# SCC "sinWave" #-} do t<- readIORef ioref modifyIORef ioref (+1) return $ fromRational $ toRational $ sin (fromIntegral t * multiplier))
I profiled this since when my program registered with the jack server and started taking callbacks, it was using about 75% cpu (in contrast, the echo program included with the jack package uses less than 2%). The following two lines are of concern to me:
"total alloc = 3,040,397,164 bytes (excludes profiling overheads)" My program uses 3GB of virtual memory over a 15 second run?
" sinWave Main 341 1173295 100.0 100.0 0.0 0.0" and ~100% of that 75% cpu time is being spent in my callback.
Is there something I'm doing wrong? At the very least, it shouldn't be using 3GB of memory. The only thing that needs to be saved between callbacks is the IORef, which is storing an Int. I assume that evaluating that whole construct in haskell may be too much timewise to put in a sound callback (or perhaps not), but 3GB of memory is ridiculous.
Thoughts/hints/"you're doing it wrong" anyone?
participants (4)
-
Bas van Dijk
-
Daniel Fischer
-
Edward Amsden
-
Neil Brown