
I always thought that signal processing must be a good benchmark for compiler optimizations and optimizer rules of libraries like 'binary' and 'fps'. Many signal processing functions process a signal from the beginning to the end with a little of state, thus in many cases I expect that they can be translated to simple loops. However, I still cannot get the necessary speed for real-time processing out of 'binary' and 'fps'. In my setup a signal is of type [Double]. After a series of sound transformations a signal is finally converted to [Int16] and then to ByteString. Can I expect that interim signals represented as lists are optimized away? I have simple example which writes zeros to disk. The content of the first file is created by ByteString.replicate for the purpose of comparison with the second file. With the second file I want to check the speed of the conversion from [Int16] to ByteString. module Main (main) where import System.Time (getClockTime, diffClockTimes, tdSec, tdPicosec) import qualified Data.ByteString.Lazy as B import qualified Data.Binary.Put as Bin import Foreign (Int16) signalToBinaryPut :: [Int16] -> B.ByteString signalToBinaryPut = Bin.runPut . mapM_ (Bin.putWord16host . fromIntegral) writeSignalBinaryPut :: FilePath -> [Int16] -> IO () writeSignalBinaryPut fileName = B.writeFile fileName . signalToBinaryPut measureTime :: String -> IO () -> IO () measureTime name act = do putStr (name++": ") timeA <- getClockTime act timeB <- getClockTime let td = diffClockTimes timeB timeA print (fromIntegral (tdSec td) + fromInteger (tdPicosec td) * 1e-12 :: Double) numSamples :: Int numSamples = 1000000 zeroSignal16 :: [Int16] zeroSignal16 = replicate numSamples 0 zeroByteString :: B.ByteString zeroByteString = B.replicate (fromIntegral (2 * numSamples)) 0 main :: IO () main = do measureTime "write zero bytestring" (B.writeFile "zero-bytestring.sw" zeroByteString) measureTime "put zero int16" (writeSignalBinaryPut "zero-int16string.sw" zeroSignal16) The program is compiled with GHC-6.4 and option -O2, CPU clock 1.7 GHz. This yields: $ speedtest write zero bytestring: 2.5674e-2 put zero int16: 1.080541 That is, not using ByteString.replicate and converting from [Int16] to ByteString slows down computation by a factor of 40. What am I doing wrong?

Henning Thielemann wrote:
The program is compiled with GHC-6.4 and option -O2, CPU clock 1.7 GHz.
ByteString is much faster with GHC 6.6, IIRC. We optimised the representation of ForeignPtr, and ByteString takes advantage of that. I recommend upgrading. Cheers, Simon

simonmarhaskell:
Henning Thielemann wrote:
The program is compiled with GHC-6.4 and option -O2, CPU clock 1.7 GHz.
ByteString is much faster with GHC 6.6, IIRC. We optimised the representation of ForeignPtr, and ByteString takes advantage of that. I recommend upgrading.
Yes, a 2x speedup isn't uncommon. ByteString is even faster with the GHC head, branch, given the cranked up rules and constructor specialisation. -- Don

On Sun, 17 Jun 2007, Donald Bruce Stewart wrote:
simonmarhaskell:
Henning Thielemann wrote:
The program is compiled with GHC-6.4 and option -O2, CPU clock 1.7 GHz.
ByteString is much faster with GHC 6.6, IIRC. We optimised the representation of ForeignPtr, and ByteString takes advantage of that. I recommend upgrading.
Yes, a 2x speedup isn't uncommon.
Indeed, in my simple example the speedup factor was 2. However this is still far from being enough for real-time signal processing. I found another problem: The rounding functions from RealFrac are much slower than GHC.Float.double2Int. I've set up a bug ticket in GHC trac.
ByteString is even faster with the GHC head, branch, given the cranked up rules and constructor specialisation.
Can I test them without compiling GHC myself? I.e. can I still install the FPS package separately?

lemming:
On Sun, 17 Jun 2007, Donald Bruce Stewart wrote:
simonmarhaskell:
Henning Thielemann wrote:
The program is compiled with GHC-6.4 and option -O2, CPU clock 1.7 GHz.
ByteString is much faster with GHC 6.6, IIRC. We optimised the representation of ForeignPtr, and ByteString takes advantage of that. I recommend upgrading.
Yes, a 2x speedup isn't uncommon.
Indeed, in my simple example the speedup factor was 2. However this is still far from being enough for real-time signal processing. I found another problem: The rounding functions from RealFrac are much slower than GHC.Float.double2Int. I've set up a bug ticket in GHC trac.
if there's floating point math involved, carefully check the Core output. -ddump-simpl -O2
ByteString is even faster with the GHC head, branch, given the cranked up rules and constructor specialisation.
Can I test them without compiling GHC myself? I.e. can I still install the FPS package separately?
oh, i was just suggesting trying the GHC HEAD branch for its improved optimisations, not the `unstable' branch of fps. -- Don
participants (3)
-
dons@cse.unsw.edu.au
-
Henning Thielemann
-
Simon Marlow