
Hello, I am attempting to process images captured from a webcam. My aim is to do so, in real time, at the frame rate of the camera. I'm using GHC 6.4.2 with -O3. A frame consists of ~100k 24bit colour values. The webcam is interfaced through FFI bindings to some C++ -- these are all labelled 'unsafe'. The image is passed to Haskell as a Ptr Word8. To blit this to the screen (via Gtk2Hs) I do the following: data Cam = Cam { snap_width :: !Int , snap_height :: !Int , snap_bytespp :: !Int , snap_size :: !Int , cam_img :: Ptr Word8 , cam_obj :: ForeignPtr () } do (PixbufData _ dst _) <- (pixbufGetPixels pixbuf :: IO (PixbufData Int Word8)) copyBytes dst (cam_img cam) This achieves the desired throughput (25-29fps.) However, I am at a bit of a loss how to do something similar for preprocessing the data in Haskell before blitting the data (whilst also retaining some semblance of functional programming...) Currently, I have: cam_snap cam f x = do let loop (r:g:b:rest) n x = f r g b n x >>= loop rest (n+3) loop _ _ x = return x px <- peekArray (snap_size cam) (cam_img cam) loop px 0 x cam_snap2 cam f x = let loop ptr n x | n >= snap_size cam = return x | otherwise = do let ptrs = scanl plusPtr ptr [1,1] [r,g,b] <- mapM peek ptrs f r g b n x >>= loop (ptr `plusPtr` 3) (n+3) in loop (cam_img cam) 0 x do ... let sum_px r g b _ (sr,sg,sb) = return (sr+r,sg+g,sb+b) sum <- cam_snap (cam ui) sum_px (0.0,0.0,0.0) print sum cam_snap only processes at 5 fps, whereas cam_snap2 operates at 6fps. Any suggestions? Thanks.

Jeff Briggs wrote:
Hello,
I am attempting to process images captured from a webcam. My aim is to do so, in real time, at the frame rate of the camera. I'm using GHC 6.4.2 with -O3. A frame consists of ~100k 24bit colour values.
The webcam is interfaced through FFI bindings to some C++ -- these are all labelled 'unsafe'. The image is passed to Haskell as a Ptr Word8.
To blit this to the screen (via Gtk2Hs) I do the following:
data Cam = Cam { snap_width :: !Int , snap_height :: !Int , snap_bytespp :: !Int , snap_size :: !Int , cam_img :: Ptr Word8 , cam_obj :: ForeignPtr () }
do (PixbufData _ dst _) <- (pixbufGetPixels pixbuf :: IO (PixbufData Int Word8)) copyBytes dst (cam_img cam)
This achieves the desired throughput (25-29fps.) However, I am at a bit of a loss how to do something similar for preprocessing the data in Haskell before blitting the data (whilst also retaining some semblance of functional programming...)
Currently, I have:
cam_snap cam f x = do let loop (r:g:b:rest) n x = f r g b n x >>= loop rest (n+3) loop _ _ x = return x px <- peekArray (snap_size cam) (cam_img cam) loop px 0 x
cam_snap2 cam f x = let loop ptr n x | n >= snap_size cam = return x | otherwise = do let ptrs = scanl plusPtr ptr [1,1] [r,g,b] <- mapM peek ptrs f r g b n x >>= loop (ptr `plusPtr` 3) (n+3) in loop (cam_img cam) 0 x
do ... let sum_px r g b _ (sr,sg,sb) = return (sr+r,sg+g,sb+b) sum <- cam_snap (cam ui) sum_px (0.0,0.0,0.0) print sum
cam_snap only processes at 5 fps, whereas cam_snap2 operates at 6fps.
Any suggestions?
I suggest trying something, using "/usr/bin/ghc -O3 -optc-O3" like this:
{-# OPTIONS_GHC -funbox-strict-fields #-}
import Foreign import Control.Monad
data Cam = Cam { snap_width :: !Int , snap_height :: !Int , snap_bytespp :: !Int , snap_size :: !Int , cam_img :: Ptr Word8 , cam_obj :: ForeignPtr () }
type F = Word8 -> Word8 -> Word8 -> Int -> Int -> Int
{-# INLINE cam_snap_3 #-} cam_snap_3 :: Cam -> F -> Int -> IO Int cam_snap_3 cam f x = let end = snap_size cam loop ptr n x | ptr `seq` n `seq` x `seq` False = undefined | n >= end = return x | otherwise = do r <- peek ptr g <- peek (advancePtr ptr 1) b <- peek (advancePtr ptr 2) loop (advancePtr ptr 3) (n+3) (f r g b n x) in loop (cam_img cam) 0 x

Hello Chris, Saturday, August 5, 2006, 3:47:19 AM, you wrote:
in Haskell before blitting the data (whilst also retaining some semblance of functional programming...)
the best way to optimize Haskell program (with current technologies) is to rewrite it in strict & imperative manner:
cam_snap_3 cam f x = let end = snap_size cam loop ptr n x | ptr `seq` n `seq` x `seq` False = undefined | n >= end = return x | otherwise = do r <- peek ptr g <- peek (advancePtr ptr 1) b <- peek (advancePtr ptr 2) loop (advancePtr ptr 3) (n+3) (f r g b n x) in loop (cam_img cam) 0 x
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Chris,
Saturday, August 5, 2006, 3:47:19 AM, you wrote:
in Haskell before blitting the data (whilst also retaining some semblance of functional programming...)
the best way to optimize Haskell program (with current technologies) is to rewrite it in strict & imperative manner:
Strict, very often, since we get unboxed types out of ghc. Imperative, not always (and will be less so with Data.ByteString -- since we don't need to drop into IO to get peek/poke). -- Don

Hello Donald, Sunday, August 6, 2006, 7:03:45 AM, you wrote:
the best way to optimize Haskell program (with current technologies) is to rewrite it in strict & imperative manner:
Strict, very often, since we get unboxed types out of ghc. Imperative, not always (and will be less so with Data.ByteString -- since we don't need to drop into IO to get peek/poke).
FPS just implements some algorithms in imperative code and gives functional interface to them. if you need one of implemented algorithms - you can avoid programming imperative code himself. but that is true for any other library. the only difference is that FPS will be much more used than average lib. btw, is it possible to rewrite this algorithm in more high-level way using FPS, of course with more or less good speed? and imperative for me don't mean "in IO/ST monad". imperative mean that your program is sequence of steps required to compute result instead of function that translates original object to resulting one -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 06/08/06, Bulat Ziganshin
more used than average lib. btw, is it possible to rewrite this algorithm in more high-level way using FPS, of course with more or less good speed?
Using Data.ByteString, I see no noticeable decrease in performance. data C = R|G|B deriving Show ... B.useAsCStringLen image $ \ (src,len) -> copyBytes dst (castPtr src) len print $ B.foldl sum_rgb (0,0,0,R) image ... sum_rgb (r,g,b,s) px = r `seq` g `seq` b `seq` case s of R -> (r+px, g, b, G) G -> (r, g+px, b, B) _ -> (r, g, b+px, R) Fantastic library.

On 05/08/06, Chris Kuklewicz
I suggest trying something, using "/usr/bin/ghc -O3 -optc-O3" like this:
{-# OPTIONS_GHC -funbox-strict-fields #-}
import Foreign import Control.Monad
data Cam = Cam { snap_width :: !Int , snap_height :: !Int , snap_bytespp :: !Int , snap_size :: !Int , cam_img :: Ptr Word8 , cam_obj :: ForeignPtr () }
type F = Word8 -> Word8 -> Word8 -> Int -> Int -> Int
{-# INLINE cam_snap_3 #-} cam_snap_3 :: Cam -> F -> Int -> IO Int cam_snap_3 cam f x = let end = snap_size cam loop ptr n x | ptr `seq` n `seq` x `seq` False = undefined | n >= end = return x | otherwise = do r <- peek ptr g <- peek (advancePtr ptr 1) b <- peek (advancePtr ptr 2) loop (advancePtr ptr 3) (n+3) (f r g b n x) in loop (cam_img cam) 0 x
Ah, so excessive laziness and IO were killing it! Thanks! This works most excellently :)
participants (4)
-
Bulat Ziganshin
-
Chris Kuklewicz
-
dons@cse.unsw.edu.au
-
Jeff Briggs