Performance of function defined in a 'where' clause

Hi All, I'm trying to write a bit of code that maps each byte in a block of Word8's to 3xWord8 using an array; i.e. mapping from 8 bit to 24 bit colour (this is an OpenGL application, and I'm using textures). I should point out that this is experimental code, and I'm still learning Haskell (and *loving* it, by the way!), so it probably looks a little unpolished. First, some data: data Palette = Palette { palRed :: Word8, palGrn :: Word8, palBlu :: Word8 } palette = listArray (0,49) paletteList paletteList = [ Palette 0 0 0, Palette 0 0 0, Palette 0 0 0, ..... Then, my first implementation, which took 57% time under profiling, was: loadTile :: Int -> Int -> IO (Ptr Word8) loadTile lat lon = do terrainBytes <- readTile lat lon -- implementation #1 mapM_ (paletteMapper terrainBytes rgbBytes) [0..tileSize^2-1] free terrainBytes return rgbBytes where tileSize = 128 paletteMapper :: Ptr Word8 -> Ptr Word8 -> Int -> IO () paletteMapper tb rgb idx = do v <- peekElemOff tb idx pokeByteOff rgb (idx*3) (palRed (palette!v)) pokeByteOff rgb (idx*3+1) (palGrn (palette!v)) pokeByteOff rgb (idx*3+2) (palBlu (palette!v)) I tried moving paletterMapper out of the 'where' clause and into the top level, which then took only 26% of time - i.e. half the time: paletteMapper :: Ptr Word8 -> Ptr Word8 -> Int -> IO () paletteMapper tb rgb idx = do v <- peekElemOff tb idx pokeByteOff rgb (idx*3) (palRed (palette!v)) pokeByteOff rgb (idx*3+1) (palGrn (palette!v)) pokeByteOff rgb (idx*3+2) (palBlu (palette!v)) loadTile :: Int -> Int -> IO (Ptr Word8) loadTile lat lon = do terrainBytes <- readTile lat lon -- implementation #1 mapM_ (paletteMapper terrainBytes rgbBytes) [0..tileSize^2-1] free terrainBytes return rgbBytes where tileSize = 128 I don't understand why - the functions are the same, except for the scope they're in. Can anyone elaborate on what's happening? Incidentally, I now realise a faster way (14%) is: loadTile :: Int -> Int -> IO (Ptr Word8) loadTile lat lon = do terrainBytes <- readTile lat lon rgbBytes <- mallocBytes (3*(tileSize^2)) mapM_ (\x -> do v <- peekElemOff terrainBytes x pokeByteOff rgbBytes (x*3) (palRed (palette!v)) pokeByteOff rgbBytes (x*3+1) (palGrn (palette!v)) pokeByteOff rgbBytes (x*3+2) (palBlu (palette!v)) ) [0..tileSize^2-1] free terrainBytes return rgbBytes where tileSize = 128 (There may be faster/better ways still, I'm all ears :-) Cheers Kevin

On Sat, May 09, 2009 at 05:50:33PM +0100, Kevin Haines wrote:
palette = listArray (0,49) paletteList paletteList = [ Palette 0 0 0, Palette 0 0 0, Palette 0 0 0, .....
By the way, this should probably be something like palette = listArray (0,49) (replicate 50 $ Palette 0 0 0) Unless, of course, there are things other than Palette 0 0 0 in the remainder of the list. If you ever have to copy and paste *anything* in a Haskell program, warning bells should start going off in your head. =) Unfortunately, performance tuning is a black art of which I know precious little. Perhaps others will have some good suggestions. -Brent

Did you compile with -O or -O2?
On Sat, May 9, 2009 at 6:50 PM, Kevin Haines
Hi All,
I'm trying to write a bit of code that maps each byte in a block of Word8's to 3xWord8 using an array; i.e. mapping from 8 bit to 24 bit colour (this is an OpenGL application, and I'm using textures).
I should point out that this is experimental code, and I'm still learning Haskell (and *loving* it, by the way!), so it probably looks a little unpolished.
First, some data:
data Palette = Palette { palRed :: Word8, palGrn :: Word8, palBlu :: Word8 }
palette = listArray (0,49) paletteList paletteList = [ Palette 0 0 0, Palette 0 0 0, Palette 0 0 0, .....
Then, my first implementation, which took 57% time under profiling, was:
loadTile :: Int -> Int -> IO (Ptr Word8) loadTile lat lon = do terrainBytes <- readTile lat lon
-- implementation #1 mapM_ (paletteMapper terrainBytes rgbBytes) [0..tileSize^2-1]
free terrainBytes return rgbBytes
where tileSize = 128 paletteMapper :: Ptr Word8 -> Ptr Word8 -> Int -> IO () paletteMapper tb rgb idx = do v <- peekElemOff tb idx pokeByteOff rgb (idx*3) (palRed (palette!v)) pokeByteOff rgb (idx*3+1) (palGrn (palette!v)) pokeByteOff rgb (idx*3+2) (palBlu (palette!v))
I tried moving paletterMapper out of the 'where' clause and into the top level, which then took only 26% of time - i.e. half the time:
paletteMapper :: Ptr Word8 -> Ptr Word8 -> Int -> IO () paletteMapper tb rgb idx = do v <- peekElemOff tb idx pokeByteOff rgb (idx*3) (palRed (palette!v)) pokeByteOff rgb (idx*3+1) (palGrn (palette!v)) pokeByteOff rgb (idx*3+2) (palBlu (palette!v))
loadTile :: Int -> Int -> IO (Ptr Word8) loadTile lat lon = do terrainBytes <- readTile lat lon
-- implementation #1 mapM_ (paletteMapper terrainBytes rgbBytes) [0..tileSize^2-1]
free terrainBytes return rgbBytes
where tileSize = 128
I don't understand why - the functions are the same, except for the scope they're in. Can anyone elaborate on what's happening?
Incidentally, I now realise a faster way (14%) is:
loadTile :: Int -> Int -> IO (Ptr Word8) loadTile lat lon = do
terrainBytes <- readTile lat lon
rgbBytes <- mallocBytes (3*(tileSize^2)) mapM_ (\x -> do v <- peekElemOff terrainBytes x pokeByteOff rgbBytes (x*3) (palRed (palette!v)) pokeByteOff rgbBytes (x*3+1) (palGrn (palette!v)) pokeByteOff rgbBytes (x*3+2) (palBlu (palette!v)) ) [0..tileSize^2-1]
free terrainBytes return rgbBytes
where tileSize = 128
(There may be faster/better ways still, I'm all ears :-)
Cheers
Kevin _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

I used -O. Also, to answer Brent's posting, the Palette list is different further down but I did take the tip and replace the pasted lines with replicate, so thanks! Cheers Kevin Peter Verswyvelen wrote:
Did you compile with -O or -O2?
On Sat, May 9, 2009 at 6:50 PM, Kevin Haines
mailto:kevin.haines@ntlworld.com> wrote:
participants (3)
-
Brent Yorgey
-
Kevin Haines
-
Peter Verswyvelen