Did you compile with -O or -O2?

On Sat, May 9, 2009 at 6:50 PM, Kevin Haines <kevin.haines@ntlworld.com> wrote:
Hi All,

I'm trying to write a bit of code that maps each byte in a block of Word8's to 3xWord8 using an array; i.e. mapping from 8 bit to 24 bit colour (this is an OpenGL application, and I'm using textures).

I should point out that this is experimental code, and I'm still learning Haskell (and *loving* it, by the way!), so it probably looks a little unpolished.

First, some data:

data Palette = Palette { palRed :: Word8, palGrn :: Word8, palBlu :: Word8 }

palette = listArray (0,49) paletteList
paletteList = [
           Palette 0 0 0,
           Palette 0 0 0,
           Palette 0 0 0,
               .....



Then, my first implementation, which took 57% time under profiling, was:

loadTile :: Int -> Int -> IO (Ptr Word8)
loadTile lat lon = do
   terrainBytes <- readTile lat lon

   -- implementation #1
   mapM_ (paletteMapper terrainBytes rgbBytes) [0..tileSize^2-1]

   free terrainBytes
   return rgbBytes

   where tileSize = 128
         paletteMapper :: Ptr Word8 -> Ptr Word8 -> Int -> IO ()
         paletteMapper tb rgb idx = do
               v <- peekElemOff tb idx
               pokeByteOff rgb (idx*3) (palRed (palette!v))
               pokeByteOff rgb (idx*3+1) (palGrn (palette!v))
               pokeByteOff rgb (idx*3+2) (palBlu (palette!v))


I tried moving paletterMapper out of the 'where' clause and into the top level, which then took only 26% of time - i.e. half the time:


paletteMapper :: Ptr Word8 -> Ptr Word8 -> Int -> IO ()
paletteMapper tb rgb idx = do
  v <- peekElemOff tb idx
  pokeByteOff rgb (idx*3) (palRed (palette!v))
  pokeByteOff rgb (idx*3+1) (palGrn (palette!v))
 pokeByteOff rgb (idx*3+2) (palBlu (palette!v))

loadTile :: Int -> Int -> IO (Ptr Word8)
loadTile lat lon = do
   terrainBytes <- readTile lat lon

   -- implementation #1
   mapM_ (paletteMapper terrainBytes rgbBytes) [0..tileSize^2-1]

   free terrainBytes
   return rgbBytes

   where tileSize = 128


I don't understand why - the functions are the same, except for the scope they're in. Can anyone elaborate on what's happening?


Incidentally, I now realise a faster way (14%) is:

loadTile :: Int -> Int -> IO (Ptr Word8)
loadTile lat lon = do

   terrainBytes <- readTile lat lon

   rgbBytes <- mallocBytes (3*(tileSize^2))
   mapM_ (\x -> do
       v <- peekElemOff terrainBytes x
       pokeByteOff rgbBytes (x*3) (palRed (palette!v))
       pokeByteOff rgbBytes (x*3+1) (palGrn (palette!v))
       pokeByteOff rgbBytes (x*3+2) (palBlu (palette!v))
       ) [0..tileSize^2-1]


   free terrainBytes
   return rgbBytes

   where tileSize = 128


(There may be faster/better ways still, I'm all ears :-)

Cheers

Kevin
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners