poor perfomance of indexU in uvector package

Hello This post meant to be literate haskell. I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2 Code below converts square 2D array to list of 1D arrays. Summation of array contents is done in force evaluation
import Control.Monad import Control.Monad.ST import Data.Array.Vector import System
arr :: Int -> UArr Double arr n = toU $ map fromIntegral [1 .. n*n]
This is fastest function. It slice arrays along another direction and used mainly as upper bound of speed
sliceY :: Int -> UArr Double -> [UArr Double] sliceY n a = map (\i -> sliceU a (i*n) n) [0 .. n-1]
Naive implementation using lists and index lookup. 2.15 second for 200*200 array
sliceXlist :: Int -> UArr Double -> [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y -> indexU a (x + y*n)) [0 .. n-1]
Similar implementation in ST monad and it uses indexU too. 2.14 seconds for 200*200 array
sliceXst :: Int -> UArr Double -> [UArr Double] sliceXst n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr <- newMU n forM_ [0 .. n-1] $ \y -> writeMU arr y (indexU a (y*n + x)) unsafeFreezeAllMU arr
This implementation avoids use of indexU by copying entire 2D array into mutable array and using it for lookup. Surprisingly it outperform previsious implementations for sufficiently big n 1.19 seconds for 200*200 array
sliceXcopy :: Int -> UArr Double -> [UArr Double] sliceXcopy n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr <- newMU n cp <- newMU (n*n) copyMU cp 0 a forM_ [0 .. n-1] $ \y -> writeMU arr y =<< readMU cp (y*n + x) unsafeFreezeAllMU arr
This is another implementation with lists which convert whole array to list and picks appropriate element it. It is fastest implementation so far. 0.039 seconds for 200*200 array
sliceXlistfast :: Int -> UArr Double -> [UArr Double] sliceXlistfast n a = map mkSlice [0 .. n-1] where takeEvery n [] = [] takeEvery n (x:xs) = x : takeEvery n (drop (n-1) xs) mkSlice x = toU $ takeEvery n . drop x $ fromU a
---------------------------------------------------------------- main :: IO () main = do [str,a] <- getArgs let n = read str case a of "y" -> print $ sum $ map sumU (sliceY n (arr n)) "list" -> print $ sum $ map sumU (sliceXlist n (arr n)) "lf" -> print $ sum $ map sumU (sliceXlistfast n (arr n)) "st" -> print $ sum $ map sumU (sliceXst n (arr n)) "copy" -> print $ sum $ map sumU (sliceXcopy n (arr n))

On Sun, Nov 15, 2009 at 03:00:34PM +0300, Alexey Khudyakov wrote:
Naive implementation using lists and index lookup. 2.15 second for 200*200 array
sliceXlist :: Int -> UArr Double -> [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y -> indexU a (x + y*n)) [0 .. n-1]
Have you tried something like mkSlice x = mapU (\y -> indexU a (x + y*n)) $ enumFromToU 0 (n-1) I guess it should be a lot faster :). Also, I would recomend using criterion. Another implementation you may try is a' = mapU (\(i :*: x) -> (i `mod` n) :*: x) (indexedU a) mkSlice j = fstU $ filterU (\(i :*: x) -> i == j) a' HTH, -- Felipe.

On Sun, Nov 15, 2009 at 5:50 PM, Felipe Lessa
On Sun, Nov 15, 2009 at 03:00:34PM +0300, Alexey Khudyakov wrote:
Naive implementation using lists and index lookup. 2.15 second for 200*200 array
sliceXlist :: Int -> UArr Double -> [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y -> indexU a (x + y*n)) [0 .. n-1]
Have you tried something like
mkSlice x = mapU (\y -> indexU a (x + y*n)) $ enumFromToU 0 (n-1)
I guess it should be a lot faster :).
No it doesn't. There is no significant difference between two variant above. I think any program which uses indexU will be slowed to crawl. Seems like a bug for me.
Another implementation you may try is
a' = mapU (\(i :*: x) -> (i `mod` n) :*: x) (indexedU a) mkSlice j = fstU $ filterU (\(i :*: x) -> i == j) a'
This one is fastest so far
Also, I would recomend using criterion.
I tried to do so.. But it depends on gtk2hs and it is too difficult to install

On Sun, Nov 15, 2009 at 06:16:03PM +0300, Alexey Khudyakov wrote:
Another implementation you may try is
a' = mapU (\(i :*: x) -> (i `mod` n) :*: x) (indexedU a) mkSlice j = fstU $ filterU (\(i :*: x) -> i == j) a'
This one is fastest so far
Nice! Just for the record, of course I meant 'sndU' :). Thanks god Haskell is statically typed and that error should be caught rather easily. -- Felipe.

alexey.skladnoy:
Hello
This post meant to be literate haskell.
I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2
You're using the streamed version when its not fusing. Use the non-streaming direct implementation exported from Data.Array.Vector.UArr This is really an API bug, but I've not had time to sanitize the use.

On Sun, Nov 15, 2009 at 8:59 PM, Don Stewart
alexey.skladnoy:
I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2
You're using the streamed version when its not fusing. Use the non-streaming direct implementation exported from Data.Array.Vector.UArr
This is really an API bug, but I've not had time to sanitize the use.
Probably this should be stated more explicitly in documentation. This is
_very_ unexpected and confusing behaviour.
Also I don't quite understand nature of bug. Is this missing export or
wrong function is exported. And is streamed version of idexU useful
and in which way?
On Sun, Nov 15, 2009 at 9:11 PM, Thomas DuBuisson
The documentation explicitly says indexU is O(n) - no need for so much testing to rediscover that fact. When I needed a contiguous block of values in UArr, I just relied on sliceU to acquire the block and performed a foldU.
Problems begin when you need non-contiguous block. Easiest way to so is indexing.

The documentation explicitly says indexU is O(n) - no need for so much
testing to rediscover that fact. When I needed a contiguous block of
values in UArr, I just relied on sliceU to acquire the block and
performed a foldU.
Thomas
On Sun, Nov 15, 2009 at 4:00 AM, Alexey Khudyakov
Hello
This post meant to be literate haskell.
I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2
Code below converts square 2D array to list of 1D arrays. Summation of array contents is done in force evaluation
import Control.Monad import Control.Monad.ST import Data.Array.Vector import System
arr :: Int -> UArr Double arr n = toU $ map fromIntegral [1 .. n*n]
This is fastest function. It slice arrays along another direction and used mainly as upper bound of speed
sliceY :: Int -> UArr Double -> [UArr Double] sliceY n a = map (\i -> sliceU a (i*n) n) [0 .. n-1]
Naive implementation using lists and index lookup. 2.15 second for 200*200 array
sliceXlist :: Int -> UArr Double -> [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y -> indexU a (x + y*n)) [0 .. n-1]
Similar implementation in ST monad and it uses indexU too. 2.14 seconds for 200*200 array
sliceXst :: Int -> UArr Double -> [UArr Double] sliceXst n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr <- newMU n forM_ [0 .. n-1] $ \y -> writeMU arr y (indexU a (y*n + x)) unsafeFreezeAllMU arr
This implementation avoids use of indexU by copying entire 2D array into mutable array and using it for lookup. Surprisingly it outperform previsious implementations for sufficiently big n 1.19 seconds for 200*200 array
sliceXcopy :: Int -> UArr Double -> [UArr Double] sliceXcopy n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr <- newMU n cp <- newMU (n*n) copyMU cp 0 a forM_ [0 .. n-1] $ \y -> writeMU arr y =<< readMU cp (y*n + x) unsafeFreezeAllMU arr
This is another implementation with lists which convert whole array to list and picks appropriate element it. It is fastest implementation so far. 0.039 seconds for 200*200 array
sliceXlistfast :: Int -> UArr Double -> [UArr Double] sliceXlistfast n a = map mkSlice [0 .. n-1] where takeEvery n [] = [] takeEvery n (x:xs) = x : takeEvery n (drop (n-1) xs) mkSlice x = toU $ takeEvery n . drop x $ fromU a
---------------------------------------------------------------- main :: IO () main = do [str,a] <- getArgs let n = read str case a of "y" -> print $ sum $ map sumU (sliceY n (arr n)) "list" -> print $ sum $ map sumU (sliceXlist n (arr n)) "lf" -> print $ sum $ map sumU (sliceXlistfast n (arr n)) "st" -> print $ sum $ map sumU (sliceXst n (arr n)) "copy" -> print $ sum $ map sumU (sliceXcopy n (arr n))
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (6)
-
Alexey Khudyakov
-
Don Stewart
-
Felipe Lessa
-
Roman Leshchinskiy
-
Thomas DuBuisson
-
Yusaku Hashimoto