
Hello! I am trying to implement a binary search function that returns the index of an exact or the (index + 1) where the item should be inserted in an array if the item to be searched is not found (I am not trying to insert data in the array) . Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times. Do you have any ideas on how to improve the performance of this function? import Data.Array.IArray type IntArray a = Array Int a -- The array must be 0 indexed. binarySearch :: Ord a => a -> IntArray a -> Int binarySearch query array = let (low, high) = bounds array in binarySearch' query array low high binarySearch' :: Ord a => a -> IntArray a -> Int -> Int -> Int binarySearch' query array !low !high | low <= high = let ! mid = low + ((high - low) `div` 2) ! midVal = array ! mid in next mid midVal | otherwise = -(low + 1) where next mid midVal | midVal < query = binarySearch' query array (mid + 1) high | midVal > query = binarySearch' query array low (mid - 1) | otherwise = mid Thank you! Arnoldo Muller

Am Donnerstag 18 März 2010 19:59:33 schrieb Arnoldo Muller:
Hello!
I am trying to implement a binary search function that returns the index of an exact or the (index + 1) where the item should be inserted in an array if the item to be searched is not found (I am not trying to insert data in the array) .
Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times.
If it's called often, and the arrays are 0-based and Int-indexed, import Data.Array.Base (unsafeAt) and replacing ! with `unsafeAt` should give a speed-up, though probably not terribly much. If you don't need the polymorphism and your array elements are unboxable, using UArray from Data.Array.Unboxed should be significantly faster.
Do you have any ideas on how to improve the performance of this function?
import Data.Array.IArray
type IntArray a = Array Int a
-- The array must be 0 indexed. binarySearch :: Ord a => a -> IntArray a -> Int binarySearch query array = let (low, high) = bounds array in binarySearch' query array low high
binarySearch' :: Ord a => a -> IntArray a -> Int -> Int -> Int binarySearch' query array !low !high
| low <= high = let ! mid = low + ((high - low) `div` 2)
! midVal = array ! mid in next mid midVal
| otherwise = -(low + 1)
where next mid midVal
| midVal < query = binarySearch' query array (mid + 1) | high midVal > query = binarySearch' query array low | (mid - 1) otherwise = mid
No obvious performance killers, maybe the 'next' function costs a little and let ... in case compare midVal query of LT -> binarySearch' query array (mid+1) high EQ -> mid GT -> binarySearch' query array low (mid-1) would be faster. Or moving binarySearch' from the top-level into binarySearch and eliminating the two static arguments may improve performance (I seem to remember that a static argument-transform for less than three or four non-function arguments can speed the code up or slow it down, so you'd have to test; for many arguments or function arguments it's pretty certain to give a speed-up, IIRC). binarySearch query array = go low high where (low,high) = bounds array go !l !h | h < l = -(l+1) | mv < query = go l (m-1) | mv == query = m | otherwise = go (m+1) h where m = l + (h-l) `quot` 2 mv = array `unsafeAt` m
Thank you!
Arnoldo Muller

Am Donnerstag 18 März 2010 20:49:30 schrieb Daniel Fischer:
Am Donnerstag 18 März 2010 19:59:33 schrieb Arnoldo Muller:
Hello!
I am trying to implement a binary search function that returns the index of an exact or the (index + 1) where the item should be inserted in an array if the item to be searched is not found (I am not trying to insert data in the array) .
Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times.
If it's called often, and the arrays are 0-based and Int-indexed,
import Data.Array.Base (unsafeAt)
and replacing ! with `unsafeAt` should give a speed-up, though probably not terribly much. If you don't need the polymorphism and your array elements are unboxable, using UArray from Data.Array.Unboxed should be significantly faster.
Do you have any ideas on how to improve the performance of this function?
would be faster. Or moving binarySearch' from the top-level into binarySearch and eliminating the two static arguments may improve performance (I seem to remember that a static argument-transform for less than three or four non-function arguments can speed the code up or slow it down, so you'd have to test; for many arguments or function arguments it's pretty certain to give a speed-up, IIRC).
Yep, for me {-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits binarySearch :: Ord a => a -> Array Int a -> Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi < lo = -(lo+1) | otherwise = case compare mv q of LT -> go (m+1) hi EQ -> m GT -> go lo (m-1) where m = (lo .&. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m chops ~40% off the time. 'unsafeAt' alone reduces time by ~10%, the local loop gives the biggest speedup, and the bit-fiddling instead of m = lo + (hi-lo) `quot` 2 something like 4%. If you don't like bit-fiddling or want your code to be portable to machines that don't use two's complement, the last few percent can be left alone. Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests).
Thank you!
Arnoldo Muller

Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer:
Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests).
However, a few {-# SPECIALISE #-} pragmas set the record straight. Specialising speeds up both, boxed and unboxed arrays, significantly, but now, for the specialised types, unboxed arrays are faster (note, however, that when the code for the binary search is in the same module as it is used, with optimisations, GHC will probably specialise it itself. If binarySearch is not exported, AFAIK, you can delete "probably".). {-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits {-# SPECIALISE binarySearch :: Double -> Array Int Double -> Int #-} {-# SPECIALISE binarySearch :: Int -> Array Int Int -> Int #-} {-# SPECIALISE binarySearch :: Bool -> Array Int Bool -> Int #-} {-# SPECIALISE binarySearch :: Char -> Array Int Char -> Int #-} {-# SPECIALISE binarySearch :: Float -> Array Int Float -> Int #-} binarySearch :: Ord a => a -> Array Int a -> Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi < lo = -(lo+1) | otherwise = case compare mv q of LT -> go (m+1) hi EQ -> m GT -> go lo (m-1) where -- m = lo + (hi-lo) `quot` 2 m = (lo .&. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m Use Data.Array.Unboxed and UArray if possible. Now the bit-fiddling instead of arithmetics makes a serious difference, about 20% for unboxed arrays, 17% for boxed arrays (Double), so I'd recommend that.

On 19/03/2010, at 08:48, Daniel Fischer wrote:
Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer:
Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests).
However, a few {-# SPECIALISE #-} pragmas set the record straight.
This is because without specialising, unsafeAt is a straight (inlineable) function call for boxed arrays but is overloaded and hence much slower for unboxed ones. In general, unboxed arrays tend to be slower in generic code. The only real solution is making functions such as binarySearch INLINE. Roman

Hello Daniel,
Regarding your solution, can I apply {-# SPECIALISE ... #-} statements to
datatypes I define?
And if so, I am not able to import the datatypes to the module where
binarySearch is.
The problem is that if I import them a circular dependency is detected and
the compiler gives an error.
Is there a way of importing a datatype from another module do avoid this
circular dependency?
Thank you,
Arnoldo
On Thu, Mar 18, 2010 at 10:48 PM, Daniel Fischer
Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer:
Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests).
However, a few {-# SPECIALISE #-} pragmas set the record straight. Specialising speeds up both, boxed and unboxed arrays, significantly, but now, for the specialised types, unboxed arrays are faster (note, however, that when the code for the binary search is in the same module as it is used, with optimisations, GHC will probably specialise it itself. If binarySearch is not exported, AFAIK, you can delete "probably".).
{-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where
import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits
{-# SPECIALISE binarySearch :: Double -> Array Int Double -> Int #-} {-# SPECIALISE binarySearch :: Int -> Array Int Int -> Int #-} {-# SPECIALISE binarySearch :: Bool -> Array Int Bool -> Int #-} {-# SPECIALISE binarySearch :: Char -> Array Int Char -> Int #-} {-# SPECIALISE binarySearch :: Float -> Array Int Float -> Int #-} binarySearch :: Ord a => a -> Array Int a -> Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi < lo = -(lo+1) | otherwise = case compare mv q of LT -> go (m+1) hi EQ -> m GT -> go lo (m-1) where -- m = lo + (hi-lo) `quot` 2 m = (lo .&. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m
Use Data.Array.Unboxed and UArray if possible. Now the bit-fiddling instead of arithmetics makes a serious difference, about 20% for unboxed arrays, 17% for boxed arrays (Double), so I'd recommend that.

A lte reply, but if you still need to have circular module depency: 4.6.9.
How to compile mutually recursive modules in
http://www.haskell.org/ghc/docs/latest/html/users_guide/separate-compilation...
On 21 March 2010 01:31, Arnoldo Muller
Hello Daniel,
Regarding your solution, can I apply {-# SPECIALISE ... #-} statements to datatypes I define? And if so, I am not able to import the datatypes to the module where binarySearch is. The problem is that if I import them a circular dependency is detected and the compiler gives an error. Is there a way of importing a datatype from another module do avoid this circular dependency?
Thank you,
Arnoldo
On Thu, Mar 18, 2010 at 10:48 PM, Daniel Fischer
wrote:
Am Donnerstag 18 März 2010 21:57:34 schrieb Daniel Fischer:
Contrary to my expectations, however, using unboxed arrays is slower than straight arrays (in my tests).
However, a few {-# SPECIALISE #-} pragmas set the record straight. Specialising speeds up both, boxed and unboxed arrays, significantly, but now, for the specialised types, unboxed arrays are faster (note, however, that when the code for the binary search is in the same module as it is used, with optimisations, GHC will probably specialise it itself. If binarySearch is not exported, AFAIK, you can delete "probably".).
{-# LANGUAGE BangPatterns #-} module SATBinSearch (binarySearch) where
import Data.Array.IArray import Data.Array.Base (unsafeAt) import Data.Bits
{-# SPECIALISE binarySearch :: Double -> Array Int Double -> Int #-} {-# SPECIALISE binarySearch :: Int -> Array Int Int -> Int #-} {-# SPECIALISE binarySearch :: Bool -> Array Int Bool -> Int #-} {-# SPECIALISE binarySearch :: Char -> Array Int Char -> Int #-} {-# SPECIALISE binarySearch :: Float -> Array Int Float -> Int #-} binarySearch :: Ord a => a -> Array Int a -> Int binarySearch q a = go l h where (l,h) = bounds a go !lo !hi | hi < lo = -(lo+1) | otherwise = case compare mv q of LT -> go (m+1) hi EQ -> m GT -> go lo (m-1) where -- m = lo + (hi-lo) `quot` 2 m = (lo .&. hi) + (lo `xor` hi) `shiftR` 1 mv = a `unsafeAt` m
Use Data.Array.Unboxed and UArray if possible. Now the bit-fiddling instead of arithmetics makes a serious difference, about 20% for unboxed arrays, 17% for boxed arrays (Double), so I'd recommend that.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ozgur Akgun

Daniel Fischer wrote:
If it's called often, and the arrays are 0-based and Int-indexed,
import Data.Array.Base (unsafeAt)
and replacing ! with `unsafeAt` should give a speed-up, though probably not terribly much. If you don't need the polymorphism and your array elements are unboxable, using UArray from Data.Array.Unboxed should be significantly faster.
Beware that unboxed arrays are strict, and changing the strictness properties of your code can have non-obvious consequences...

Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times. Do you have any ideas on how to improve the performance of this function?
Bast solution for speeding up is to write it in assembler! Ragards, Andrey -- View this message in context: http://old.nabble.com/Performance-question-tp27949969p27950864.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Arnoldo Muller
Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times.
Do you have any ideas on how to improve the performance of this function?
The fastest way to do a binary search is to reify it into code using TH, in Van Emde Boas layout if it's a big enough search (so that you get less cache misses) This might of course get tricky if your tree isn't compile-time static, but if you're really doing gazillions of lookups, occasionally compiling+dynamically linking code might well be worth it. -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.

Thank you all, I will apply your suggestions to my function.
Thank you for making the process of learning Haskell much easier!
Arnoldo
On Fri, Mar 19, 2010 at 4:21 PM, Achim Schneider
Arnoldo Muller
wrote: Right now, the bottleneck of my program is in binarySearch', the function must be called a few billion times.
Do you have any ideas on how to improve the performance of this function?
The fastest way to do a binary search is to reify it into code using TH, in Van Emde Boas layout if it's a big enough search (so that you get less cache misses)
This might of course get tricky if your tree isn't compile-time static, but if you're really doing gazillions of lookups, occasionally compiling+dynamically linking code might well be worth it.
-- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (7)
-
Achim Schneider
-
Andrew Coppin
-
Andrey Sisoyev
-
Arnoldo Muller
-
Daniel Fischer
-
Ozgur Akgun
-
Roman Leshchinskiy