Faster break and span for Prelude

Hello everyone, I noticed that Prelude break and span Core uses boxed tuples which is quite expensive. Since functions from Prelude are used everywhere, many programs will benefit from their optimization. Please see my implementation which uses internal worker function and unboxed tuples. In the best case it could perform more than twice faster than Prelude functions. import Criterion.Main newspan :: (a -> Bool) -> [a] -> ([a], [a]) newspan p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = case go xs' of (ys, zs) -> (x:ys,zs) | otherwise = ([], xs) newbreak :: (a -> Bool) -> [a] -> ([a], [a]) newbreak p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = ([], xs) | otherwise = case go xs' of (ys, zs) -> (x:ys,zs) versus f1 f2 pred arg = [ bench "new" $ whnf (f1 pred) arg , bench "prelude" $ whnf (f2 pred) arg ] variousLists f1 f2 pred = [ bgroup "fullMatch" $ versus f1 f2 pred (replicate 1000 1 :: [Int]) , bgroup "failOnFirst" $ versus f1 f2 (not . pred) (replicate 1000 1 :: [Int]) , bgroup "emptyList" $ versus f1 f2 pred ([]::[Int]) ] main = defaultMain [ bgroup "span" $ variousLists newspan span (==1) , bgroup "break" $ variousLists newbreak break (/=1) ] -- Regards, Boris

On 27 December 2011 20:41, Boris Lykah
Hello everyone,
I noticed that Prelude break and span Core uses boxed tuples which is quite expensive. Since functions from Prelude are used everywhere, many programs will benefit from their optimization. Please see my implementation which uses internal worker function and unboxed tuples. In the best case it could perform more than twice faster than Prelude functions.
import Criterion.Main
newspan :: (a -> Bool) -> [a] -> ([a], [a]) newspan p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = case go xs' of (ys, zs) -> (x:ys,zs) | otherwise = ([], xs)
newbreak :: (a -> Bool) -> [a] -> ([a], [a]) newbreak p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = ([], xs) | otherwise = case go xs' of (ys, zs) -> (x:ys,zs)
versus f1 f2 pred arg = [ bench "new" $ whnf (f1 pred) arg , bench "prelude" $ whnf (f2 pred) arg ]
variousLists f1 f2 pred = [ bgroup "fullMatch" $ versus f1 f2 pred (replicate 1000 1 :: [Int]) , bgroup "failOnFirst" $ versus f1 f2 (not . pred) (replicate 1000 1 :: [Int]) , bgroup "emptyList" $ versus f1 f2 pred ([]::[Int]) ]
main = defaultMain [ bgroup "span" $ variousLists newspan span (==1) , bgroup "break" $ variousLists newbreak break (/=1) ]
-- Regards, Boris
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
Note that your benchmarks just compare the time to construct the final tuples and not the lists inside of it. This is because you're using 'whnf' instead of 'nf'. Because your new functions are strict in the tail of the argument list (in contrast to the current span and break), it will actually take a longer time to calculate the final tuple: http://basvandijk.github.com/newSpanBreak_WHNF.html So I think it's fairer to use 'nf': http://basvandijk.github.com/newSpanBreak_NF.html But, as already mentioned, your functions are more strict than the current functions:
head $ fst $ span (==1) (1:undefined) 1 head $ fst $ newspan (==1) (1:undefined) *** Exception: Prelude.undefined
head $ fst $ break (/=1) (1:undefined) 1 head $ fst $ newbreak (/=1) (1:undefined) *** Exception: Prelude.undefined
So they can't just replace the originals without going through the library submission process. But lets see what happens when we make them as lazy as the original functions: newspan :: (a -> Bool) -> [a] -> ([a], [a]) newspan p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = let (ys, zs) = go xs' in (x:ys,zs) | otherwise = ([], xs) newbreak :: (a -> Bool) -> [a] -> ([a], [a]) newbreak p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = ([], xs) | otherwise = let (ys, zs) = go xs' in (x:ys,zs) They are faster but not much: http://basvandijk.github.com/newSpanBreak_Lazy_NF.html However I do expect them to be significantly faster when they are applied fully saturated. Cheers, Bas

On 27 December 2011 22:40, Bas van Dijk
However I do expect them to be significantly faster when they are applied fully saturated.
This looks to be correct when looking at the core of a fully saturated application: spanApplied = span (==1) (replicate 1000 1 :: [Int]) You can see that in the generated core the (==1) is passed to the worker function $wspan: spanApplied = case $wspan@ Int (==1) (replicate 1000 1) of _ { (# ww1_aoo, ww2_aop #) $wspan = \ (@ a_aa0) (w_snj :: a_aa0 -> Bool) (w1_snk :: [a_aa0]) -> case w1_snk of wild_X9 { [] -> (# [] @ a_aa0, [] @ a_aa0 #); : x1_aas xs'_aat -> case w_snj x1_aas of _ { False -> (# [] @ a_aa0, wild_X9 #); True -> let { ds_smn [Dmd=Just D(SS)] :: ([a_aa0], [a_aa0]) ds_smn = case $wspan @ a_aa0 w_snj xs'_aat of _ { (# ww1_snO, ww2_snP #) -> (ww1_snO, ww2_snP) } } in (# : @ a_aa0 x1_aas (case ds_smn of _ { (ys_Xdu, zs_Xdl) -> ys_Xdu }) , case ds_smn of _ { (ys_ad7, zs_Xdu) -> zs_Xdu } #) } } While in: newSpanApplied = newspan (==1) (replicate 1000 1 :: [Int]) the (==1) is "fused" into the worker function $wgo: newSpanApplied = case $wgo (replicate 1000 1) of _ { (# ww1_spk, ww2_spl #) -> (ww1_spk, ww2_spl)} $wgo = \ (w_soQ :: [Int]) -> case w_soQ of wild_Xb { [] -> (# [] @ Int, [] @ Int #); : x1_abE xs'_abF -> case x1_abE of wild1_amK { I# x2_amM -> case x2_amM of _ { __DEFAULT -> (# [] @ Int, wild_Xb #); 1 -> let { ds_snn [Dmd=Just D(SS)] :: ([Int], [Int]) ds_snn = case $wgo xs'_abF of _ { (# ww1_spk, ww2_spl #) -> (ww1_spk, ww2_spl) } } in (# : @ Int wild1_amK ( case ds_snn of _ { (ys_Xmv, zs_Xmm) -> ys_Xmv }) , case ds_snn of _ { (ys_am6, zs_Xmv) -> zs_Xmv } #) } } } (BTW I'm using GHC-7.4.1-rc1) Bas

Thank you for checking it out. I missed that my variant has different
semantics. However, I am surprised to see that benchmark with whnf
shows these figures. On my machine new span/break were 1.5-2.5 were
times faster both with whnf and nf. Prelude functions took much longer
time with whnf. I used GHC 7.0.2 with -O2.
Perhaps your lazy variant is a good substitution for the current
Prelude functions.
On Tue, Dec 27, 2011 at 11:40 PM, Bas van Dijk
On 27 December 2011 20:41, Boris Lykah
wrote: Hello everyone,
I noticed that Prelude break and span Core uses boxed tuples which is quite expensive. Since functions from Prelude are used everywhere, many programs will benefit from their optimization. Please see my implementation which uses internal worker function and unboxed tuples. In the best case it could perform more than twice faster than Prelude functions.
import Criterion.Main
newspan :: (a -> Bool) -> [a] -> ([a], [a]) newspan p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = case go xs' of (ys, zs) -> (x:ys,zs) | otherwise = ([], xs)
newbreak :: (a -> Bool) -> [a] -> ([a], [a]) newbreak p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = ([], xs) | otherwise = case go xs' of (ys, zs) -> (x:ys,zs)
versus f1 f2 pred arg = [ bench "new" $ whnf (f1 pred) arg , bench "prelude" $ whnf (f2 pred) arg ]
variousLists f1 f2 pred = [ bgroup "fullMatch" $ versus f1 f2 pred (replicate 1000 1 :: [Int]) , bgroup "failOnFirst" $ versus f1 f2 (not . pred) (replicate 1000 1 :: [Int]) , bgroup "emptyList" $ versus f1 f2 pred ([]::[Int]) ]
main = defaultMain [ bgroup "span" $ variousLists newspan span (==1) , bgroup "break" $ variousLists newbreak break (/=1) ]
-- Regards, Boris
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
Note that your benchmarks just compare the time to construct the final tuples and not the lists inside of it. This is because you're using 'whnf' instead of 'nf'. Because your new functions are strict in the tail of the argument list (in contrast to the current span and break), it will actually take a longer time to calculate the final tuple:
http://basvandijk.github.com/newSpanBreak_WHNF.html
So I think it's fairer to use 'nf':
http://basvandijk.github.com/newSpanBreak_NF.html
But, as already mentioned, your functions are more strict than the current functions:
head $ fst $ span (==1) (1:undefined) 1 head $ fst $ newspan (==1) (1:undefined) *** Exception: Prelude.undefined
head $ fst $ break (/=1) (1:undefined) 1 head $ fst $ newbreak (/=1) (1:undefined) *** Exception: Prelude.undefined
So they can't just replace the originals without going through the library submission process.
But lets see what happens when we make them as lazy as the original functions:
newspan :: (a -> Bool) -> [a] -> ([a], [a]) newspan p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = let (ys, zs) = go xs' in (x:ys,zs) | otherwise = ([], xs)
newbreak :: (a -> Bool) -> [a] -> ([a], [a]) newbreak p xs = go xs where go xs@[] = (xs, xs) go xs@(x:xs') | p x = ([], xs) | otherwise = let (ys, zs) = go xs' in (x:ys,zs)
They are faster but not much:
http://basvandijk.github.com/newSpanBreak_Lazy_NF.html
However I do expect them to be significantly faster when they are applied fully saturated.
Cheers,
Bas
-- Regards, Boris
participants (2)
-
Bas van Dijk
-
Boris Lykah