
On Aug 8, 2006, at 7:37 AM, Chris Kuklewicz wrote:
Simon Marlow wrote:
Spencer Janssen wrote:
Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen
* Improve Control.Monad.filterM: * filterM is defined in terms of foldr, making it a good consumer in GHC's fusion framework * filterM uses linear stack space with respect to the number of items that the predicate returns true, rather than the total number of elements in the input. ... hunk ./Control/Monad.hs 151 -filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys) +filterM p = foldr f (return []) + where + f x xs = do + flg <- p x + if flg + then xs >>= return . (x:) + else xs } The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining). Cheers, Simon The new one looks better to me. But the foldr is not needed:
filterM _ [] = return [] filterM p (x:xs) = do flg <- p x if flg then do ys <- filterM p xs return (x:ys) else filterM p xs
The above filterM differs from the original in when the garbage collector will be able to collect "x". It looks like the original had to hold onto "x" while "ys" was being computed. The version above looks like it can discard "x" as soon as the else branch is chosen.
-- Chris
I've benchmarked the versions from base, Chris Kuklewicz and myself. Each implementation was placed in a separate module, given an INLINE pragma and compiled with -O2. \begin{code} -- construct a list that won't fuse construct 0 = [] construct i = i : construct (i - 1) all = filterM (return . const True) (construct $ 2^18) half = filterM (return . even) (construct $ 2^18) none = filterM (return . const False) (construct $ 2^19) fuse = filterM (return . const False) [1 .. 2^22 :: Int] \end{code} Results (GHC 6.5 on an Intel Core Duo 1.83 GHz): | All | Half | None | Fuse Janssen | 0.15 | 0.09 | 0.02 | 0.02 Kuklewicz | 0.15 | 0.11 | 0.02 | 0.15 Control.Monad | 0.21 | 0.31 | 0.25 | stack overflow The tests are about what we expect. The alternate versions beat filterM solidly, even more so with 'sparse' predicates. In addition, Janssen ties or beats Kuklewicz in each test showing that the overhead due to foldr isn't a problem. Cheers, Spencer Janssen