darcs patch: Improve Control.Monad.filterM:

Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen

Spencer Janssen wrote:
Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen
* Improve Control.Monad.filterM: * filterM is defined in terms of foldr, making it a good consumer in GHC's fusion framework * filterM uses linear stack space with respect to the number of items that the predicate returns true, rather than the total number of elements in the input. ... hunk ./Control/Monad.hs 151 -filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys) +filterM p = foldr f (return []) + where + f x xs = do + flg <- p x + if flg + then xs >>= return . (x:) + else xs }
The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining). Cheers, Simon

Simon Marlow wrote:
Spencer Janssen wrote:
Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen
* Improve Control.Monad.filterM: * filterM is defined in terms of foldr, making it a good consumer in GHC's fusion framework * filterM uses linear stack space with respect to the number of items that the predicate returns true, rather than the total number of elements in the input. ... hunk ./Control/Monad.hs 151 -filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys) +filterM p = foldr f (return []) + where + f x xs = do + flg <- p x + if flg + then xs >>= return . (x:) + else xs } The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining).
Cheers, Simon
The new one looks better to me. But the foldr is not needed: filterM _ [] = return [] filterM p (x:xs) = do flg <- p x if flg then do ys <- filterM p xs return (x:ys) else filterM p xs The above filterM differs from the original in when the garbage collector will be able to collect "x". It looks like the original had to hold onto "x" while "ys" was being computed. The version above looks like it can discard "x" as soon as the else branch is chosen. -- Chris

On 2006-08-08 at 13:37BST Chris Kuklewicz wrote:
The new one looks better to me. But the foldr is not needed:
filterM _ [] = return [] filterM p (x:xs) = do flg <- p x if flg then do ys <- filterM p xs return (x:ys) else filterM p xs
I'm curious to know how this performs:
filterM p = fmap catMaybes . mapM (predMToMaybe p)
with the subsidiary (and generally useful) predMToMaybe¹ defined something like this:
predMToMaybe p x = fmap (\b->if b then Just x else Nothing ) $ p x
since this definition of filterM clearly shouldn't hold onto anything in the second half (. mapM (predMToMaybe p)) and the first half (fmap catMaybes) will be OK provided that catMaybes and fmap behave themselves -- and if they don't they should be given what for. Jón [1] or perhaps it should be called predFToMaybe since its type is (Functor f) => (a -> f Bool) -> a -> f (Maybe a) (if someone can think of a better name, I'd be glad to hear it) -- Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk

On 2006-08-08 at 15:40BST I wrote:
I'm curious to know how this performs:
filterM p = fmap catMaybes . mapM (predMToMaybe p)
and the answer is "quite badly"...
since this definition of filterM clearly shouldn't hold onto anything in the second half (. mapM (predMToMaybe p))
... because the above statement is wrong: mapM has the same problem as the original filterM. If we replace mapM thus:
mapM f = mapM' [] f mapM' acc p [] = return (reverse acc) mapM' acc p (h:t) = do e <- p h (mapM' $! (e:acc)) p t
then we get something that doesn't overflow the stack on Spencer's fuse test, but which doesn't diverge on
do filterM (\x -> return undefined) [1]; return ()
The new mapM is spine-strict on the list, but so, I think, is the old version -- at least
(mapM (return . (+1)) [1..]) >>= print.head
diverges for both versions The speed of my filterM leaves a lot to be desired: it seems to be something like one eighth the speed of Spencer's on the fuse benchmark and one quarter on "none". Perhaps someone who understands the ins and outs of these things better than I do could explain? [It's not the use of maybes, because
filterM_strict p = foldr cm (return []) . map (predMToMaybe p) where cm x xs = do c <- x case c of Just x -> xs >>= return . (x:) Nothing -> xs
is competetive with Spencer's version (and as strict)] It does seem to be faster in at least some cases than Monad.filterM. Jón -- Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk

Chris Kuklewicz wrote:
Simon Marlow wrote:
Spencer Janssen wrote:
Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen
* Improve Control.Monad.filterM: * filterM is defined in terms of foldr, making it a good consumer in GHC's fusion framework * filterM uses linear stack space with respect to the number of items that the predicate returns true, rather than the total number of elements in the input. ...
hunk ./Control/Monad.hs 151 -filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys) +filterM p = foldr f (return []) + where + f x xs = do + flg <- p x + if flg + then xs >>= return . (x:) + else xs }
The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining).
The new one looks better to me.
It may well be better, but it doesn't have the same laziness properties, so it isn't the same function. eg. try this: do filterM (\x -> return undefined) [1]; return () Of course we may discuss whether the extra laziness is useful, but I can't apply the patch as it stands because it would break Haskell 98.
But the foldr is not needed:
The foldr was there to allow fusion, I believe. Cheers, Simon

On Aug 8, 2006, at 10:24 AM, Simon Marlow wrote:
It may well be better, but it doesn't have the same laziness properties, so it isn't the same function. eg. try this:
do filterM (\x -> return undefined) [1]; return () Ah yes, I missed that.
Of course we may discuss whether the extra laziness is useful, but I can't apply the patch as it stands because it would break Haskell 98. Data.List uses "#ifdef USE_REPORT_PRELUDE" in places to choose between original and improved implementations. Is that an option in this case?
But the foldr is not needed:
The foldr was there to allow fusion, I believe.
Cheers, Simon

Spencer Janssen wrote:
On Aug 8, 2006, at 10:24 AM, Simon Marlow wrote:
It may well be better, but it doesn't have the same laziness properties, so it isn't the same function. eg. try this:
do filterM (\x -> return undefined) [1]; return ()
Ah yes, I missed that.
Of course we may discuss whether the extra laziness is useful, but I can't apply the patch as it stands because it would break Haskell 98.
Data.List uses "#ifdef USE_REPORT_PRELUDE" in places to choose between original and improved implementations. Is that an option in this case?
Not really, we don't want to have two versions of the base package. The USE_REPORT_PRELUDE code is there mainly for documentation, I don't believe it has actually worked for a long time now. We could have Control.Monad.filterM be different from Monad.filterM to avoid breaking Haskell 98, if there was a convincing enough argument that the semantics of Monad.filterM should be changed. Library functions usually strive to be as lazy as possible, because laziness can't be recovered if you need it. On the other hand, laziness might imply unfixable space or time leaks in library code, so it's a delicate balance. Cheers, Simon

On Aug 8, 2006, at 7:37 AM, Chris Kuklewicz wrote:
Simon Marlow wrote:
Spencer Janssen wrote:
Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen
* Improve Control.Monad.filterM: * filterM is defined in terms of foldr, making it a good consumer in GHC's fusion framework * filterM uses linear stack space with respect to the number of items that the predicate returns true, rather than the total number of elements in the input. ... hunk ./Control/Monad.hs 151 -filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys) +filterM p = foldr f (return []) + where + f x xs = do + flg <- p x + if flg + then xs >>= return . (x:) + else xs } The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining). Cheers, Simon The new one looks better to me. But the foldr is not needed:
filterM _ [] = return [] filterM p (x:xs) = do flg <- p x if flg then do ys <- filterM p xs return (x:ys) else filterM p xs
The above filterM differs from the original in when the garbage collector will be able to collect "x". It looks like the original had to hold onto "x" while "ys" was being computed. The version above looks like it can discard "x" as soon as the else branch is chosen.
-- Chris
I've benchmarked the versions from base, Chris Kuklewicz and myself. Each implementation was placed in a separate module, given an INLINE pragma and compiled with -O2. \begin{code} -- construct a list that won't fuse construct 0 = [] construct i = i : construct (i - 1) all = filterM (return . const True) (construct $ 2^18) half = filterM (return . even) (construct $ 2^18) none = filterM (return . const False) (construct $ 2^19) fuse = filterM (return . const False) [1 .. 2^22 :: Int] \end{code} Results (GHC 6.5 on an Intel Core Duo 1.83 GHz): | All | Half | None | Fuse Janssen | 0.15 | 0.09 | 0.02 | 0.02 Kuklewicz | 0.15 | 0.11 | 0.02 | 0.15 Control.Monad | 0.21 | 0.31 | 0.25 | stack overflow The tests are about what we expect. The alternate versions beat filterM solidly, even more so with 'sparse' predicates. In addition, Janssen ties or beats Kuklewicz in each test showing that the overhead due to foldr isn't a problem. Cheers, Spencer Janssen

Spencer Janssen wrote:
Fri Aug 4 13:49:57 CDT 2006 Spencer Janssen
* Improve Control.Monad.filterM: * filterM is defined in terms of foldr, making it a good consumer in GHC's fusion framework * filterM uses linear stack space with respect to the number of items that the predicate returns true, rather than the total number of elements in the input. ... hunk ./Control/Monad.hs 151 -filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys) +filterM p = foldr f (return []) + where + f x xs = do + flg <- p x + if flg + then xs >>= return . (x:) + else xs }
The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining). Cheers, Simon

Hello Simon, Tuesday, August 8, 2006, 4:18:01 PM, you wrote:
-filterM _ [] = return [] -filterM p (x:xs) = do - flg <- p x - ys <- filterM p xs - return (if flg then x:ys else ys)
The new definition looks less lazy than the original, so it's not a drop-in replacement. Also, we would need some measurements to test whether this version doesn't lose efficiency - it probably fuses better, but might be slower when it doesn't fuse. Rules to turn the foldr version back into the recursive version might be needed (or aggressive inlining).
jfyi - original filterM isn't tail recursive, so it overflows stack on really large data. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 10/24/06, Bulat Ziganshin
jfyi - original filterM isn't tail recursive, so it overflows stack on really large data.
This new version doesn't look tail recursive either.
--
Taral
participants (6)
-
Bulat Ziganshin
-
Chris Kuklewicz
-
Jon Fairbairn
-
Simon Marlow
-
Spencer Janssen
-
Taral