Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

If there's not an existing way to do this, I suppose this is a minor
proposal.
When writing monadic code that consumes a container (Set, Map) etc, I often
find that I just want what iterators in imperative languages provide: that
is, to iterate over the contents of a large container, performing monadic
effects along the way, but without allocating.
The Foldable instance for Data.Map almost gives you what you want, but it
only exposes the values, not their keys. The "traverseWithKey" function is
close too, but it builds a new Map as output:
traverseWithKey ::
Applicative<../base-4.6.0.1/Control-Applicative.html#t:Applicative> t
=> (k -> a -> t b) -> Map

On Tue, Jul 2, 2013 at 6:58 AM, Ryan Newton
If there's not an existing way to do this, I suppose this is a minor proposal.
When writing monadic code that consumes a container (Set, Map) etc, I often find that I just want what iterators in imperative languages provide: that is, to iterate over the contents of a large container, performing monadic effects along the way, but without allocating.
The Foldable instance for Data.Map almost gives you what you want, but it only exposes the values, not their keys. The "traverseWithKey" function is close too, but it builds a new Map as output:
traverseWithKey :: Applicative t => (k -> a -> t b) -> Map k a -> t (Map k b)
So in practice what I do, and what I assume others do is this:
mapM_ f (toList myMap)
And I don't see any fusion rules that would ameliorate this (they seem limited to pure code), so I'm betting I always pay the cost of conversion to a list.
In the case of Data.Map the proposal is simply a "traverseWithKey_" that returns "t ()". This follows the suffixing convention of mapM/mapM_ etc.
More generally, it would be nice to do an audit of all popular containers with the assumption of very large collections that cant' be frivolously copied.
-Ryan
P.S. Actually, there are a number of places where the standard libraries could do with a more complete set of "_" functions -- e.g. it always chafes to not have "atomicModifyIORef_", which would apply a (a->a) function like a regular modifyIORef.
Is there a reason you couldn't implement this just as well using traverseWithKey, à la http://hackage.haskell.org/packages/archive/lens/3.9.0.2/doc/html/Control-Le... ? traverse-style functions let you choose your own Applicative, so with the right choice of Applicative you can accumulate effects without building up a new structure. Of course, if you think this should be added for convenience, that's a different matter, but the API already seems to provide the ability. (Note: The implementation in lens uses undefined internally in one place, but that's not necessary in general. See https://github.com/ekmett/lens/issues/177 .) Shachaf

On Tue, 2 Jul 2013, Ryan Newton wrote:
If there's not an existing way to do this, I suppose this is a minor proposal. When writing monadic code that consumes a container (Set, Map) etc, I often find that I just want what iterators in imperative languages provide: that is, to iterate over the contents of a large container, performing monadic effects along the way, but without allocating.
The Foldable instance for Data.Map almost gives you what you want, but it only exposes the values, not their keys. The "traverseWithKey" function is close too, but it builds a new Map as output:
traverseWithKey :: Applicative t => (k -> a -> t b) -> Map k a -> t (Map k b)
So in practice what I do, and what I assume others do is this:
mapM_ f (toList myMap)
You could also do Map.foldrWithKey (\k a -> f k a >>) (return ()) which does not need an interim Map, or Foldable.sequence_ . Map.mapWithKey f which looks more elegant.

On Tue, Jul 2, 2013 at 2:45 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
Foldable.sequence_ . Map.mapWithKey f
which looks more elegant.
This has the unfortunate consequence that it builds an entire new map strictly before sequencing it, due to the fact that Data.Map is spine-strict. =( -Edward

Hi all,
Thanks for the responses. I want to go through and make sure I understand
these.
--------------------------------------------------------
First, Henning, won't both of these allocate in proportion to the size of
the map?
Map.foldrWithKey (\k a -> f k a >>) (return ())
Foldable.sequence_ . Map.mapWithKey f
In particular, will the compiler be able to avoid allocating when building
up that large monadic computation in the foldrWithKey?
--------------------------------------------------------
Edward said to use foldMapWithKey, which hasn't been released yet, but it
sounds like it will be.
https://github.com/ekmett/containers/commit/40187f32a43689ff02ca2b97465aa4fc...
Even then, I might argue it is somewhat non-obvious to the programmers how
to use this to do a non-allocating "for-each". For example, I am not
totally clear on what will happen with that tree of mappend calls -- will
it allocate thunks? Further, IO is not a monoid, so am I to create an
instance of "Monoid (IO ())" in order to use foldMapWithKey to iterate over
the Map?
--------------------------------------------------------
On Tue, Jul 2, 2013 at 10:29 AM, Shachaf Ben-Kiki
http://hackage.haskell.org/packages/archive/lens/3.9.0.2/doc/html/Control-Le...
*
That function looks more overloaded than the traverse in Data.Map that I'm
referring to, e.g. here:
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-St...
I'm afraid I don't understand the proposal then -- is it to use lens
somehow? For the traversal I need to do over a Data.Map.Map, I need to fix
't' to be IO or Par or whatever I'm working with, so that the (k -> a -> t
b) function I'm passing in can do the effects I need.
To be specific I'm proposing providing these variants:
traverseWithKey :: **Applicative t => (k -> a -> t b) -> Map k a -> t
(Map k b)
traverseWithKey_ :: **Applicative t => (k -> a -> t ()) -> Map k a -> t
()
And without exposing the latter natively, I still don't understand how to
trick the former into not allocating, if that's the proposal.
-Ryan
On Tue, Jul 2, 2013 at 2:54 PM, Edward Kmett
On Tue, Jul 2, 2013 at 2:45 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
Foldable.sequence_ . Map.mapWithKey f
which looks more elegant.
This has the unfortunate consequence that it builds an entire new map strictly before sequencing it, due to the fact that Data.Map is spine-strict. =(
-Edward

On Tue, 2 Jul 2013, Ryan Newton wrote:
Hi all, Thanks for the responses. I want to go through and make sure I understand these.
-------------------------------------------------------- First, Henning, won't both of these allocate in proportion to the size of the map?
Map.foldrWithKey (\k a -> f k a >>) (return ()) Foldable.sequence_ . Map.mapWithKey f
In particular, will the compiler be able to avoid allocating when building up that large monadic computation in the foldrWithKey?
Since it is a foldr, the first action can be run without knowing the following ones. That is, at no time all actions must be allocated.

In particular, will the compiler be able to avoid allocating when
building up that large monadic computation in the foldrWithKey?
Since it is a foldr, the first action can be run without knowing the following ones. That is, at no time all actions must be allocated.
Hi all, Well, to test it out I went ahead and made a patch to containers, here: https://github.com/rrnewton/containers/commit/9b1a913c923fd9409932434894cca2... Henning, while I agree with you that GHC shouldn't truly *need* to generate a first class representation of that 10M-step monadic action... it remains the case that the updated test (attached to the end of this email), shows the traverseWithKey_ version allocating only 200K, whereas the foldrWithKey allocates 32M. Further, I argue that the traverseWithKey_ version is clearer to programmers that don't yet grok the first class nature of monadic actions (e.g. a fold to build up one big monadic action). And again, if we are not sure of the performance implications of that in this thread, I imagine many Haskell programmers would not be. So unless there's a strong counterargument, I propose the above patch for acceptance. -Ryan P.S. Using the HEAD version of cabal the allocation for -O0 foldrWithKey actually went up from 200M to 300M. Appendix: updated code below and at this URL: https://gist.github.com/rrnewton/5912513#file-maptest-hs -------------------------------------------- *import Control.DeepSeq* *import GHC.Stats* *import qualified Data.Map.Strict as M* *import Data.Time.Clock* *import Control.Exception* *import System.Mem* * * *main :: IO ()* *main = do* * t0 <- getCurrentTime* * let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])* * evaluate$ rnf m0 * * t1 <- getCurrentTime* * performGC* * s1 <- getGCStats * * putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show s1++"\n"* * let fn 500000 v = putStrLn "Got it!"* * fn _ _ = return ()* * * * -- Regular traverseWithKey uses 48MB* * -- traverseWithKey_ usse 200K of allocation:* * M.traverseWithKey_ fn m0* * t2 <- getCurrentTime* * performGC* * s2 <- getGCStats * * putStrLn$"[traverseWithKey_] Consumed map in "++show (diffUTCTime t2 t1)++"\n "++ show s2++"\n"* * putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s2 - bytesAllocated s1)* * * * -- foldrWithKey uses 32MB allocation:* * M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0* * t3 <- getCurrentTime* * performGC* * s3 <- getGCStats * * putStrLn$"[foldrWithKey] Consumed map in "++show (diffUTCTime t3 t2)++"\n "++ show s3++"\n"* * putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s3 - bytesAllocated s2)* * return ()* * *

* Henning Thielemann
On Tue, 2 Jul 2013, Ryan Newton wrote:
Hi all, Thanks for the responses. I want to go through and make sure I understand these.
-------------------------------------------------------- First, Henning, won't both of these allocate in proportion to the size of the map?
Map.foldrWithKey (\k a -> f k a >>) (return ()) Foldable.sequence_ . Map.mapWithKey f
In particular, will the compiler be able to avoid allocating when building up that large monadic computation in the foldrWithKey?
Since it is a foldr, the first action can be run without knowing the following ones. That is, at no time all actions must be allocated.
It's not a foldr you would expect. Here's the code: foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l You can see that 'go' is (partially) driven by the tree structure. A more foldr-y foldr would look like foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = f kx x (go (go z' r) l) Perhaps this should be fixed? (Note that I haven't done any testing.) Another note: I guess "non-allocating" in the title means that we're not allocating anything of the order of map size. Some stack-like allocation is inevitable since we're working with a tree, but it will be logarithmic in the map size. But then, it seems to me that the current version of foldrWithKey should satsify that criterion, only with a big(ger) constant factor. It always traverses the left branch accumulating z, then yields control to f. Roman

In my case, I'm trying to avoid heap allocation, because in parallel
haskell it is death to performance.
My claim is that:
- current version of foldrWithKey heap allocates in proportion to the
input size
- my proposed traverseWithKey_ doesn't
- traverseWithKey_ may be a more obvious an idiom to some programmers
than folding to compose an IO action (requires understanding the detailed
interaction of laziness, optimizations, and first class IO actions)
It sounds like Roman's alternate foldrWithKey will be an improvement as
well, so I'll test it.
-Ryan
On Thu, Jul 4, 2013 at 12:41 PM, Roman Cheplyaka
* Henning Thielemann
[2013-07-02 21:57:58+0200] On Tue, 2 Jul 2013, Ryan Newton wrote:
Hi all, Thanks for the responses. I want to go through and make sure I
understand these.
-------------------------------------------------------- First, Henning, won't both of these allocate in proportion to the size
of the map?
Map.foldrWithKey (\k a -> f k a >>) (return ()) Foldable.sequence_ . Map.mapWithKey f
In particular, will the compiler be able to avoid allocating when
building up that large monadic computation
in the foldrWithKey?
Since it is a foldr, the first action can be run without knowing the following ones. That is, at no time all actions must be allocated.
It's not a foldr you would expect. Here's the code:
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l
You can see that 'go' is (partially) driven by the tree structure.
A more foldr-y foldr would look like
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = f kx x (go (go z' r) l)
Perhaps this should be fixed? (Note that I haven't done any testing.)
Another note: I guess "non-allocating" in the title means that we're not allocating anything of the order of map size. Some stack-like allocation is inevitable since we're working with a tree, but it will be logarithmic in the map size.
But then, it seems to me that the current version of foldrWithKey should satsify that criterion, only with a big(ger) constant factor. It always traverses the left branch accumulating z, then yields control to f.
Roman

Hello Roman, On 04.07.2013 19:41, Roman Cheplyaka wrote:
It's not a foldr you would expect. Here's the code:
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l
You can see that 'go' is (partially) driven by the tree structure.
This called an in-order traversal of the tree. (The node is handled in-between the sub trees.) foldrWithKey f z = foldr (uncurry f) z . toAscList
A more foldr-y foldr would look like
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = f kx x (go (go z' r) l)
This is a post-order traversal. (The node is handled after the subtrees.) This is a different thing.
Perhaps this should be fixed?
I don't think so. You would change the semantics. (Try to convert a Map into an ascending key-value list with your new definition.) Cheers, Andreas -- Andreas Abel <>< Du bist der geliebte Mensch. Theoretical Computer Science, University of Munich Oettingenstr. 67, D-80538 Munich, GERMANY andreas.abel@ifi.lmu.de http://www2.tcs.ifi.lmu.de/~abel/

* Andreas Abel
Hello Roman,
On 04.07.2013 19:41, Roman Cheplyaka wrote:
It's not a foldr you would expect. Here's the code:
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l
You can see that 'go' is (partially) driven by the tree structure.
This called an in-order traversal of the tree. (The node is handled in-between the sub trees.)
foldrWithKey f z = foldr (uncurry f) z . toAscList
A more foldr-y foldr would look like
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = f kx x (go (go z' r) l)
This is a post-order traversal. (The node is handled after the subtrees.) This is a different thing.
Perhaps this should be fixed?
I don't think so. You would change the semantics. (Try to convert a Map into an ascending key-value list with your new definition.)
Thanks Andreas, I missed the fact that this transformation changes the traversal order. (Although I believe my version is pre-order, not post-order.) The same applies to the Ryan's traverseWithKey_, by the way — in his patch he also handles the node before the subtrees. But that's easily fixable. After playing a bit with Ryan's benchmark, I no longer think that the order matters much for the total number of allocations. Nor do I believe in first-class vs non-first-class IO actions. All that should matter is how many allocations we can move to the stack. But I haven't yet figured out why exactly different versions differ so drastically in this regard. Roman

After playing a bit with Ryan's benchmark, I no longer think that the order matters much for the total number of allocations. Nor do I believe in first-class vs non-first-class IO actions. All that should matter is how many allocations we can move to the stack. But I haven't yet figured out why exactly different versions differ so drastically in this regard.
Yeah, it's all rather different to predict in advance, isn't it? I tried your alternate foldrWithKey and I saw it heap allocating as well. Further, -O0 vs. -O2 can make a big difference. It's a little frustrating because for dealing efficiently with big data sets, especially in parallel. It would be nice to have big-O numbers in the docs for heap allocation as well as time cost -- and ones you could trust irrespective of optimize level. By the way, is traverse/traverseWithKey supposed to guarantee a specific order? The doc uses this code in the definition: traversehttp://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/Data-Trave... ((k, v) -> (,) k $http://hackage.haskell.org/packages/archive/containers/latest/doc/html/$ f k v) (toListhttp://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-... m) And I thought "toList" didn't guarantee anything (as opposed to toAscList)...

Slightly off-topic, but why do you care about the performance of code
compiled with -O0? I can think of at least one valid reason for doing so,
but even for that particular instance I doubt that changing the code is the
proper solution.
Aside from that, I definitely agree that most packages could do a better
job documenting the heap usage of functions.
On Fri, Jul 5, 2013 at 8:41 AM, Ryan Newton
After playing a bit with Ryan's benchmark, I no longer think that the
order matters much for the total number of allocations. Nor do I believe in first-class vs non-first-class IO actions. All that should matter is how many allocations we can move to the stack. But I haven't yet figured out why exactly different versions differ so drastically in this regard.
Yeah, it's all rather different to predict in advance, isn't it?
I tried your alternate foldrWithKey and I saw it heap allocating as well.
Further, -O0 vs. -O2 can make a big difference. It's a little frustrating because for dealing efficiently with big data sets, especially in parallel. It would be nice to have big-O numbers in the docs for heap allocation as well as time cost -- and ones you could trust irrespective of optimize level.
By the way, is traverse/traverseWithKey supposed to guarantee a specific order? The doc uses this code in the definition:
traversehttp://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/Data-Trave... ((k, v) -> (,) k $http://hackage.haskell.org/packages/archive/containers/latest/doc/html/$ f k v) (toListhttp://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-... m)
And I thought "toList" didn't guarantee anything (as opposed to toAscList)...
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Upon re-reading, this was much more terse than I intended, mostly because I
didn't want to get bogged down in an off-topic tangent. What I mean is
that I think it's quite rare indeed to be in a situation where you care
about performance, code with -O2 performs acceptably well, but being
performance-bound at -O0 is a serious hindrance.
But that's my own experience, and I'm sure that others have had different
experiences. And this issue in particular comes up often enough that I'm
sure there's something I'm missing. Hence my question.
On Fri, Jul 5, 2013 at 10:48 AM, John Lato
Slightly off-topic, but why do you care about the performance of code compiled with -O0? I can think of at least one valid reason for doing so, but even for that particular instance I doubt that changing the code is the proper solution.
Aside from that, I definitely agree that most packages could do a better job documenting the heap usage of functions.
On Fri, Jul 5, 2013 at 8:41 AM, Ryan Newton
wrote: After playing a bit with Ryan's benchmark, I no longer think that the
order matters much for the total number of allocations. Nor do I believe in first-class vs non-first-class IO actions. All that should matter is how many allocations we can move to the stack. But I haven't yet figured out why exactly different versions differ so drastically in this regard.
Yeah, it's all rather different to predict in advance, isn't it?
I tried your alternate foldrWithKey and I saw it heap allocating as well.
Further, -O0 vs. -O2 can make a big difference. It's a little frustrating because for dealing efficiently with big data sets, especially in parallel. It would be nice to have big-O numbers in the docs for heap allocation as well as time cost -- and ones you could trust irrespective of optimize level.
By the way, is traverse/traverseWithKey supposed to guarantee a specific order? The doc uses this code in the definition:
traversehttp://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/Data-Trave... ((k, v) -> (,) k $http://hackage.haskell.org/packages/archive/containers/latest/doc/html/$ f k v) (toListhttp://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-... m)
And I thought "toList" didn't guarantee anything (as opposed to toAscList)...
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Well, with Haskell we *depend* on optimizations, right? It's a not a
performance-by-construction, but rather
performance-by-awesome-rewriting-and-other-tricks approach.
I set -O2 for my packages, mostly. But I imagine that since it is NOT the
default ~/.cabal/config setting, that most people installing Haskell
packages are getting the default (-O0), and most packages are not
specifying otherwise. In fact, Hackage specifically discourages you when
you upload a package with -O2 set!
Anyway, personally I don't care much about -O0 performance. But I do get a
little nervous when radical changes in asymptotic complexity (rather than
just constant factors) come from optimization settings. For example, as an
idiom, "forM_ [1..10^9] $ ..." makes me extremely nervous as opposed to
writing a proper non-allocating for-loop construct.
On Fri, Jul 5, 2013 at 2:18 AM, John Lato
Upon re-reading, this was much more terse than I intended, mostly because I didn't want to get bogged down in an off-topic tangent. What I mean is that I think it's quite rare indeed to be in a situation where you care about performance, code with -O2 performs acceptably well, but being performance-bound at -O0 is a serious hindrance.
But that's my own experience, and I'm sure that others have had different experiences. And this issue in particular comes up often enough that I'm sure there's something I'm missing. Hence my question.
On Fri, Jul 5, 2013 at 10:48 AM, John Lato
wrote: Slightly off-topic, but why do you care about the performance of code compiled with -O0? I can think of at least one valid reason for doing so, but even for that particular instance I doubt that changing the code is the proper solution.
Aside from that, I definitely agree that most packages could do a better job documenting the heap usage of functions.
On Fri, Jul 5, 2013 at 8:41 AM, Ryan Newton
wrote: After playing a bit with Ryan's benchmark, I no longer think that the
order matters much for the total number of allocations. Nor do I believe in first-class vs non-first-class IO actions. All that should matter is how many allocations we can move to the stack. But I haven't yet figured out why exactly different versions differ so drastically in this regard.
Yeah, it's all rather different to predict in advance, isn't it?
I tried your alternate foldrWithKey and I saw it heap allocating as well.
Further, -O0 vs. -O2 can make a big difference. It's a little frustrating because for dealing efficiently with big data sets, especially in parallel. It would be nice to have big-O numbers in the docs for heap allocation as well as time cost -- and ones you could trust irrespective of optimize level.
By the way, is traverse/traverseWithKey supposed to guarantee a specific order? The doc uses this code in the definition:
traversehttp://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/Data-Trave... ((k, v) -> (,) k $http://hackage.haskell.org/packages/archive/containers/latest/doc/html/$ f k v) (toListhttp://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-... m)
And I thought "toList" didn't guarantee anything (as opposed to toAscList)...
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Ok, here's a little test to confirm what happens when you try to use
foldrWithKey for this.
*--------------------------------------------------------*
*import Control.DeepSeq*
*import GHC.Stats*
*import qualified Data.Map.Strict as M*
*import Data.Time.Clock*
*import Control.Exception*
*import System.Mem*
*
*
*main :: IO ()*
*main = do*
* t0 <- getCurrentTime*
* let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])*
* evaluate$ rnf m0 *
* t1 <- getCurrentTime*
* performGC*
* s1 <- getGCStats *
* putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show
s1++"\n"*
* let fn 500000 v = putStrLn "Got it!"*
* fn _ _ = return ()*
* *
* M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0*
* t2 <- getCurrentTime*
* performGC*
* s2 <- getGCStats *
* putStrLn$"Consumed map in "++show (diffUTCTime t2 t0)++"\n "++ show
s2++"\n"*
* putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s2 -
bytesAllocated s1)*
* return ()*
*-------------------------------------------------------- *
It's also at this Gist:
https://gist.github.com/rrnewton/5912513#file-maptest-hs
And here is the loop ("go10") generated "fn":
https://gist.github.com/rrnewton/5912513#file-maptest-ddump-simple-L214
Ok, empirically, in -O2, the consumption phase allocates 32MB additional
data (for a 1 million element map), and in -O0 it allocates 200MB. Here's
the recursive case of the loop:
((\ (eta_B1 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case x1_s2rr of _ {
__DEFAULT ->
((go10_r2uL z'_a10K r_a10S)
`cast` (
Hi all,
Thanks for the responses. I want to go through and make sure I understand these.
-------------------------------------------------------- First, Henning, won't both of these allocate in proportion to the size of the map?
Map.foldrWithKey (\k a -> f k a >>) (return ()) Foldable.sequence_ . Map.mapWithKey f
In particular, will the compiler be able to avoid allocating when building up that large monadic computation in the foldrWithKey?
-------------------------------------------------------- Edward said to use foldMapWithKey, which hasn't been released yet, but it sounds like it will be.
https://github.com/ekmett/containers/commit/40187f32a43689ff02ca2b97465aa4fc...
Even then, I might argue it is somewhat non-obvious to the programmers how to use this to do a non-allocating "for-each". For example, I am not totally clear on what will happen with that tree of mappend calls -- will it allocate thunks? Further, IO is not a monoid, so am I to create an instance of "Monoid (IO ())" in order to use foldMapWithKey to iterate over the Map?
-------------------------------------------------------- On Tue, Jul 2, 2013 at 10:29 AM, Shachaf Ben-Kiki
wrote: *> Is there a reason you couldn't implement this just as well using traverseWithKey, à la http://hackage.haskell.org/packages/archive/lens/3.9.0.2/doc/html/Control-Le... *
That function looks more overloaded than the traverse in Data.Map that I'm referring to, e.g. here:
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-St...
I'm afraid I don't understand the proposal then -- is it to use lens somehow? For the traversal I need to do over a Data.Map.Map, I need to fix 't' to be IO or Par or whatever I'm working with, so that the (k -> a -> t b) function I'm passing in can do the effects I need.
To be specific I'm proposing providing these variants:
traverseWithKey :: **Applicative t => (k -> a -> t b) -> Map k a -> t (Map k b) traverseWithKey_ :: **Applicative t => (k -> a -> t ()) -> Map k a -> t ()
And without exposing the latter natively, I still don't understand how to trick the former into not allocating, if that's the proposal.
-Ryan
On Tue, Jul 2, 2013 at 2:54 PM, Edward Kmett
wrote: On Tue, Jul 2, 2013 at 2:45 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
Foldable.sequence_ . Map.mapWithKey f
which looks more elegant.
This has the unfortunate consequence that it builds an entire new map strictly before sequencing it, due to the fact that Data.Map is spine-strict. =(
-Edward

On Tue, Jul 2, 2013 at 12:32 PM, Ryan Newton
On Tue, Jul 2, 2013 at 10:29 AM, Shachaf Ben-Kiki
wrote: Is there a reason you couldn't implement this just as well using traverseWithKey, à la
http://hackage.haskell.org/packages/archive/lens/3.9.0.2/doc/html/Control-Le...
That function looks more overloaded than the traverse in Data.Map that I'm referring to, e.g. here:
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-St...
I'm afraid I don't understand the proposal then -- is it to use lens somehow? For the traversal I need to do over a Data.Map.Map, I need to fix 't' to be IO or Par or whatever I'm working with, so that the (k -> a -> t b) function I'm passing in can do the effects I need.
To be specific I'm proposing providing these variants:
traverseWithKey :: Applicative t => (k -> a -> t b) -> Map k a -> t (Map k b) traverseWithKey_ :: Applicative t => (k -> a -> t ()) -> Map k a -> t ()
And without exposing the latter natively, I still don't understand how to trick the former into not allocating, if that's the proposal.
-Ryan
The suggestion is that (a) You can derive a balanced foldMapWithKey from traverseWithKey, as follows: foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x)) Since the Applicative used is Const (newtype Const m a = Const m), the structure is never built up. (b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows: newtype Traverse_ f = Traverse_ { runTraverse_ :: f () } instance Applicative f => Monoid (Traverse_ f) where mempty = Traverse_ (pure ()) Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b) traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f () traverseWithKey_ f = runTraverse_ . foldMapWithKey (\k x -> Traverse_ (void (f k x))) As Henning and Edward pointed out, though, foldrWithKey/foldlWithKey are already exported by Data.Map (and they give you right/left associativity, so they're possibly better... Of course, you can derive them from traverseWithKey too!). Shachaf

foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))
Shachaf, thanks for fleshing it out. I updated the test with your version as well: https://gist.github.com/rrnewton/5912908 In short, it performs exactly the same as the foldrWithKey version, as you pointed out (32M allocation). In both cases, using first class monadic/applicative values seems to foil GHC. And btw, these do show the scaling you would expect, on 2M elements, it allocates 64MB, 4M -> 128MB, and so on, whereas the traverseWithKey_ version allocates a constant amount. -Ryan

On Tue, Jul 2, 2013 at 1:46 PM, Ryan Newton
foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))
Shachaf, thanks for fleshing it out. I updated the test with your version as well:
https://gist.github.com/rrnewton/5912908
In short, it performs exactly the same as the foldrWithKey version, as you pointed out (32M allocation).
In both cases, using first class monadic/applicative values seems to foil GHC.
And btw, these do show the scaling you would expect, on 2M elements, it allocates 64MB, 4M -> 128MB, and so on, whereas the traverseWithKey_ version allocates a constant amount.
-Ryan
If you're actually benchmarking it, you should note a few things: * As I mentioned, lens's actual implementation of this function is slightly different: -- The argument 'a' of the result should not be used! newtype Traversed a f = Traversed { getTraversed :: f a } instance Applicative f => Monoid (Traversed a f) where mempty = Traversed (pure (error "Traversed: value used")) Traversed ma `mappend` Traversed mb = Traversed (ma *> mb) with one "void" applied at the very end of the fold, instead of one in each step. This may or may not be better; it should probably be measured. * Following a discussion with Milan off-list, he implemented a simple optimization in traverseWithKey which might have a significant impact. See https://github.com/haskell/containers/commit/4d24ff5d08f0bb27ca73a9888286d64... . It should probably be considered in these benchmarks. * I haven't thought it through, but it's possible that using a "difference monoid" -- i.e. folding with Endo, the way Data.Foldable.foldr is implemented by default -- would also be useful to measure, to compare with the existing foldrWithKey. Shachaf

Shachaf, I checked and Milan's commits that improve traverseWithKey were
already incorporated when I ran my tests above. The extra speedup is good
but doesn't change the O(1) vs. O(N) allocation situation.
Ok, so the discussion period for this one is over two weeks. I don't
believe there were any real objections to traverseWithKey_, rather there
were several questions raised about whether the non-allocating behavior
could be accomplished in other ways, which, alas didn't pan out.
Thus if there are no other objections, can someone merge pull request #30?
https://github.com/haskell/containers/pull/30
P.S. I've been doing a lot of performance-oriented monadic programming with
large data structures (for the LVar project), and this Map issue is only
one of several places where containers API's force you to go through lists
or to otherwise allocate. I think a continuing audit would be good and
will make other suggestions and pull requests as I come to them.
On Fri, Jul 5, 2013 at 3:51 AM, Shachaf Ben-Kiki
On Tue, Jul 2, 2013 at 1:46 PM, Ryan Newton
wrote: foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))
Shachaf, thanks for fleshing it out. I updated the test with your
version
as well:
https://gist.github.com/rrnewton/5912908
In short, it performs exactly the same as the foldrWithKey version, as you pointed out (32M allocation).
In both cases, using first class monadic/applicative values seems to foil GHC.
And btw, these do show the scaling you would expect, on 2M elements, it allocates 64MB, 4M -> 128MB, and so on, whereas the traverseWithKey_ version allocates a constant amount.
-Ryan
If you're actually benchmarking it, you should note a few things:
* As I mentioned, lens's actual implementation of this function is slightly different:
-- The argument 'a' of the result should not be used! newtype Traversed a f = Traversed { getTraversed :: f a } instance Applicative f => Monoid (Traversed a f) where mempty = Traversed (pure (error "Traversed: value used")) Traversed ma `mappend` Traversed mb = Traversed (ma *> mb)
with one "void" applied at the very end of the fold, instead of one in each step. This may or may not be better; it should probably be measured.
* Following a discussion with Milan off-list, he implemented a simple optimization in traverseWithKey which might have a significant impact. See https://github.com/haskell/containers/commit/4d24ff5d08f0bb27ca73a9888286d64... . It should probably be considered in these benchmarks.
* I haven't thought it through, but it's possible that using a "difference monoid" -- i.e. folding with Endo, the way Data.Foldable.foldr is implemented by default -- would also be useful to measure, to compare with the existing foldrWithKey.
Shachaf

On Mon, Jul 29, 2013 at 11:47 AM, Ryan Newton
Shachaf, I checked and Milan's commits that improve traverseWithKey were already incorporated when I ran my tests above. The extra speedup is good but doesn't change the O(1) vs. O(N) allocation situation.
Wait, are you saying you couldn't get it to work in constant memory at all without modifying containers? I didn't actually look at the benchmarks in much detail before. I just looked at the code -- it looks like in your latest posted benchmark you don't actually use the alternate traverseWithKey_ anywhere -- instead you use foldrWithKey twice (the perils of not compiling with -Wall!). I just ran the benchmark with the alternate version and it looks like it uses constant memory. Maybe I'm misunderstanding. I'm not against adding this to containers, but it should be clear with such a patch whether it's being added for necessity or convenience. It looks to me like it's the latter. (By the way: lens also provides this function, with the name "itraverse_" (i for indexed). I tried it and it looks like it also uses constant memory.) Shachaf

Oops, that was sloppy. Yes, your version does get the job done without
allocating. The corrected test is attached to this email.
The lens interface does look quite full featured! And it's nice to see
that it consistently includes '_' variants. I cite that as additional
evidence for the norm ;-).
Personally, I still want traverseWithKey_ for convenience, especially
because the solution you used is non-obvious. I imagine many Data.Map
users would not come up with it (as the rest of us on this thread didn't).
Best,
-Ryan
-----------------------------
import Control.Applicative
import Control.Monad
import Control.DeepSeq
import Control.Exception
import GHC.Stats
import qualified Data.Map.Strict as M
import Data.Time.Clock
import Data.Monoid
import System.Mem
import System.Environment
main :: IO ()
main = do
args <- getArgs
let size = case args of
[] -> 1000000::Int
[n] -> read n
let m0 = M.fromList (map (\i -> (i,i)) [1..size])
let fn 500000 v = putStrLn "Got it!"
fn _ _ = return ()
-- let fn i v = putStrLn$"fn: "++show(i,v)
st <- getCurrentTime
evaluate$ rnf m0
en <- getCurrentTime
performGC
s1 <- getGCStats
putStrLn$"Constructed map in "++show (diffUTCTime en st)++"\n "++ show
s1++"\n"
------------------------------------------------------------
-- Regular traverseWithKey uses 48MB
-- traverseWithKey_ uses 200K of allocation:
st <- getCurrentTime
M.traverseWithKey_ fn m0
en <- getCurrentTime
performGC
s2 <- getGCStats
putStrLn$"[traverseWithKey_] Consumed map in "++show (diffUTCTime en
st)++"\n "++ show s2++"\n"
putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s2 -
bytesAllocated s1)
------------------------------------------------------------
-- foldrWithKey uses 32MB allocation:
st <- getCurrentTime
M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0
en <- getCurrentTime
performGC
s3 <- getGCStats
putStrLn$"[foldrWithKey] Consumed map in "++show (diffUTCTime en st)++"\n
"++ show s3++"\n"
putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s3 -
bytesAllocated s2)
------------------------------------------------------------
-- An alternate version was proposed by Shachaf Ben-Kiki:
st <- getCurrentTime
traverseWithKey_ fn m0
en <- getCurrentTime
performGC
s4 <- getGCStats
putStrLn$"[alternate traverseWithKey_] Consumed map in "++show
(diffUTCTime en st)++"\n "++ show s4++"\n"
putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s4 -
bytesAllocated s3)
return ()
foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r
foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))
-- Since the Applicative used is Const (newtype Const m a = Const m), the
-- structure is never built up.
--(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows:
newtype Traverse_ f = Traverse_ { runTraverse_ :: f () }
instance Applicative f => Monoid (Traverse_ f) where
mempty = Traverse_ (pure ())
Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b)
traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f ()
traverseWithKey_ f = runTraverse_ .
foldMapWithKey (\k x -> Traverse_ (void (f k x)))
On Mon, Jul 29, 2013 at 3:08 PM, Shachaf Ben-Kiki
On Mon, Jul 29, 2013 at 11:47 AM, Ryan Newton
wrote: Shachaf, I checked and Milan's commits that improve traverseWithKey were already incorporated when I ran my tests above. The extra speedup is good but doesn't change the O(1) vs. O(N) allocation situation.
Wait, are you saying you couldn't get it to work in constant memory at all without modifying containers? I didn't actually look at the benchmarks in much detail before. I just looked at the code -- it looks like in your latest posted benchmark you don't actually use the alternate traverseWithKey_ anywhere -- instead you use foldrWithKey twice (the perils of not compiling with -Wall!). I just ran the benchmark with the alternate version and it looks like it uses constant memory.
Maybe I'm misunderstanding. I'm not against adding this to containers, but it should be clear with such a patch whether it's being added for necessity or convenience. It looks to me like it's the latter.
(By the way: lens also provides this function, with the name "itraverse_" (i for indexed). I tried it and it looks like it also uses constant memory.)
Shachaf

On Mon, Jul 29, 2013 at 8:57 PM, Ryan Newton
Oops, that was sloppy. Yes, your version does get the job done without allocating. The corrected test is attached to this email.
The lens interface does look quite full featured! And it's nice to see that it consistently includes '_' variants. I cite that as additional evidence for the norm ;-).
Personally, I still want traverseWithKey_ for convenience, especially because the solution you used is non-obvious. I imagine many Data.Map users would not come up with it (as the rest of us on this thread didn't).
Sure, putting it in containers is fine. I just want to mention that these kinds of functions are very general-purpose. In particular, using just traverseWithKey and the proposed alterF, you can implement at least all of: (!) adjust adjustWithKey alter delete empty findWithDefault foldl foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey foldrWithKey' insert insertLookupWithKey insertWith insertWithKey lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe mapMaybeWithKey mapWithKey member notMember toList update updateLookupWithKey updateWithKey With optimal asymptotic behavior and, in many cases, with just-about-optimal constant factors too (this is essentially what lens does). But adding this function directly to containers for completeness seems reasonable. Shachaf

(!) adjust adjustWithKey alter delete empty findWithDefault foldl foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey foldrWithKey' insert insertLookupWithKey insertWith insertWithKey lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe mapMaybeWithKey mapWithKey member notMember toList update updateLookupWithKey updateWithKey
So then this becomes a recommendation for the implementation strategy then right? (e.g. within containers) Users probably still want a rich set of variants, and they need to be packaged somewhere. But it seems like there's an argument that deriving these from a small core helps to ensure that there are no bugs. Yet that sounds like a battle for another day. On the topic of this proposal -- since Shachaf doesn't object, would you (Johan / Milan) mind merging this patch? -Ryan

Hi Ryan,
-----Original message----- From: Ryan Newton
Sent: 30 Jul 2013, 09:35 (!) adjust adjustWithKey alter delete empty findWithDefault foldl foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey foldrWithKey' insert insertLookupWithKey insertWith insertWithKey lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe mapMaybeWithKey mapWithKey member notMember toList update updateLookupWithKey updateWithKey
So then this becomes a recommendation for the implementation strategy then right? (e.g. within containers) Users probably still want a rich set of variants, and they need to be packaged somewhere. But it seems like there's an argument that deriving these from a small core helps to ensure that there are no bugs.
Yet that sounds like a battle for another day. On the topic of this proposal -- since Shachaf doesn't object, would you (Johan / Milan) mind merging this patch?
I would like to postpone merging for a bit and spend more time investigating the issue. I want to investigate why exactly foldrWithKey (\k a b -> f k a *> b) (pure ()) does not work as I would expect it not to allocate. I am not convinced that traverseWithKey_ is a good addition to the API because my original thought was "this is just a fold". The API of containers has tendency to grow, so I am careful about adding functions that can be expressed easily using existing ones. Of course, ability to iterate effectively over the elements of the containers definitely IS useful, so if I am unable to make the fold work (or GHC optimizer cannot do it at present), I will merge it. Cheers, Milan

Ok, let us know what you find out. I totally agree that Data.Map has
gotten pretty cluttered.
But, also, part of my concern is that even if the optimizer does know how
to optimize that fold, how does the programmer who needs this kind of
routine *know* that it will work? (Especially since we didn't.) I think
it's a bad habit to for users to stop paying attention to the cost model
entirely and instead believe that compiler magic will eliminate everything!
Yet exactly that kind of thinking seems to be encouraged by data structure
libraries that create unnecessary copies frivolously (esp. with fusion
frameworks that sometimes/maybe fix the problem).
On Tue, Jul 30, 2013 at 11:32 AM, Milan Straka
Hi Ryan,
-----Original message----- From: Ryan Newton
Sent: 30 Jul 2013, 09:35 (!) adjust adjustWithKey alter delete empty findWithDefault foldl foldl' foldlWithKey foldlWithKey' foldr foldr' foldrWithKey foldrWithKey' insert insertLookupWithKey insertWith insertWithKey lookup map mapAccum mapAccumRWithKey mapAccumWithKey mapMaybe mapMaybeWithKey mapWithKey member notMember toList update updateLookupWithKey updateWithKey
So then this becomes a recommendation for the implementation strategy then right? (e.g. within containers) Users probably still want a rich set of variants, and they need to be packaged somewhere. But it seems like there's an argument that deriving these from a small core helps to ensure that there are no bugs.
Yet that sounds like a battle for another day. On the topic of this proposal -- since Shachaf doesn't object, would you (Johan / Milan) mind merging this patch?
I would like to postpone merging for a bit and spend more time investigating the issue.
I want to investigate why exactly foldrWithKey (\k a b -> f k a *> b) (pure ()) does not work as I would expect it not to allocate.
I am not convinced that traverseWithKey_ is a good addition to the API because my original thought was "this is just a fold". The API of containers has tendency to grow, so I am careful about adding functions that can be expressed easily using existing ones.
Of course, ability to iterate effectively over the elements of the containers definitely IS useful, so if I am unable to make the fold work (or GHC optimizer cannot do it at present), I will merge it.
Cheers, Milan

-----Original message----- From: Ryan Newton
Sent: 30 Jul 2013, 12:01 Ok, let us know what you find out. I totally agree that Data.Map has gotten pretty cluttered.
But, also, part of my concern is that even if the optimizer does know how to optimize that fold, how does the programmer who needs this kind of routine *know* that it will work? (Especially since we didn't.)
I believe implementing traverseWithKey_ using a foldWithKey is quite natural (you want to go over the elements and do something with them, in this case perform some action on them). I expected it to work and I am surprised it does not. So for me it is the other way around -- I have a function which I expected to behave nice as a fold and it does not. There is probably a reason for that and that may cause addition of traverseWithKey_ to the API.
I think it's a bad habit to for users to stop paying attention to the cost model entirely and instead believe that compiler magic will eliminate everything! Yet exactly that kind of thinking seems to be encouraged by data structure libraries that create unnecessary copies frivolously (esp. with fusion frameworks that sometimes/maybe fix the problem).
Yes, that is true. Nevertheless, it is not easy to think about what will and what will not cause heap allocation in Haskell. Also, heap allocation is quite cheap in Haskell, sometimes faster implementations of containers methods allocate more memory (but we usually do not use them and prefer less allocation). Wanting to avoid allocation sometimes causes to avoid higher order functions and advanced functional techniques, which is also not ideal. Nevertheless, I will try to investigate. Cheers, Milan

I believe implementing traverseWithKey_ using a foldWithKey is quite natural (you want to go over the elements and do something with them, in this case perform some action on them). I expected it to work and I am surprised it does not. So for me it is the other way around -- I have a function which I expected to behave nice as a fold and it does not
Re: expectations. You don't get a funny feeling when monadic values are used as first class rather than second class ;-)? Whether in the accumulator of a fold, or in cases like this: do act <- f x act My expectation was that the optimizer would have more trouble. But maybe that's off base. Also, heap
allocation is quite cheap in Haskell, sometimes faster implementations of containers methods allocate more memory (but we usually do not use them and prefer less allocation). Wanting to avoid allocation sometimes causes to avoid higher order functions and advanced functional techniques, which is also not ideal.
Yes, my situation, which I agree is a minority position, is that I work on and care about parallelism. The current state of GHC is that it does NOT have a scalable GC, and virtually *every* parallel program that contains "idiomatic Haskell" levels of allocation fails to scale rather quickly as you increase threads. I.e. you quickly get to productivities of less than 50%. (Not necessarily at 32 threads either, often you are up against this wall at 8 or less.) So for the time being good parallel Haskell programs are low-allocating parallel Haskell programs.

* Ryan Newton
I believe implementing traverseWithKey_ using a foldWithKey is quite natural (you want to go over the elements and do something with them, in this case perform some action on them). I expected it to work and I am surprised it does not. So for me it is the other way around -- I have a function which I expected to behave nice as a fold and it does not
Re: expectations. You don't get a funny feeling when monadic values are used as first class rather than second class ;-)? Whether in the accumulator of a fold, or in cases like this:
do act <- f x act
I don't. I don't even believe that the compiler can spot the difference between the two. (But maybe it's just my ignorance.) Do you have a specific example where this is a problem? Roman

Re: expectations. You don't get a funny feeling when monadic values are used as first class rather than second class ;-)? Whether in the accumulator of a fold, or in cases like this:
do act <- f x act
I don't. I don't even believe that the compiler can spot the difference between the two. (But maybe it's just my ignorance.)
Do you have a specific example where this is a problem?
Well, I just generally assumed that anywhere where inlining is foiled would, any kind of allocation-eliminating optimization is unlikely to happen. As one example, in the monad-par scheduler(s) https://github.com/simonmar/monad-par/blob/0e0401ea2cfbab787c67c2ed826c123c5...we do inscrutable computations to retrieve a monadic-value, and then run it. This works fine, but it does mean that we pay the full cost of an abstract monadic action at that point (indirect jump, etc), rather than the highly discounted rate we are used to paying when we chain together monadic actions in a syntactically adjacent way within a function. Personally, I don't understand all the GHC optimization steps well. But I have verified certain things that it *doesn't *do. For example, if I have an allocating action in the tail (only exiting case) of a recursive monadic function, it cannot do "loop peeling" to eliminate that allocation against whatever context is calling the recursive function. For example the STG for the following leaves the allocation of the tuple and two Ints: *foo :: Int -> IO (Int,Int)* *foo x | x < 10 = return (x, 2*x)* *foo x = foo (x-1)* * * *main = do* * (x,y) <- foo 1000* * print x* * print y* Which is normally fine if the trip count is high. But consider something like compare-and-swap where there's a loop, but the expected trip count is very low. Perhaps this is just one example of how GHC tends not to explicitly represent and optimize loops? The Data.Map.Base.foldRWithKey function discussed in this thread is another example. That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop: *go10_r3TD :: IO () -> Map Int Int -> IO ()* *[GblId, Arity=2, Str=DmdType LS, Unf=OtherCon []] =* * sat-only \r srt:(0,*bitmap*) [z'_s3Yk ds_s3Y7]* * case ds_s3Y7 of _ {* * Bin rb_s43z kx_s3Ye x_s43A l_s3Ys r_s3Yl ->* * case kx_s3Ye of _ {* * I# x1_s3Yi ->* * let {* * sat_s43x :: IO ()* * [LclId] =* * \r srt:(0,*bitmap*) [eta_s3Ym]* * case x1_s3Yi of _ {* * __DEFAULT -> go10_r3TD z'_s3Yk r_s3Yl eta_s3Ym;* * 500000 ->* * case hPutStr2 stdout lvl_r3Ty True eta_s3Ym of _ {* * (#,#) ipv_s3Yq _ -> go10_r3TD z'_s3Yk r_s3Yl ipv_s3Yq;* * };* * };* * } in go10_r3TD sat_s43x l_s3Ys;* * };* * Tip -> z'_s3Yk;* * };* *SRT(go10_r3TD): [hPutStr2, stdout, lvl_r3Ty, go10_r3TD]* The specific problem in this example seems to be that -- based on a literal reading of the above -- it's creating a closure that closes over the Int#, * x1_s3Yi*. Shachaf's version that uses a newtype seems to avoid this trouble by not allowing IO's (>>=) into it at all. The traverseWithKey_ version is attached below [1], and it manages to get rid of the IO newtype in the loop and resolves to a State#. Perhaps there is a missing optimization tweak that would help GHC get rid of the IO type in the above STG? Cheers, -Ryan [1] P.S. Here's the non-allocating loop produced by traverseWithKey_: *a_r3X7* * :: Map Int Int -> State# RealWorld -> (# State# RealWorld, () #)* *[GblId, Arity=2, Str=DmdType SL, Unf=OtherCon []] =* * sat-only \r srt:(0,*bitmap*) [ds_s41Z eta_s42c]* * case ds_s41Z of _ {* * Bin rb_s47H k_s426 v_s47I l_s42b r_s42g ->* * case k_s426 of _ {* * I# x_s429 ->* * case x_s429 of _ {* * __DEFAULT ->* * case a_r3X7 l_s42b eta_s42c of _ {* * (#,#) ipv_s42h _ -> a_r3X7 r_s42g ipv_s42h;* * };* * 500000 ->* * case hPutStr2 stdout lvl_r3X1 True eta_s42c of _ {* * (#,#) ipv_s42l _ ->* * case a_r3X7 l_s42b ipv_s42l of _ {* * (#,#) ipv2_s42p _ -> a_r3X7 r_s42g ipv2_s42p;* * };* * };* * };* * };* * Tip -> (#,#) [eta_s42c ()];* * };* *SRT(a_r3X7): [hPutStr2, stdout, lvl_r3X1, a_r3X7]*

For example the STG for the following leaves the allocation of the tuple and two Ints: foo :: Int -> IO (Int,Int) foo x | x < 10 = return (x, 2*x) foo x = foo (x-1) Fixing this involves *nested* CPR analysis, which I am working on at the moment. The Data.Map.Base.foldRWithKey function discussed in this thread is another example. That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop: This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t'other? Then I could investigate more easily. thanks! Simon From: Libraries [mailto:libraries-bounces@haskell.org] On Behalf Of Ryan Newton Sent: 03 August 2013 08:45 To: Roman Cheplyaka Cc: Haskell Libraries Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_
Re: expectations. You don't get a funny feeling when monadic values are used as first class rather than second class ;-)? Whether in the accumulator of a fold, or in cases like this:
do act <- f x act I don't. I don't even believe that the compiler can spot the difference between the two. (But maybe it's just my ignorance.)
Do you have a specific example where this is a problem? Well, I just generally assumed that anywhere where inlining is foiled would, any kind of allocation-eliminating optimization is unlikely to happen. As one example, in the monad-par scheduler(s) https://github.com/simonmar/monad-par/blob/0e0401ea2cfbab787c67c2ed826c123c5... we do inscrutable computations to retrieve a monadic-value, and then run it. This works fine, but it does mean that we pay the full cost of an abstract monadic action at that point (indirect jump, etc), rather than the highly discounted rate we are used to paying when we chain together monadic actions in a syntactically adjacent way within a function. Personally, I don't understand all the GHC optimization steps well. But I have verified certain things that it doesn't do. For example, if I have an allocating action in the tail (only exiting case) of a recursive monadic function, it cannot do "loop peeling" to eliminate that allocation against whatever context is calling the recursive function. For example the STG for the following leaves the allocation of the tuple and two Ints: foo :: Int -> IO (Int,Int) foo x | x < 10 = return (x, 2*x) foo x = foo (x-1) main = do (x,y) <- foo 1000 print x print y Which is normally fine if the trip count is high. But consider something like compare-and-swap where there's a loop, but the expected trip count is very low. Perhaps this is just one example of how GHC tends not to explicitly represent and optimize loops? The Data.Map.Base.foldRWithKey function discussed in this thread is another example. That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop: go10_r3TD :: IO () -> Map Int Int -> IO () [GblId, Arity=2, Str=DmdType LS, Unf=OtherCon []] = sat-only \r srt:(0,*bitmap*) [z'_s3Yk ds_s3Y7] case ds_s3Y7 of _ { Bin rb_s43z kx_s3Ye x_s43A l_s3Ys r_s3Yl -> case kx_s3Ye of _ { I# x1_s3Yi -> let { sat_s43x :: IO () [LclId] = \r srt:(0,*bitmap*) [eta_s3Ym] case x1_s3Yi of _ { __DEFAULT -> go10_r3TD z'_s3Yk r_s3Yl eta_s3Ym; 500000 -> case hPutStr2 stdout lvl_r3Ty True eta_s3Ym of _ { (#,#) ipv_s3Yq _ -> go10_r3TD z'_s3Yk r_s3Yl ipv_s3Yq; }; }; } in go10_r3TD sat_s43x l_s3Ys; }; Tip -> z'_s3Yk; }; SRT(go10_r3TD): [hPutStr2, stdout, lvl_r3Ty, go10_r3TD] The specific problem in this example seems to be that -- based on a literal reading of the above -- it's creating a closure that closes over the Int#, x1_s3Yi. Shachaf's version that uses a newtype seems to avoid this trouble by not allowing IO's (>>=) into it at all. The traverseWithKey_ version is attached below [1], and it manages to get rid of the IO newtype in the loop and resolves to a State#. Perhaps there is a missing optimization tweak that would help GHC get rid of the IO type in the above STG? Cheers, -Ryan [1] P.S. Here's the non-allocating loop produced by traverseWithKey_: a_r3X7 :: Map Int Int -> State# RealWorld -> (# State# RealWorld, () #) [GblId, Arity=2, Str=DmdType SL, Unf=OtherCon []] = sat-only \r srt:(0,*bitmap*) [ds_s41Z eta_s42c] case ds_s41Z of _ { Bin rb_s47H k_s426 v_s47I l_s42b r_s42g -> case k_s426 of _ { I# x_s429 -> case x_s429 of _ { __DEFAULT -> case a_r3X7 l_s42b eta_s42c of _ { (#,#) ipv_s42h _ -> a_r3X7 r_s42g ipv_s42h; }; 500000 -> case hPutStr2 stdout lvl_r3X1 True eta_s42c of _ { (#,#) ipv_s42l _ -> case a_r3X7 l_s42b ipv_s42l of _ { (#,#) ipv2_s42p _ -> a_r3X7 r_s42g ipv2_s42p; }; }; }; }; Tip -> (#,#) [eta_s42c ()]; }; SRT(a_r3X7): [hPutStr2, stdout, lvl_r3X1, a_r3X7]

Hi Ryan and Simon,
From: Libraries [mailto:libraries-bounces@haskell.org] On Behalf Of Ryan Newton Sent: 03 August 2013 08:45 To: Roman Cheplyaka Cc: Haskell Libraries Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_
The Data.Map.Base.foldRWithKey function discussed in this thread is another example. That's a place where even after it inlines the provided function into the fold, we end up with the below STG with an allocation of an IO () function inside the inner loop:
I have been thinking about it and I think GHC optimizer is not to blame in this case. The two methods we are interested in are: traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t () traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l and we call them like this: let actionTraverse _k 1000000 = putStrLn "Hit 1000000" actionTraverse _k _v = return () in traverseWithKey_ actionTraverse map and this: let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z actionFoldr _k _v z = z in foldrWithKey actionFoldr (pure ()) map So although both traverseWithKey_ and foldrWithKey build an action, only traverseWithKey_ can execute the action 'on-the-fly'. When foldrWithKey is calling itself recursively, it has to allocate thunk with the IO action which is to be performed after the subtree is processed. If a strict fold is used, the memory allocated is lower (it allocated only the resulting IO action itself), but still linear in the size of the map. If the GHC optimizer were to avoid allocating, it would have to rewrite the 'foldrWithKey actionFoldr' by pushing the *> up to the definition of the recursive case of foldrWithKey, which seems extremely complicated to perform automatically (it would have to add several 'pure ()' to the code). But maybe I am not seeing some other way around it. Nevertheless, I am not sure how to proceed. The problem is that Data.Foldable offers many non-overloadable methods that are provided through the Foldable instance, notably traverse_ foldrM foldlM which could be implemented so that they iterate over all elements without allocating, but the Data.Foldable implementation allocates and cannot be overloaded. Therefore, if we add traverseWithKey_, we still will not have means of iterating an action over Set and IntSet elements, and although Map.traverseWithKey_ will not allocate, using traverse_ on Map will. Also I am not happy that traverseWithKey_ does not specify order in which it processed the elements. That is not such a big issue for traverseWithKey, because the map is reassembled afterwards, but traverseWithKey_ is only performing the action (without constructing the map) and it does so in unspecified order. What I am thinking at the moment about is foldlM and foldrM -- I checked that they can be implemented not to allocate and they specify evaluation order. But they require a Monad, not only Applicative, and would clash with Data.Foldable.fold[lr]M. Maybe we will end up adding both traverseWithKey_ and fold[lr]M? But than again, API growth, API growth :) Cheers, Milan

On Aug 4, 2013, at 9:49 AM, Milan Straka
The problem is that Data.Foldable offers many non-overloadable methods that are provided through the Foldable instance, notably traverse_ foldrM foldlM which could be implemented so that they iterate over all elements without allocating, but the Data.Foldable implementation allocates and cannot be overloaded.
Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable! greetings, Sjoerd

Fixing this involves **nested** CPR analysis, which I am working on at
On Sat, Aug 3, 2013 at 5:31 PM, Simon Peyton-Jones
This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t’other? Then I could investigate more easily.
Milan pasted the relevant code (thanks) for traverseWithKey_ vs.
foldWithKey which I reattach at the bottom of this email.
On Sun, Aug 4, 2013 at 7:48 AM, Sjoerd Visscher
Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!
That sounds good to me! And straightforward. Any objections? Milan, why does it bother you that there is no specified order? I'm perfectly happy as long as its deterministic on a particular machine. (I have never been sure whether pure code in Haskell must be deterministic across multiple machines... numCapabilities was a counter example for a long time.) Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified? Further, other languages (like Scheme) have maps/folds that do not specify order of side effects. -Ryan P.S.: Relevant code: traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t () traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l and we call them like this: let actionTraverse _k 1000000 = putStrLn "Hit 1000000" actionTraverse _k _v = return () in traverseWithKey_ actionTraverse map and this: let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z actionFoldr _k _v z = z in foldrWithKey actionFoldr (pure ()) map

Hi Ryan,
-----Original message----- From: Ryan Newton
Sent: 6 Aug 2013, 09:51 Milan, why does it bother you that there is no specified order? I'm perfectly happy as long as its deterministic on a particular machine.
it seems to me that having specified order (e.g., ascending) is more useful, if it can be provided. If I use traverseWithKey_ to process the elements (e.g., write them to a file, add them to other structure, send across the network), it seems to me that process them in ascending order is more beneficial that iterating over them in some order specified by the shape of the structure. I am not saying that having an operation without defined order is useless, only that operation with ascending order seems more useful to me.
Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified?
Well, we also have toAscList and toDescList. Moreover, toList is actually toAscList, so I wonder how many libraries would be affected if toList started returning the elements in some random order :) My point is this -- if we think having an Applicative / Monadic iteration which does not allocate is useful, I believe we should also provide such an iteration in ascending order. Cheers, Milan

traverse and traverse_ visit elements in the same order as foldMap and
foldr up to the Monoid/Applicative laws permitting (finite) reassociation
and unit mapping.
It would stand to reason that traverseWithKey, traverseWithKey_,
foldMapWithKey and foldrWithKey should retain that relationship.
I don't particularly care about what the order is, but that said, if you
start breaking the invariant of the current ordering, you'll silently break
a lot of people's code while still permitting it to typecheck.
This means that the errors will be insidious and difficult to find. e.g.
the lens-based zipper code for walking into a Map will silently break and
I'm sure I'm not the only one who has taken advantage of the existing
ordering on these combinators.
-Edward
On Tue, Aug 6, 2013 at 9:51 AM, Ryan Newton
On Sat, Aug 3, 2013 at 5:31 PM, Simon Peyton-Jones
wrote: Fixing this involves **nested** CPR analysis, which I am working on at the moment. Sounds neat! Is nested CPR analysis on a branch?
This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t’other? Then I could investigate more easily.
Milan pasted the relevant code (thanks) for traverseWithKey_ vs. foldWithKey which I reattach at the bottom of this email.
On Sun, Aug 4, 2013 at 7:48 AM, Sjoerd Visscher
wrote: Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!
That sounds good to me! And straightforward. Any objections?
Milan, why does it bother you that there is no specified order? I'm perfectly happy as long as its deterministic on a particular machine. (I have never been sure whether pure code in Haskell must be deterministic across multiple machines... numCapabilities was a counter example for a long time.) Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified? Further, other languages (like Scheme) have maps/folds that do not specify order of side effects.
-Ryan
P.S.: Relevant code:
traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t () traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r
foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l
and we call them like this: let actionTraverse _k 1000000 = putStrLn "Hit 1000000" actionTraverse _k _v = return () in traverseWithKey_ actionTraverse map
and this: let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z actionFoldr _k _v z = z in foldrWithKey actionFoldr (pure ()) map
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Hi Edward,
-----Original message----- From: Edward Kmett
Sent: 6 Aug 2013, 13:38 traverse and traverse_ visit elements in the same order as foldMap and foldr up to the Monoid/Applicative laws permitting (finite) reassociation and unit mapping.
It would stand to reason that traverseWithKey, traverseWithKey_, foldMapWithKey and foldrWithKey should retain that relationship.
I don't particularly care about what the order is, but that said, if you start breaking the invariant of the current ordering, you'll silently break a lot of people's code while still permitting it to typecheck.
This means that the errors will be insidious and difficult to find. e.g. the lens-based zipper code for walking into a Map will silently break and I'm sure I'm not the only one who has taken advantage of the existing ordering on these combinators.
I am not suggesting we should change the behaviour of existing functions and traverseWithKey_ should definitely use the same order as traverseWithKey. Changing semantics without changing type signatures is really suspicious and usually plainly wrong. Nevertheless, I was wondering whether we should have a monadic fold (foldrM and foldlM) which would process the elements in a given order (ascending and descending, analogously to foldr and foldl). From one point of view, we can implement foldrM and foldlM using foldr and foldl, nevertheless using linear heap space complexity compared to constant heap space complexity we can achieve with specialized implementations. This is the same situation as traverseWithKey_ -- we can implement it using traverseWithKey, but the heap space complexity increases. Cheers, Milan

On Tue, Aug 6, 2013 at 2:09 PM, Milan Straka
Hi Edward,I am not suggesting we should change the behaviour of existing functions and traverseWithKey_ should definitely use the same order as traverseWithKey. Changing semantics without changing type signatures is really suspicious and usually plainly wrong.
I wholeheartedly agree. =) I was just basing that on the code Ryan posted:
traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r
... which visits the key/value pairs out of order unlike, say: go (Bin _ k v l r = go l *> f k v *> go r
Nevertheless, I was wondering whether we should have a monadic fold (foldrM and foldlM) which would process the elements in a given order (ascending and descending, analogously to foldr and foldl). From one point of view, we can implement foldrM and foldlM using foldr and foldl,
Sure, foldrM is typically implemented in terms of foldl and foldlM is typically implemented in terms of foldr. Do the usual definitions like that leak on a Map? foldrM :: (Foldable t, Monad m) => (a -> b -> m b) -> b -> t a -> m bfoldrM f z0 xs = foldl f' return xs z0 where f' k x z = f x z >>= k foldlM :: (Foldable t, Monad m) => (a -> b -> m a) -> a -> t b -> m afoldlM f z0 xs = foldr f' return xs z0 where f' x k z = f z x >>= k nevertheless using linear heap space complexity compared to constant
heap space complexity we can achieve with specialized implementations. This is the same situation as traverseWithKey_ -- we can implement it using traverseWithKey, but the heap space complexity increases.
traverseWithKey_ would normally be implemented with an appropriate newtype and foldMapWithKey, rather than traverseWithKey. Does that also leak? -Edward

Hi Edward,
-----Original message----- From: Edward Kmett
Sent: 6 Aug 2013, 14:26 On Tue, Aug 6, 2013 at 2:09 PM, Milan Straka
wrote: Hi Edward,I am not suggesting we should change the behaviour of existing functions and traverseWithKey_ should definitely use the same order as traverseWithKey. Changing semantics without changing type signatures is really suspicious and usually plainly wrong.
I wholeheartedly agree. =) I was just basing that on the code Ryan posted:
traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r
... which visits the key/value pairs out of order unlike, say:
go (Bin _ k v l r = go l *> f k v *> go r
Oh, yes, we will definitely use the definition you suggest.
Nevertheless, I was wondering whether we should have a monadic fold (foldrM and foldlM) which would process the elements in a given order (ascending and descending, analogously to foldr and foldl). From one point of view, we can implement foldrM and foldlM using foldr and foldl,
Sure, foldrM is typically implemented in terms of foldl and foldlM is typically implemented in terms of foldr.
Do the usual definitions like that leak on a Map?
It is difficult to say whether it is a 'leak'. These methods (they are the same as Foldable.foldrM and Foldable.foldlM) heap-allocate space linear in the size of the map (to create the closures). When implemented directly, they do not heap-allocate.
foldrM :: (Foldable t, Monad m) => (a -> b -> m b) -> b -> t a -> m bfoldrM f z0 xs = foldl f' return xs z0 where f' k x z = f x z >>= k
foldlM :: (Foldable t, Monad m) => (a -> b -> m a) -> a -> t b -> m afoldlM f z0 xs = foldr f' return xs z0 where f' x k z = f z x >>= k
nevertheless using linear heap space complexity compared to constant
heap space complexity we can achieve with specialized implementations. This is the same situation as traverseWithKey_ -- we can implement it using traverseWithKey, but the heap space complexity increases.
traverseWithKey_ would normally be implemented with an appropriate newtype and foldMapWithKey, rather than traverseWithKey. Does that also leak?
That does not leak, as Shachaf Ben-Kiki pointed out. That is one of the reasons why this discussion is so long :) BTW, Foldable.traverse_ also heap-allocates space linear in the size of the map, because it is defined as traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f () traverse_ f = foldr ((*>) . f) (pure ()) Maybe it would be better to define it using foldMap + the appropriate newtype? Then it would not heap-allocate, at least for Data.Map. Cheers, Milan

On Tue, Aug 6, 2013 at 4:19 PM, Milan Straka
Hi Edward,
-----Original message----- From: Edward Kmett
Sent: 6 Aug 2013, 14:26 On Tue, Aug 6, 2013 at 2:09 PM, Milan Straka
wrote: Hi Edward,I am not suggesting we should change the behaviour of existing functions and traverseWithKey_ should definitely use the same order as traverseWithKey. Changing semantics without changing type signatures is really suspicious and usually plainly wrong.
I wholeheartedly agree. =) I was just basing that on the code Ryan posted:
traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r
... which visits the key/value pairs out of order unlike, say:
go (Bin _ k v l r = go l *> f k v *> go r
Oh, yes, we will definitely use the definition you suggest.
Nevertheless, I was wondering whether we should have a monadic fold (foldrM and foldlM) which would process the elements in a given order (ascending and descending, analogously to foldr and foldl). From one point of view, we can implement foldrM and foldlM using foldr and foldl,
Sure, foldrM is typically implemented in terms of foldl and foldlM is typically implemented in terms of foldr.
Do the usual definitions like that leak on a Map?
It is difficult to say whether it is a 'leak'. These methods (they are the same as Foldable.foldrM and Foldable.foldlM) heap-allocate space linear in the size of the map (to create the closures). When implemented directly, they do not heap-allocate.
Ick. I hadn't walked through the expansion of those by hands and had admittedly just hoped they worked out of the box. That means the similar combinators we have for them in lens probably also leak. =( This indicates to me we may want to bite the bullet and move foldlM and foldrM into the Foldable class. As hideous as that is, the unmitigable space leak strikes me as worse.
traverseWithKey_ would normally be implemented with an appropriate newtype and foldMapWithKey, rather than traverseWithKey. Does that also leak?
That does not leak, as Shachaf Ben-Kiki pointed out. That is one of the reasons why this discussion is so long :)
BTW, Foldable.traverse_ also heap-allocates space linear in the size of the map, because it is defined as traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f () traverse_ f = foldr ((*>) . f) (pure ()) Maybe it would be better to define it using foldMap + the appropriate newtype? Then it would not heap-allocate, at least for Data.Map.
Definitely. I'll talk to Shachaf and craft a suitable proposal to fix up traverse_. -Edward

Sounds neat! Is nested CPR analysis on a branch?
Not yet.. on my laptop only so far.
I've lost context on this traverseWithKey_ thread. If you or Milan need my help, could you start again, stating the problem with a standalone test case? I gather from Milan that may there isn't a problem, or at least not a problem that GHC can reasonably solve. If there's nothing to follow up, no need to reply. Thanks
Simon
From: Ryan Newton [mailto:rrnewton@gmail.com]
Sent: 06 August 2013 14:51
To: Sjoerd Visscher
Cc: Milan Straka; Simon Peyton-Jones; Haskell Libraries
Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_
On Sat, Aug 3, 2013 at 5:31 PM, Simon Peyton-Jones
Fixing this involves *nested* CPR analysis, which I am working on at the moment. Sounds neat! Is nested CPR analysis on a branch? This one I do not understand. Could you pull out the two alternative ways of phrasing this algorithm into a standalone form, in which one allocates more than t'other? Then I could investigate more easily. Milan pasted the relevant code (thanks) for traverseWithKey_ vs. foldWithKey which I reattach at the bottom of this email. On Sun, Aug 4, 2013 at 7:48 AM, Sjoerd Visscher
mailto:sjoerd@w3future.com> wrote: Would it help if traverse_ was implemented with foldMap using the Traverse_ newtype? In which case maybe we should fix Data.Foldable!
That sounds good to me! And straightforward. Any objections? Milan, why does it bother you that there is no specified order? I'm perfectly happy as long as its deterministic on a particular machine. (I have never been sure whether pure code in Haskell must be deterministic across multiple machines... numCapabilities was a counter example for a long time.) Aren't we already used to using Data.Map.toList and Data.Set.toList where order is not specified? Further, other languages (like Scheme) have maps/folds that do not specify order of side effects. -Ryan P.S.: Relevant code: traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t () traverseWithKey_ f = go where go Tip = pure () go (Bin _ k v l r) = f k v *> go l *> go r foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b foldrWithKey f z = go z where go z' Tip = z' go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l and we call them like this: let actionTraverse _k 1000000 = putStrLn "Hit 1000000" actionTraverse _k _v = return () in traverseWithKey_ actionTraverse map and this: let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z actionFoldr _k _v z = z in foldrWithKey actionFoldr (pure ()) map

This could be implemented very easily with an appropriate Monoid with
foldMapWithKey. I put in a proposal for adding it (and a ticket on
containers) that received a couple of +1s, but I confess I never followed
up on it with Milan and Johan after the proposal period ended.
A simpler construction that works today for this case would be to just use
foldrWithKey or foldlWithKey.
-Edward
On Tue, Jul 2, 2013 at 9:58 AM, Ryan Newton
If there's not an existing way to do this, I suppose this is a minor proposal.
When writing monadic code that consumes a container (Set, Map) etc, I often find that I just want what iterators in imperative languages provide: that is, to iterate over the contents of a large container, performing monadic effects along the way, but without allocating.
The Foldable instance for Data.Map almost gives you what you want, but it only exposes the values, not their keys. The "traverseWithKey" function is close too, but it builds a new Map as output:
traverseWithKey :: Applicativehttp://../base-4.6.0.1/Control-Applicative.html#t:Applicative t => (k -> a -> t b) -> Map http://Data-Map-Lazy.html#t:Map k a -> t (Maphttp://Data-Map-Lazy.html#t:Map k b)
So in practice what I do, and what I assume others do is this:
mapM_ f (toList myMap)
And I don't see any fusion rules that would ameliorate this (they seem limited to pure code), so I'm betting I always pay the cost of conversion to a list.
In the case of Data.Map* the proposal is simply a "traverseWithKey_" that returns "t ()".* This follows the suffixing convention of mapM/mapM_ etc.
More generally, it would be nice to do an audit of all popular containers with the assumption of very large collections that cant' be frivolously copied.
-Ryan
P.S. Actually, there are a number of places where the standard libraries could do with a more complete set of "_" functions -- e.g. it always chafes to not have "atomicModifyIORef_", which would apply a (a->a) function like a regular modifyIORef.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Hi Edward,
-----Original message----- From: Edward Kmett
Sent: 2 Jul 2013, 14:51 This could be implemented very easily with an appropriate Monoid with foldMapWithKey. I put in a proposal for adding it (and a ticket on containers) that received a couple of +1s, but I confess I never followed up on it with Milan and Johan after the proposal period ended.
Oh, I completely forgot. I went through the last year's mailboxes and found it. Merging now :) Cheers, Milan
participants (10)
-
Andreas Abel
-
Edward Kmett
-
Henning Thielemann
-
John Lato
-
Milan Straka
-
Roman Cheplyaka
-
Ryan Newton
-
Shachaf Ben-Kiki
-
Simon Peyton-Jones
-
Sjoerd Visscher