
After playing a bit with Ryan's benchmark, I no longer think that the order matters much for the total number of allocations. Nor do I believe the difference comes down to first-class vs. non-first-class IO actions. All that should matter is how many allocations we can move to the stack. But I haven't yet figured out why exactly the different versions differ so drastically in this regard.
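
For what it's worth, here is a minimal sketch of how one might compare the allocation behaviour of two fold variants directly, using GHC's per-thread allocation counter (System.Mem.getAllocationCounter, base >= 4.8). The map and the folds below are placeholders, not Ryan's actual benchmark:

    import qualified Data.Map.Strict as M
    import System.Mem (getAllocationCounter)
    import Control.Exception (evaluate)
    import Data.Int (Int64)

    -- Run an action and report roughly how many bytes it allocated.
    -- The counter counts down, so allocation is (before - after).
    measureAlloc :: IO a -> IO (a, Int64)
    measureAlloc act = do
      before <- getAllocationCounter
      x <- act
      after <- getAllocationCounter
      pure (x, before - after)

    main :: IO ()
    main = do
      let m = M.fromList [(i, i) | i <- [1 .. 100000 :: Int]]
      (s1, a1) <- measureAlloc (evaluate (M.foldr (+) 0 m))
      (s2, a2) <- measureAlloc (evaluate (M.foldrWithKey (\_ v acc -> v + acc) 0 m))
      putStrLn $ "foldr:        " ++ show s1 ++ ", allocated " ++ show a1 ++ " bytes"
      putStrLn $ "foldrWithKey: " ++ show s2 ++ ", allocated " ++ show a2 ++ " bytes"

The numbers will of course still depend on the optimisation level, which is part of the problem being discussed.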
Yeah, it's all rather difficult to predict in advance, isn't it? I tried your alternate foldrWithKey and I saw it heap allocating as well. Further, -O0 vs. -O2 can make a big difference. That's a little frustrating when you want to deal efficiently with big data sets, especially in parallel. It would be nice to have big-O numbers in the docs for heap allocation as well as time cost -- and ones you could trust irrespective of optimization level.

By the way, is traverse/traverseWithKey supposed to guarantee a specific order? The doc uses this code in the definition:

    traverse (\(k, v) -> (,) k <$> f k v) (toList m)

And I thought toList didn't guarantee anything (as opposed to toAscList)...
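
One way to see what order traverseWithKey actually visits keys in, without relying on the docs, is to traverse with the Const applicative and collect the keys. This is just a sketch showing the observed order for whatever containers version you run it against; it says nothing about whether that order is guaranteed:

    import qualified Data.Map as M
    import Data.Functor.Const (Const (..))

    -- Record the keys in the order traverseWithKey visits them,
    -- without building a result map.
    visitOrder :: M.Map k v -> [k]
    visitOrder = getConst . M.traverseWithKey (\k _ -> Const [k])

    main :: IO ()
    main = do
      let m = M.fromList [(3, 'c'), (1, 'a'), (2, 'b')]
      print (visitOrder m)          -- [1,2,3] if traversal is in ascending key order
      print (map fst (M.toList m))  -- what toList gives for comparison
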