Oops, that was sloppy.  Yes, your version does get the job done without allocating.  The corrected test is attached to this email.

The lens interface does look quite full featured!  And it's nice to see that it consistently includes '_' variants.  I cite that as additional evidence for the norm ;-).

Personally, I still want traverseWithKey_ for convenience, especially because the solution you used is non-obvious.  I imagine many Data.Map users would not come up with it (as the rest of us on this thread didn't).

Best,
  -Ryan

-----------------------------
import Control.Applicative
import Control.Monad
import Control.DeepSeq
import Control.Exception
import GHC.Stats
import qualified Data.Map.Strict as M
import Data.Time.Clock
import Data.Monoid
import System.Mem
import System.Environment

main :: IO ()
main = do
  args <- getArgs
  let size = case args of
              []  -> 1000000::Int
              [n] -> read n
  let m0 = M.fromList (map (\i -> (i,i)) [1..size])
  let fn 500000 v = putStrLn "Got it!"
      fn _      _ = return ()
--  let fn i v = putStrLn$"fn: "++show(i,v)
  
  st <- getCurrentTime      
  evaluate$ rnf m0 
  en <- getCurrentTime
  performGC
  s1 <- getGCStats  
  putStrLn$"Constructed map in "++show (diffUTCTime en st)++"\n "++ show s1++"\n"

  ------------------------------------------------------------
  -- Regular traverseWithKey uses 48MB
  -- traverseWithKey_ uses 200K of allocation:
  st <- getCurrentTime 
  M.traverseWithKey_ fn m0
  en <- getCurrentTime
  performGC
  s2 <- getGCStats 
  putStrLn$"[traverseWithKey_] Consumed map in "++show (diffUTCTime en st)++"\n "++ show s2++"\n"
  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s2 - bytesAllocated s1)

  ------------------------------------------------------------
  -- foldrWithKey uses 32MB allocation:
  st <- getCurrentTime  
  M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0
  en <- getCurrentTime
  performGC
  s3 <- getGCStats 
  putStrLn$"[foldrWithKey] Consumed map in "++show (diffUTCTime en st)++"\n "++ show s3++"\n"
  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s3 - bytesAllocated s2)

  ------------------------------------------------------------
  -- An alternate version was proposed by Shachaf Ben-Kiki:
  st <- getCurrentTime
  traverseWithKey_ fn m0  
  en <- getCurrentTime
  performGC
  s4 <- getGCStats 
  putStrLn$"[alternate traverseWithKey_] Consumed map in "++show (diffUTCTime en st)++"\n "++ show s4++"\n"
  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s4 - bytesAllocated s3)
  return ()


foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r
foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))

-- Since the Applicative used is Const (newtype Const m a = Const m), the
-- structure is never built up.
--(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows:

newtype Traverse_ f = Traverse_ { runTraverse_ :: f () }

instance Applicative f => Monoid (Traverse_ f) where
  mempty = Traverse_ (pure ())
  Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b)

traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f ()
traverseWithKey_ f = runTraverse_ .
                     foldMapWithKey (\k x -> Traverse_ (void (f k x)))




On Mon, Jul 29, 2013 at 3:08 PM, Shachaf Ben-Kiki <shachaf@gmail.com> wrote:
On Mon, Jul 29, 2013 at 11:47 AM, Ryan Newton <rrnewton@gmail.com> wrote:
> Shachaf, I checked and Milan's commits that improve traverseWithKey were
> already incorporated when I ran my tests above.  The extra speedup is good
> but doesn't change the O(1) vs. O(N) allocation situation.

Wait, are you saying you couldn't get it to work in constant memory at
all without modifying containers? I didn't actually look at the
benchmarks in much detail before. I just looked at the code -- it
looks like in your latest posted benchmark you don't actually use the
alternate traverseWithKey_ anywhere -- instead you use foldrWithKey
twice (the perils of not compiling with -Wall!). I just ran the
benchmark with the alternate version and it looks like it uses
constant memory.

Maybe I'm misunderstanding. I'm not against adding this to containers,
but it should be clear with such a patch whether it's being added for
necessity or convenience. It looks to me like it's the latter.

(By the way: lens also provides this function, with the name
"itraverse_" (i for indexed). I tried it and it looks like it also
uses constant memory.)

    Shachaf