Ok, here's a little test to confirm what happens when you try to use foldrWithKey for this.
import Control.DeepSeq
import GHC.Stats
import qualified Data.Map.Strict as M
import Data.Time.Clock
import Control.Exception
import System.Mem
main :: IO ()
main = do
t0 <- getCurrentTime
let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])
evaluate$ rnf m0
t1 <- getCurrentTime
performGC
s1 <- getGCStats
putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show s1++"\n"
let fn 500000 v = putStrLn "Got it!"
fn _ _ = return ()
M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0
t2 <- getCurrentTime
performGC
s2 <- getGCStats
putStrLn$"Consumed map in "++show (diffUTCTime t2 t0)++"\n "++ show s2++"\n"
putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s2 - bytesAllocated s1)
return ()
--------------------------------------------------------
And here is the loop ("go10") generated "fn":
Ok, empirically, in -O2, the consumption phase allocates 32MB additional data (for a 1 million element map), and in -O0 it allocates 200MB. Here's the recursive case of the loop:
((\ (eta_B1 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case x1_s2rr of _ {
__DEFAULT ->
((go10_r2uL z'_a10K r_a10S)
`cast` (<GHC.Types.NTCo:IO <()>>
~#
(GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, () #))))
eta_B1;
Ok, so I didn't yet look at the STG where there's a clear allocation story. So, actually, I'm not sure if this recursive case forces GHC to build some O(1M) first class representation of the IO action here, since eta is abstract? At least, I'm assuming that's what the 32Mb-200Mb allocated is (though 32MB is actually rather skimpy for such a thing... it would have to spend only 4 words per closure...)