Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

2 Jul 2013

      Ok, here's a little test to confirm what happens when you try to use
foldrWithKey for this.

*--------------------------------------------------------*
*import Control.DeepSeq*
*import GHC.Stats*
*import qualified Data.Map.Strict as M*
*import Data.Time.Clock*
*import Control.Exception*
*import System.Mem*
*
*
*main :: IO ()*
*main = do*
*  t0 <- getCurrentTime*
*  let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])*
*  evaluate$ rnf m0 *
*  t1 <- getCurrentTime*
*  performGC*
*  s1 <- getGCStats  *
*  putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show
s1++"\n"*
*  let fn 500000 v = putStrLn "Got it!"*
*      fn _      _ = return ()*
*  *
*  M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0*
*  t2 <- getCurrentTime*
*  performGC*
*  s2 <- getGCStats *
*  putStrLn$"Consumed map in "++show (diffUTCTime t2 t0)++"\n "++ show
s2++"\n"*
*  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s2 -
bytesAllocated s1)*
*  return ()*
*--------------------------------------------------------  *

It's also at this Gist:
https://gist.github.com/rrnewton/5912513#file-maptest-hs

And here is the loop ("go10") generated "fn":
https://gist.github.com/rrnewton/5912513#file-maptest-ddump-simple-L214

Ok, empirically, in -O2, the consumption phase allocates 32MB additional
data (for a 1 million element map), and in -O0 it allocates 200MB.  Here's
the recursive case of the loop:

          ((\ (eta_B1 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
              case x1_s2rr of _ {

                __DEFAULT ->
                  ((go10_r2uL z'_a10K r_a10S)
                   `cast` (>
                           :: GHC.Types.IO ()
                                ~#
                              (GHC.Prim.State# GHC.Prim.RealWorld
                               -> (# GHC.Prim.State#
GHC.Prim.RealWorld, () #))))
                    eta_B1;

Ok, so I didn't yet look at the STG where there's a clear allocation story.
 So, actually, I'm not sure if this recursive case forces GHC to build some
O(1M) first class representation of the IO action here, since eta is
abstract?  At least, I'm assuming that's what the 32Mb-200Mb allocated is
(though 32MB is actually rather skimpy for such a thing... it would have to
spend only 4 words per closure...)

On Tue, Jul 2, 2013 at 3:32 PM, Ryan Newton  wrote:
...
Hi all,
Thanks for the responses.  I want to go through and make sure I understand
these.
--------------------------------------------------------
First, Henning, won't both of these allocate in proportion to the size of
the map?
Map.foldrWithKey (\k a -> f k a >>) (return ())
    Foldable.sequence_ . Map.mapWithKey f
In particular, will the compiler be able to avoid allocating when building
up that large monadic computation in the foldrWithKey?
--------------------------------------------------------
Edward said to use foldMapWithKey, which hasn't been released yet, but it
sounds like it will be.
https://github.com/ekmett/containers/commit/40187f32a43689ff02ca2b97465aa4fc...
Even then, I might argue it is somewhat non-obvious to the programmers how
to use this to do a non-allocating "for-each".  For example, I am not
totally clear on what will happen with that tree of mappend calls -- will
it allocate thunks?  Further, IO is not a monoid, so am I to create an
instance of "Monoid (IO ())" in order to use foldMapWithKey to iterate over
the Map?
--------------------------------------------------------
On Tue, Jul 2, 2013 at 10:29 AM, Shachaf Ben-Kiki 
wrote:
 *> Is there a reason you couldn't implement this just as well
using traverseWithKey, à la
...
http://hackage.haskell.org/packages/archive/lens/3.9.0.2/doc/html/Control-Le...
*
That function looks more overloaded than the traverse in Data.Map that I'm
referring to, e.g. here:
http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-St...
I'm afraid I don't understand the proposal then -- is it to use lens
somehow?  For the traversal I need to do over a Data.Map.Map, I need to fix
't' to be IO or Par or whatever I'm working with, so that the (k -> a -> t
b) function I'm passing in can do the effects I need.
To be specific I'm proposing providing these variants:
traverseWithKey :: **Applicative t => (k -> a -> t b) -> Map k a -> t
(Map k b)
   traverseWithKey_ :: **Applicative t => (k -> a -> t ()) -> Map k a ->
t ()
And without exposing the latter natively, I still don't understand how to
trick the former into not allocating, if that's the proposal.
-Ryan
On Tue, Jul 2, 2013 at 2:54 PM, Edward Kmett  wrote:
...
On Tue, Jul 2, 2013 at 2:45 PM, Henning Thielemann <
lemming@henning-thielemann.de> wrote:
...
Foldable.sequence_ . Map.mapWithKey f
which looks more elegant.
...
This has the unfortunate consequence that it builds an entire new map
strictly before sequencing it, due to the fact that Data.Map is
spine-strict. =(
-Edward