Thanks, that's interesting to know (re: Fortress).

Interestingly, in my Fortress days we looked at both using a split-like interface and at a more foldMap / reduce - like interface, and it seemed like the latter worked better – it requires a lot less boilerplate for controlling recursion, and better matches the fanout of whatever structure you're actually using underneath.

Ok, we'll have to try that.  I may be underestimating the power of a newtype and a monoid instance to expose the structure..  I was wrong about this before [1].  Here's the foldMap instance for Data.Map:
  foldMap _ Tip = mempty
  foldMap f (Bin _ _ v l r) = Foldable.foldMap f l `mappend` f v `mappend` Foldable.foldMap f r
Simon Marlow in his recent Haxl talk also had a domain where they wanted a symmetric (non-monadic) parallel spawn operation...
But it remains pretty hard for me to reason about the operational behavior of these things... especially since foldMap instances may vary.  
Thanks,
   -Ryan
[1] For example, here is a non-allocating traverseWithKey_ that I failed to come up with:
   
-- Version of traverseWithKey_ from Shachaf Ben-Kiki
-- (See thread on Haskell-cafe.)
-- Avoids O(N) allocation when traversing for side-effect.

newtype Traverse_ f = Traverse_ { runTraverse_ :: f () }
instance Applicative f => Monoid (Traverse_ f) where
  mempty = Traverse_ (pure ())
  Traverse_ a `mappend` Traverse_ b = Traverse_ (a *> b)
-- Since the Applicative used is Const (newtype Const m a = Const m), the
-- structure is never built up.
--(b) You can derive traverseWithKey_ from foldMapWithKey, e.g. as follows:
traverseWithKey_ :: Applicative f => (k -> a -> f ()) -> M.Map k a -> f ()
traverseWithKey_ f = runTraverse_ .
                     foldMapWithKey (\k x -> Traverse_ (void (f k x)))
foldMapWithKey :: Monoid r => (k -> a -> r) -> M.Map k a -> r
foldMapWithKey f = getConst . M.traverseWithKey (\k x -> Const (f k x))