I believe implementing traverseWithKey_ using a foldWithKey is
quite natural (you want to go over the elements and do something with
them, in this case perform some action on them). I expected it to work
and I am surprised it does not. So for me it is the other way around --
I have a function which I expected to behave nice as a fold and it does
not

Re: expectations.  You don't get a funny feeling when monadic values are used as first class rather than second class ;-)?  Whether in the accumulator of a fold, or in cases like this:

  do act <- f x
       act

My expectation was that the optimizer would have more trouble.  But maybe that's off base.

 Also, heap
allocation is quite cheap in Haskell, sometimes faster implementations
of containers methods allocate more memory (but we usually do not use
them and prefer less allocation). Wanting to avoid allocation sometimes
causes to avoid higher order functions and advanced functional
techniques, which is also not ideal.

Yes, my situation, which I agree is a minority position, is that I work on and care about parallelism.  The current state of GHC is that it does NOT have a scalable GC, and virtually every parallel program that contains "idiomatic Haskell" levels of allocation fails to scale rather quickly as you increase threads.  I.e. you quickly get to productivities of less than 50%.  (Not necessarily at 32 threads either, often you are up against this wall at 8 or less.)

So for the time being good parallel Haskell programs are low-allocating parallel Haskell programs.