Hi Bertram

Scott Dillard wrote:
> I think maybe you guys (Don and Andrew) are misunderstanding my proposal.
> The lazy/strict tradeoff is a subtle issue, and I'll be sure to re-read
> Okasaki's stuff with this in mind, but what I'm talking about here is not a
> trade off. It's laziness "for free". Move the strictness annotations out of
> the constructors and into the library functions using 'seq'. Laziness is
> exposed through _separate_ functions.

It's not for free. When the compiler does a pattern match on the Bin
constructor, Bin sz kx x l r, it can no longer assume that l and r are
fully evaluated, so it has to add code to evaluate them in case they are
not. And in fact, this code will be needed if any of your proposed lazy
functions are added later. I have not checked whether this has a
measurable performance or code size impact.

> So mapWithKey retains all semantics, including guarantees.

Semantically the change is safe, agreed.

You're right of course (hadn't thought about that) but I'd like to point out that Adrien Hey's AVL tree library is structured in the way I propose: A node's subtree fields are not marked strict, instead seq is used to force construction of the tree. I think it's generally faster than Data.Map, at least it's claimed to be so in the documentation. The trees are not built with the same algorithm, but if the use of seq instead of bangs does impose a significant overhead, then that speaks very highly to the AVL algorithm.

This of course raises the question of why I don't just use that library. I'm checking it out, but it's quite a bit bigger than Data.Map. The thing I like about Data.Map is that it's relatively simple and I can see whats going on very quickly, enough to make modifications (which I'm about to do.) Reading Data.Tree.AVL is a more daunting task.

Also, there's something to be said for "lazy by default." I don't see much difference between Data.List.map and Data.Map.map (for finite lists anyway) and so if laziness is the correct default for the former then it should also be for the latter. The current Data.Map.map is only half strict: it builds an entire tree and fills it with thunks. Better I think to either build the whole thing completely or build only what's needed. The current Data.Map.map is the worst of both, no laziness + heap burn.

So I guess I'll start writing. Anyone have any good benchmarks? Which naming scheme is less ugly, Data.Map.mapWithKeyL or Data.Map.Lazy.mapWithKey? Any other suggestions would be appreciated.

Scott