New subject: Proposal: add laziness to Data.Map / IntMap

5 Aug 2008

      Hi Bertram

Scott Dillard wrote:
...
...
I think maybe you guys (Don and Andrew) are misunderstanding my proposal.
The lazy/strict tradeoff is a subtle issue, and I'll be sure to re-read
Okasaki's stuff with this in mind, but what I'm talking about here is not
a
trade off. It's laziness "for free". Move the strictness annotations out
of
the constructors and into the library functions using 'seq'. Laziness is
exposed through _separate_ functions.
It's not for free. When the compiler does a pattern match on the Bin
constructor, Bin sz kx x l r, it can no longer assume that l and r are
fully evaluated, so it has to add code to evaluate them in case they are
not. And in fact, this code will be needed if any of your proposed lazy
functions are added later. I have not checked whether this has a
measurable performance or code size impact.
...
So mapWithKey retains all semantics, including guarantees.
Semantically the change is safe, agreed.
You're right of course (hadn't thought about that) but I'd like to point out
that Adrien Hey's AVL tree library is structured in the way I propose: A
node's subtree fields are not marked strict, instead seq is used to force
construction of the tree. I think it's generally faster than Data.Map, at
least it's claimed to be so in the documentation. The trees are not built
with the same algorithm, but if the use of seq instead of bangs does impose
a significant overhead, then that speaks very highly to the AVL algorithm.

This of course raises the question of why I don't just use that library. I'm
checking it out, but it's quite a bit bigger than Data.Map. The thing I like
about Data.Map is that it's relatively simple and I can see whats going on
very quickly, enough to make modifications (which I'm about to do.) Reading
Data.Tree.AVL is a more daunting task.

Also, there's something to be said for "lazy by default." I don't see much
difference between Data.List.map and Data.Map.map (for finite lists anyway)
and so if laziness is the correct default for the former then it should also
be for the latter. The current Data.Map.map is only half strict: it builds
an entire tree and fills it with thunks. Better I think to either build the
whole thing completely or build only what's needed. The current Data.Map.map
is the worst of both, no laziness + heap burn.

So I guess I'll start writing. Anyone have any good benchmarks? Which naming
scheme is less ugly, Data.Map.mapWithKeyL or Data.Map.Lazy.mapWithKey? Any
other suggestions would be appreciated.

Scott

Re: Proposal: add laziness to Data.Map / IntMap

Scott Dillard

Sittampalam, Ganesh

Adrian Hey

Bertram Felgenhauer

Jamie Brandon

tags

participants (5)