Data.HashMap: Strict or lazy by default?

Hi all, I'm about to release my new unordered-containers library, which includes a Data.HashMap module. I have one remaining design issue I'd like to get some feedback on. Here are the constraints: * There are a number of functions that need both lazy and strict versions. Example: insertWith * There are a number of functions that don't need a strict version, but where having a strict version would help avoid space leaks and support the common strict use case. Example: insert * A particular use case for a map rarely uses both the lazy and strict versions (suggesting a module split). * Keeping both the strict and lazy versions in the same module makes the module very large and users typically want to see only half of the functions in there (the ones that fit their current use case). My current thinking is to provide lazy and strict versions in different modules. The two modules could still share the same data type. The question is, what should be the default? Haskell is a lazy language but the most common use cases for maps are strict (e.g. a map from strings to integer counters). I intend to to use one of three possible module layouts: Option 1: Data.HashMap (strict) Data.HashMap.Lazy Option 2: Data.HashMap (lazy) Data.HashMap.Strict Option 3: Data.HashMap (re-export one of the two modules below) Data.HashMap.Lazy Data.HashMap.Strict I'd like to see some arguments for and against each of the two possible defaults (lazy or strict). Also, real life examples of cases where lazy maps are useful would be appreciated. Cheers, Johan

On 18 February 2011 11:51, Johan Tibell
Option 1: Data.HashMap (strict) Data.HashMap.Lazy
This would probably work the best, especially as it's the behaviour already taken by ByteString, Text, etc. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On Thu, Feb 17, 2011 at 8:15 PM, Ivan Lazar Miljenovic < ivan.miljenovic@gmail.com> wrote:
On 18 February 2011 11:51, Johan Tibell
wrote: Option 1: Data.HashMap (strict) Data.HashMap.Lazy
This would probably work the best, especially as it's the behaviour already taken by ByteString, Text, etc.
However, it is quite different than the behavior of Data.Map, which is a much closer analogue. Bytestring and Text's strict and lazy versions are radically different structures from one another. Here the miniscule differences between strict and lazy WriterT, StateT, etc. is probably a much closer analogue. There they use the separate data types to reflect the fact that you have different operations that you want classes to provide (like the strict map in the above discussion). Here I'd probably look more towards the mtl and transformers as a guideline, as it has been balancing tensions between the two world views with regards to maximizing laziness for a long time. That would steer towards Option 3, likely with being lazy in the value being the default exported from Data.HashMap. But even that analogy is flawed, because there the only difference is the use of a few irrefutable patterns to avoid a couple of bottoms where possible. Here the difference in strictness is much more dramatic as it causes you to fail even the *Functor* laws and I for one would really like to see these be Foldable/Traversable, etc. Making the map strict in the value (at least for map, traverse, mapM) is pretty ugly because it changes the set of things that it can contain by making it an error to try to stuff an error, undefined, or potentially bottom value into the map. Making the map strict in its internal structure never introduces non-termination into your code, and is almost always a win because of the decreased amount of indirect jumps made by the STG. Making the map strict in the values shrinks the size of the range of the map unnaturally, and makes it veer radically from the model of a function from k -> Maybe v. -Edward Kmett

Edward, Thanks for your thorough reply. Note that we can still have the actual data type be lazy in the values and thus have a Functor instance etc. Strictness can be added at the level of functions. Knowing that we can have all the lazy goodies the question is: what should be the default API in terms of strictness and what should the module layout look like (e.g. where should we put the lazy and strict insert insertWith functions?) Johan

NB I believe that it is possible to make a lazy-valued Map (IntMap, HashMap, etc...) from a strict one via: data Lazy a = Lazy a newtype LazyMap k v = StrictMap k (Lazy v) So in principle (and I believe this is only useful for key-strict structures) the strict map is more "universal". A separate lazy implementation mostly saves us a layer of indirection. -Jan-Willem Maessen

On Thu, Feb 17, 2011 at 10:06 PM, Jan-Willem Maessen
NB I believe that it is possible to make a lazy-valued Map (IntMap, HashMap, etc...) from a strict one via:
data Lazy a = Lazy a
newtype LazyMap k v = StrictMap k (Lazy v)
So in principle (and I believe this is only useful for key-strict structures) the strict map is more "universal". A separate lazy implementation mostly saves us a layer of indirection.
A reasonable point, but it seems to me you don't save a layer of indirection. Regardless of if the underlying map is lazy or strict it still has the same indirection to its elements: Node -> value, and here it becomes Node -> Lazy -> value, and since you can't unbox polymorphically, your argument actually cuts the other way, by adding a layer for the lazy user. To elide the layer of indirection and unpack the value into the node constructor you'd need an 'adaptive-containers' or 'unboxed-containers' -style strict map. What you do gain with your approach is potentially the ability to elide an indirect jump or two due to the known strictness of the case scrutinee, but I have no idea if the magnitude of that effect would warrant increasing the memory footprint for lazy users, given that the other benefits of a strict-by-default map in terms of avoiding space leaks, etc. can all be had by supplying a second set of combinators. -Edward -Jan-Willem Maessen

On Thu, Feb 17, 2011 at 10:46 PM, Edward Kmett
On Thu, Feb 17, 2011 at 10:06 PM, Jan-Willem Maessen
wrote: NB I believe that it is possible to make a lazy-valued Map (IntMap, HashMap, etc...) from a strict one via:
data Lazy a = Lazy a
newtype LazyMap k v = StrictMap k (Lazy v)
So in principle (and I believe this is only useful for key-strict structures) the strict map is more "universal". A separate lazy implementation mostly saves us a layer of indirection.
A reasonable point, but it seems to me you don't save a layer of indirection. Regardless of if the underlying map is lazy or strict it still has the same indirection to its elements: Node -> value, and here it becomes Node -> Lazy -> value, and since you can't unbox polymorphically, your argument actually cuts the other way, by adding a layer for the lazy user.
Er, I think we're violently agreeing here, which probably means I didn't state myself clearly enough. You want a specialized type rather than just a StrictMap k (Lazy v) because the latter introduces a layer of indirection at every leaf. But it does suggest how to quickly get a *working* pair of strict & lazy maps: build the best possible strict map, *then* specialize. -Jan

On Thu, Feb 17, 2011 at 9:37 PM, Johan Tibell
Edward,
Thanks for your thorough reply. Note that we can still have the actual data type be lazy in the values and thus have a Functor instance etc. Strictness can be added at the level of functions. Knowing that we can have all the lazy goodies the question is: what should be the default API in terms of strictness and what should the module layout look like (e.g. where should we put the lazy and strict insert insertWith functions?)
I'm relieved to hear you say that. =) Without seeing the code, without the (!) annotations, while you'll get the same control over space leaks, the strictness analyzer won't really derive any knowledge about the evaluatedness of a retrieved value, so the emitted code may differ slightly, but I doubt it'd be a measurable difference. I'd vote for Data.HashMap.Lazy and Data.HashMap.Strict with Data.HashMap just re-exporting Data.HashMap.Lazy. That way someone can just swap imports from Data.Map to Data.HashMap and not be surprised by strictness changes and it mirrors the behavior of mtl/transformers and similar libraries (even my comonad-transformers ;) ). Since there is no structural difference, and the instances could be shared you could get away with a single common data type that you export through both modules. This would break a bit from the traditions of these other packages, but it'd let users sprinkle a little strictness (or conversely, laziness) through their code as needed with qualified Strict.map, Strict.insertWith, etc's or change imports to swap wholesale. The price is they couldn't build separate instances for strict vs. lazy Data.Hashmaps, the benefit is that they don't _have_ to build separate instances for strict vs. lazy Data.Hashmaps, and that they could mix and match methods between them. -Edward Kmett

I should have mentioned that I'm talking about being lazy or strict in the values. The data structure is already strict in the keys. Johan

Excerpts from Johan Tibell's message of Thu Feb 17 19:51:28 -0500 2011:
I'd like to see some arguments for and against each of the two possible defaults (lazy or strict). Also, real life examples of cases where lazy maps are useful would be appreciated.
My favorite example of lazy maps being useful is the case of dynamic programming, where the particular value you are attempting to compute only uses a small subset of the true "table". You can then actually write out the map (which refers to itself) and "initialize" it with all values, but not actually calculate all of them. Less efficient in some cases? Probably. But it's hella elegant. Cheers, Edward

On Thu, Feb 17, 2011 at 4:51 PM, Johan Tibell
My current thinking is to provide lazy and strict versions in different modules. The two modules could still share the same data type. The question is, what should be the default? Haskell is a lazy language but the most common use cases for maps are strict (e.g. a map from strings to integer counters).
What about class instances? With a shared data type we can only provide one set of instances, probably the lazy ones. Assuming we used to separate data types, can we even define instances for all common classes (e.g. Functor, Foldable, Monoid) or will the strict versions violate some of the laws for these classes? Johan

On Fri, Feb 18, 2011 at 3:32 PM, Johan Tibell
On Thu, Feb 17, 2011 at 4:51 PM, Johan Tibell
wrote: My current thinking is to provide lazy and strict versions in different modules. The two modules could still share the same data type. The question is, what should be the default? Haskell is a lazy language but the most common use cases for maps are strict (e.g. a map from strings to integer counters).
What about class instances? With a shared data type we can only provide one set of instances, probably the lazy ones. Assuming we used to separate data types, can we even define instances for all common classes (e.g. Functor, Foldable, Monoid) or will the strict versions violate some of the laws for these classes?
Hrmm, the strict version will violate the laws for Functor: (fmap (const 12) . fmap error) /= fmap (const 12 . error) and strictness is irrelevant for Monoid and Foldable as the former would be grafting trees that already contain forced values, and the Foldable instance only evaluates hashmaps, it doesn't produce them. Traversable doesn't have laws explicitly stated, but the fmapDefault it generates would violate the Functor laws given a strict map, indicating something isn't kosher with a strict traversable either. So it would seem at first glance, no (valid) instances are affected for stock classes. Now, I have some more ad-hoc methods in my 'keys' package which might have their behavior affected by the strict/lazy divide since I don't provide a full suite of laws for them (they provide some generic insert, adjust, etc. properties that i wanted for manipulating representable functors). But, if you split the types, you probably still want to allow users to splash a bit of laziness/strictness in turn, so it would seem you'd wind up with 2 types, 4 modules, and a lot of boilerplate. Ick. -Edward Kmett Johan
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On 02/18/11 18:21, Edward Kmett wrote:
On Fri, Feb 18, 2011 at 3:32 PM, Johan Tibell
wrote: On Thu, Feb 17, 2011 at 4:51 PM, Johan Tibell
wrote: My current thinking is to provide lazy and strict versions in different modules. The two modules could still share the same data type. The question is, what should be the default? Haskell is a lazy language but the most common use cases for maps are strict (e.g. a map from strings to integer counters).
What about class instances? With a shared data type we can only provide one set of instances, probably the lazy ones. Assuming we used to separate data types, can we even define instances for all common classes (e.g. Functor, Foldable, Monoid) or will the strict versions violate some of the laws for these classes?
Hrmm, the strict version will violate the laws for Functor:
(fmap (const 12) . fmap error) /= fmap (const 12 . error)
That's an issue, though. It means you can't use a Map that tries to keep its values always strict if you want to use class Functor with it. *Even if* it's a separate type in Data.SomeMap.Strict. (Unless you're willing to break the law in the presence of _|_ values.) -Isaac

On Fri, Feb 18, 2011 at 6:36 PM, Isaac Dupree < ml@isaac.cedarswampstudios.org> wrote:
On 02/18/11 18:21, Edward Kmett wrote:
Hrmm, the strict version will violate the laws for Functor:
(fmap (const 12) . fmap error) /= fmap (const 12 . error)
That's an issue, though. It means you can't use a Map that tries to keep its values always strict if you want to use class Functor with it. *Even if* it's a separate type in Data.SomeMap.Strict. (Unless you're willing to break the law in the presence of _|_ values.)
That was pretty much my point. I think, as Jan-Willem put it, we are in 'violent agreement'. =) -Edward

On 2/18/11 6:36 PM, Isaac Dupree wrote:
On 02/18/11 18:21, Edward Kmett wrote:
Hrmm, the strict version will violate the laws for Functor:
(fmap (const 12) . fmap error) /= fmap (const 12 . error)
That's an issue, though. It means you can't use a Map that tries to keep its values always strict if you want to use class Functor with it. *Even if* it's a separate type in Data.SomeMap.Strict. (Unless you're willing to break the law in the presence of _|_ values.)
FWIW, it's still a (category theoretic) functor: (fmap (const 12) . fmap error) == fmap (const 12 .! error) where (f .! g) x = f $! g x ...it's just not an endofunctor. But the Functor class is for encoding endofunctors, so this detail doesn't help us much. At least, not directly. -- Live well, ~wren
participants (7)
-
Edward Kmett
-
Edward Z. Yang
-
Isaac Dupree
-
Ivan Lazar Miljenovic
-
Jan-Willem Maessen
-
Johan Tibell
-
wren ng thornton