RFC: Should Data.IntMap.Strict be value-strict in the function arguments or the map itself

Hi all, As discussed in an earlier (accepted) proposal we're adding Data.[Int]Map.{Strict,Lazy} APIs to make it easier for users to get the desired strictness. There are two subtly different ways these APIs can be strict: 1. All the functions are strict in the value argument, or 2. The structure is strict in the value field. The difference shows up in functions like insertWith. For example, given insertWith (\ old _new -> old + 1) k v someMap should 'v' be evaluated is the key 'k' is already in the map. (Note that the value isn't used in the update function.) If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not (as in the above example). We currently use option 1 for keys i.e. insert k v someMap is strict in the key even if the map is empty (which wouldn't require us to look at the key). Thoughts? I'd love to see some design pros/cons of both approaches. -- Johan

I'd like to remark that the situation with keys is not /quite/ comparable. Even if we don't need to look at the key for inserting into an empty map, it will get forced anyway because IntMaps are spine-strict (in particular, they are strict in their keys.) Edward Excerpts from Johan Tibell's message of Thu Oct 27 12:56:10 -0400 2011:
Hi all,
As discussed in an earlier (accepted) proposal we're adding Data.[Int]Map.{Strict,Lazy} APIs to make it easier for users to get the desired strictness.
There are two subtly different ways these APIs can be strict:
1. All the functions are strict in the value argument, or 2. The structure is strict in the value field.
The difference shows up in functions like insertWith. For example, given
insertWith (\ old _new -> old + 1) k v someMap
should 'v' be evaluated is the key 'k' is already in the map. (Note that the value isn't used in the update function.)
If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not (as in the above example). We currently use option 1 for keys i.e.
insert k v someMap
is strict in the key even if the map is empty (which wouldn't require us to look at the key).
Thoughts? I'd love to see some design pros/cons of both approaches.
-- Johan

On Thu, Oct 27, 2011 at 10:03 AM, Edward Z. Yang
I'd like to remark that the situation with keys is not /quite/ comparable. Even if we don't need to look at the key for inserting into an empty map, it will get forced anyway because IntMaps are spine-strict (in particular, they are strict in their keys.)
I'm not quite sure if I follow. Here's Data.IntMap.insert: insert :: Key -> a -> IntMap a -> IntMap a insert k x t = k `seq` case t of Bin p m l r | nomatch k p m -> join k (Tip k x) p t | zero k m -> Bin p m (insert k x l) r | otherwise -> Bin p m l (insert k x r) Tip ky _ | k==ky -> Tip k x | otherwise -> join k (Tip k x) ky t Nil -> Tip k x Without the explicit `seq` insert undefined 1 Nil terminates but with the `seq` it's _|_. Are you saying it will eventually get force if someone e.g. does a lookup after the insert? -- Johan

Here's the definition of Tip. data IntMap a = Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) | Tip {-# UNPACK #-} !Key a | Nil So we effectively have: insert :: Key -> a -> IntMap a -> IntMap a insert k x t = case t of Bin p m l r | nomatch k p m -> join k (Tip k x) p t | zero k m -> Bin p m (insert k x l) r | otherwise -> Bin p m l (insert k x r) Tip ky _ | k==ky -> Tip k x | otherwise -> join k (Tip k x) ky t Nil -> k `seq` Tip k x So it does not terminate in this case. Edward Excerpts from Johan Tibell's message of Thu Oct 27 13:13:56 -0400 2011:
On Thu, Oct 27, 2011 at 10:03 AM, Edward Z. Yang
wrote: I'd like to remark that the situation with keys is not /quite/ comparable. Even if we don't need to look at the key for inserting into an empty map, it will get forced anyway because IntMaps are spine-strict (in particular, they are strict in their keys.)
I'm not quite sure if I follow. Here's Data.IntMap.insert:
insert :: Key -> a -> IntMap a -> IntMap a insert k x t = k `seq` case t of Bin p m l r | nomatch k p m -> join k (Tip k x) p t | zero k m -> Bin p m (insert k x l) r | otherwise -> Bin p m l (insert k x r) Tip ky _ | k==ky -> Tip k x | otherwise -> join k (Tip k x) ky t Nil -> Tip k x
Without the explicit `seq`
insert undefined 1 Nil
terminates but with the `seq` it's _|_.
Are you saying it will eventually get force if someone e.g. does a lookup after the insert?
-- Johan

On Thu, Oct 27, 2011 at 10:17 AM, Edward Z. Yang
Here's the definition of Tip.
data IntMap a = Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) | Tip {-# UNPACK #-} !Key a | Nil
You're right. I forgot the definition of the data type itself. I guess there's no function in the API currently that takes a key parameter and isn't strict in that parameter because the key is either * compared to another key, or * inserted into the map which both causes the key to be evaluated. -- Johan

On Thu, Oct 27, 2011 at 10:17 AM, Edward Z. Yang
wrote: Here's the definition of Tip.
data IntMap a = Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) | Tip {-# UNPACK #-} !Key a | Nil
You're right. I forgot the definition of the data type itself. I guess there's no function in the API currently that takes a key parameter and isn't strict in that parameter because the key is either
* compared to another key, or * inserted into the map
which both causes the key to be evaluated.
I believe delete undefined Nil does not terminate with current definition, but would terminate if the explicit seq would not be present. Cheers, Milan

On Thu, Oct 27, 2011 at 10:24 AM, Milan Straka
I believe
delete undefined Nil
does not terminate with current definition, but would terminate if the explicit seq would not be present.
Good point. I think option 2 is bad as it gives users no way to know which functions are strict in their value argument without knowing the implementation or us documenting every function with the strictness properties. We could do the latter but the API would be simpler if we just said that all functions are strict in their key and value arguments. -- Johan

I believe
delete undefined Nil
does not terminate with current definition, but would terminate if the explicit seq would not be present.
Good point.
I think option 2 is bad as it gives users no way to know which functions are strict in their value argument without knowing the implementation or us documenting every function with the strictness properties. We could do the latter but the API would be simpler if we just said that all functions are strict in their key and value arguments.
This is even a problem of current API. Although most of the functions are strict in keys, it is not mentioned at any place in the documentation. You are right that option 1 is easily explained and documented. Cheers, Milan

On Thu, Oct 27, 2011 at 10:35 AM, Milan Straka
I think option 2 is bad as it gives users no way to know which functions are strict in their value argument without knowing the implementation or us documenting every function with the strictness properties. We could do the latter but the API would be simpler if we just said that all functions are strict in their key and value arguments.
This is even a problem of current API. Although most of the functions are strict in keys, it is not mentioned at any place in the documentation.
That's mostly an oversight. After we agreed a while back that the API should be consistently key strict I intended to update the documentation, but never got around to it. -- Johan

Hi,
As discussed in an earlier (accepted) proposal we're adding Data.[Int]Map.{Strict,Lazy} APIs to make it easier for users to get the desired strictness.
There are two subtly different ways these APIs can be strict:
1. All the functions are strict in the value argument, or 2. The structure is strict in the value field.
The difference shows up in functions like insertWith. For example, given
insertWith (\ old _new -> old + 1) k v someMap
should 'v' be evaluated is the key 'k' is already in the map. (Note that the value isn't used in the update function.)
If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not (as in the above example). We currently use option 1 for keys i.e.
insert k v someMap
is strict in the key even if the map is empty (which wouldn't require us to look at the key).
Thoughts? I'd love to see some design pros/cons of both approaches.
At first I thought there would be performance gain in 1., as the value could be passed unboxed. But now I believe it is not true -- the value type is never specialized as it is stored as a generic type all the time. Without the performance bias, the possibility 2. seems fine. BTW, 1. does not fully specify the behaviour -- for example, does insertWith' f k v map forces the result of (f old new) when the key is already present? Possibility 2. says 'yes', possibility 1. does not say anything. Maybe the 1. should be "the structure is strict in the value field and also all the functions are strict in the value argument". Although my opinion was different just a while ago, I am now slightly in favor of 2. Cheers, Milan

On Thu, Oct 27, 2011 at 10:33 AM, Milan Straka
BTW, 1. does not fully specify the behaviour -- for example, does insertWith' f k v map forces the result of (f old new) when the key is already present? Possibility 2. says 'yes', possibility 1. does not say anything. Maybe the 1. should be "the structure is strict in the value field and also all the functions are strict in the value argument".
Yes. Option 1 would both talk about any value passed to a function and to any value stored in the map. Option 1 is really as if Haskell was a strict language. -- Johan

Johan Tibell
There are two subtly different ways these APIs can be strict:
1. All the functions are strict in the value argument, or 2. The structure is strict in the value field.
The problem that should be addressed by strict API is accumulation of unevaluated thunks that user of the API has no control over. Therefore strict API should solve stack-overflow issues and memory leaks. Taking that thought further...
The difference shows up in functions like insertWith. For example, given
insertWith (\ old _new -> old + 1) k v someMap
should 'v' be evaluated is the key 'k' is already in the map.
Here user has the control of thunks by, for example, doing:
insertWith (\ old new -> new `seq` (old + 1)) k v someMap
So the user has full control if v should be evaluated or not. There is no need to force v always and seems limiting application scenarios.
If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not...
Which IMHO is not a problem. For me always problem was the data that was already in the Map, but in unevaluated form. I vote for option 2 as it is the minimal option that solves my problem. (I have no comments about keys) -- Gracjan

Hi all, After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that * it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed. Milan, what are your current thoughts? * It's easier to be too lazy by mistake if there are a few functions that are less strict than the remaining ones. Cheers, Johan

I've finally come around to liking the slightly stricter implementation. If
someone _really_ needs the slightly lazy version they can usually check to
see if the map is null by hand, before calling in to the library, but in
the converse situation, you can't tell ghc "no, really, that argument has
already been seq'd so don't bother."
Now I have to hunt for this behavior in my _own_ libraries. Blech.
-Edward
On Thu, Nov 3, 2011 at 3:43 PM, Johan Tibell
Hi all,
After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that
* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.
Milan, what are your current thoughts?
* It's easier to be too lazy by mistake if there are a few functions that are less strict than the remaining ones.
Cheers, Johan
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

I also prefer the fully strict version. From an application programmer
perspective, it makes the following reasoning valid:
If my Map is in WHNF, then all computations that it depends on have
been performed.
This makes hunting for space leaks a LOT easier in my opinion. This
options also allows applications like stateful servers to easily
extend the full strictness to their whole state, such that WHNF on
their state implies that there are no dangling computations left. I
imagine that, in many cases, this is a good implementation decision
for such applications.
best regards,
Simon
2011/11/3 Edward Kmett
I've finally come around to liking the slightly stricter implementation. If someone _really_ needs the slightly lazy version they can usually check to see if the map is null by hand, before calling in to the library, but in the converse situation, you can't tell ghc "no, really, that argument has already been seq'd so don't bother." Now I have to hunt for this behavior in my _own_ libraries. Blech. -Edward
On Thu, Nov 3, 2011 at 3:43 PM, Johan Tibell
wrote: Hi all,
After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that
* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.
Milan, what are your current thoughts?
* It's easier to be too lazy by mistake if there are a few functions that are less strict than the remaining ones.
Cheers, Johan
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Hi Johan,
After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that
* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.
Milan, what are your current thoughts?
I agree with going with option one. It is a simple rule and it is consistent with the method-are-strict-in-keys behaviour. Cheers, Milan

On Fri, Nov 4, 2011 at 9:14 AM, Milan Straka
Hi Johan,
After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that
* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.
Milan, what are your current thoughts?
I agree with going with option one. It is a simple rule and it is consistent with the method-are-strict-in-keys behaviour.
I will go ahead and make the changes as soon as I find some time. -- Johan
participants (6)
-
Edward Kmett
-
Edward Z. Yang
-
Gracjan Polak
-
Johan Tibell
-
Milan Straka
-
Simon Meier