RFC: Should Data.IntMap.Strict be value-strict in the function arguments or the map itself

older
Proposal: Add NFData instances for...

Johan Tibell

27 Oct 2011 27 Oct '11

4:56 p.m.

Hi all, As discussed in an earlier (accepted) proposal we're adding Data.[Int]Map.{Strict,Lazy} APIs to make it easier for users to get the desired strictness. There are two subtly different ways these APIs can be strict: 1. All the functions are strict in the value argument, or 2. The structure is strict in the value field. The difference shows up in functions like insertWith. For example, given insertWith (\ old _new -> old + 1) k v someMap should 'v' be evaluated is the key 'k' is already in the map. (Note that the value isn't used in the update function.) If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not (as in the above example). We currently use option 1 for keys i.e. insert k v someMap is strict in the key even if the map is empty (which wouldn't require us to look at the key). Thoughts? I'd love to see some design pros/cons of both approaches. -- Johan

Attachments:

attachment.html (text/html — 1.3 KB)

Show replies by date

Edward Z. Yang

27 Oct 27 Oct

5:03 p.m.

I'd like to remark that the situation with keys is not /quite/ comparable. Even if we don't need to look at the key for inserting into an empty map, it will get forced anyway because IntMaps are spine-strict (in particular, they are strict in their keys.) Edward Excerpts from Johan Tibell's message of Thu Oct 27 12:56:10 -0400 2011:

...

Hi all,

As discussed in an earlier (accepted) proposal we're adding Data.[Int]Map.{Strict,Lazy} APIs to make it easier for users to get the desired strictness.

There are two subtly different ways these APIs can be strict:

1. All the functions are strict in the value argument, or 2. The structure is strict in the value field.

The difference shows up in functions like insertWith. For example, given

insertWith (\ old _new -> old + 1) k v someMap

should 'v' be evaluated is the key 'k' is already in the map. (Note that the value isn't used in the update function.)

If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not (as in the above example). We currently use option 1 for keys i.e.

insert k v someMap

is strict in the key even if the map is empty (which wouldn't require us to look at the key).

Thoughts? I'd love to see some design pros/cons of both approaches.

-- Johan

Johan Tibell

5:13 p.m.

On Thu, Oct 27, 2011 at 10:03 AM, Edward Z. Yang wrote:

...

I'd like to remark that the situation with keys is not /quite/ comparable. Even if we don't need to look at the key for inserting into an empty map, it will get forced anyway because IntMaps are spine-strict (in particular, they are strict in their keys.)

I'm not quite sure if I follow. Here's Data.IntMap.insert: insert :: Key -> a -> IntMap a -> IntMap a insert k x t = k `seq` case t of Bin p m l r | nomatch k p m -> join k (Tip k x) p t | zero k m -> Bin p m (insert k x l) r | otherwise -> Bin p m l (insert k x r) Tip ky _ | k==ky -> Tip k x | otherwise -> join k (Tip k x) ky t Nil -> Tip k x Without the explicit `seq` insert undefined 1 Nil terminates but with the `seq` it's _|_. Are you saying it will eventually get force if someone e.g. does a lookup after the insert? -- Johan

Edward Z. Yang

5:17 p.m.

Here's the definition of Tip. data IntMap a = Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) | Tip {-# UNPACK #-} !Key a | Nil So we effectively have: insert :: Key -> a -> IntMap a -> IntMap a insert k x t = case t of Bin p m l r | nomatch k p m -> join k (Tip k x) p t | zero k m -> Bin p m (insert k x l) r | otherwise -> Bin p m l (insert k x r) Tip ky _ | k==ky -> Tip k x | otherwise -> join k (Tip k x) ky t Nil -> k `seq` Tip k x So it does not terminate in this case. Edward Excerpts from Johan Tibell's message of Thu Oct 27 13:13:56 -0400 2011:

...

On Thu, Oct 27, 2011 at 10:03 AM, Edward Z. Yang wrote:

...
I'd like to remark that the situation with keys is not /quite/ comparable. Even if we don't need to look at the key for inserting into an empty map, it will get forced anyway because IntMaps are spine-strict (in particular, they are strict in their keys.)

I'm not quite sure if I follow. Here's Data.IntMap.insert:

insert :: Key -> a -> IntMap a -> IntMap a insert k x t = k `seq` case t of Bin p m l r | nomatch k p m -> join k (Tip k x) p t | zero k m -> Bin p m (insert k x l) r | otherwise -> Bin p m l (insert k x r) Tip ky _ | k==ky -> Tip k x | otherwise -> join k (Tip k x) ky t Nil -> Tip k x

Without the explicit `seq`

insert undefined 1 Nil

terminates but with the `seq` it's _|_.

Are you saying it will eventually get force if someone e.g. does a lookup after the insert?

-- Johan

Johan Tibell

5:22 p.m.

On Thu, Oct 27, 2011 at 10:17 AM, Edward Z. Yang wrote:

...

Here's the definition of Tip.

data IntMap a = Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) | Tip {-# UNPACK #-} !Key a | Nil

You're right. I forgot the definition of the data type itself. I guess there's no function in the API currently that takes a key parameter and isn't strict in that parameter because the key is either * compared to another key, or * inserted into the map which both causes the key to be evaluated. -- Johan

Milan Straka

5:24 p.m.

...

On Thu, Oct 27, 2011 at 10:17 AM, Edward Z. Yang wrote:

...
Here's the definition of Tip.

data IntMap a = Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(IntMap a) !(IntMap a) | Tip {-# UNPACK #-} !Key a | Nil

You're right. I forgot the definition of the data type itself. I guess there's no function in the API currently that takes a key parameter and isn't strict in that parameter because the key is either

* compared to another key, or * inserted into the map

which both causes the key to be evaluated.

I believe delete undefined Nil does not terminate with current definition, but would terminate if the explicit seq would not be present. Cheers, Milan

Edward Z. Yang

5:28 p.m.

Excerpts from Milan Straka's message of Thu Oct 27 13:24:35 -0400 2011:

...

I believe

delete undefined Nil

does not terminate with current definition, but would terminate if the explicit seq would not be present.

Good catch. Edward

Johan Tibell

5:28 p.m.

On Thu, Oct 27, 2011 at 10:24 AM, Milan Straka wrote:

...

I believe

delete undefined Nil

does not terminate with current definition, but would terminate if the explicit seq would not be present.

Good point. I think option 2 is bad as it gives users no way to know which functions are strict in their value argument without knowing the implementation or us documenting every function with the strictness properties. We could do the latter but the API would be simpler if we just said that all functions are strict in their key and value arguments. -- Johan

Milan Straka

5:35 p.m.

...

...
I believe

delete undefined Nil

does not terminate with current definition, but would terminate if the explicit seq would not be present.

Good point.

I think option 2 is bad as it gives users no way to know which functions are strict in their value argument without knowing the implementation or us documenting every function with the strictness properties. We could do the latter but the API would be simpler if we just said that all functions are strict in their key and value arguments.

This is even a problem of current API. Although most of the functions are strict in keys, it is not mentioned at any place in the documentation. You are right that option 1 is easily explained and documented. Cheers, Milan

Johan Tibell

5:51 p.m.

On Thu, Oct 27, 2011 at 10:35 AM, Milan Straka wrote:

...

...
I think option 2 is bad as it gives users no way to know which functions are strict in their value argument without knowing the implementation or us documenting every function with the strictness properties. We could do the latter but the API would be simpler if we just said that all functions are strict in their key and value arguments.

This is even a problem of current API. Although most of the functions are strict in keys, it is not mentioned at any place in the documentation.

That's mostly an oversight. After we agreed a while back that the API should be consistently key strict I intended to update the documentation, but never got around to it. -- Johan

Milan Straka

5:33 p.m.

Hi,

...

As discussed in an earlier (accepted) proposal we're adding Data.[Int]Map.{Strict,Lazy} APIs to make it easier for users to get the desired strictness.

There are two subtly different ways these APIs can be strict:

1. All the functions are strict in the value argument, or 2. The structure is strict in the value field.

The difference shows up in functions like insertWith. For example, given

insertWith (\ old _new -> old + 1) k v someMap

should 'v' be evaluated is the key 'k' is already in the map. (Note that the value isn't used in the update function.)

If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not (as in the above example). We currently use option 1 for keys i.e.

insert k v someMap

is strict in the key even if the map is empty (which wouldn't require us to look at the key).

Thoughts? I'd love to see some design pros/cons of both approaches.

At first I thought there would be performance gain in 1., as the value could be passed unboxed. But now I believe it is not true -- the value type is never specialized as it is stored as a generic type all the time. Without the performance bias, the possibility 2. seems fine. BTW, 1. does not fully specify the behaviour -- for example, does insertWith' f k v map forces the result of (f old new) when the key is already present? Possibility 2. says 'yes', possibility 1. does not say anything. Maybe the 1. should be "the structure is strict in the value field and also all the functions are strict in the value argument". Although my opinion was different just a while ago, I am now slightly in favor of 2. Cheers, Milan

Johan Tibell

5:50 p.m.

On Thu, Oct 27, 2011 at 10:33 AM, Milan Straka wrote:

...

BTW, 1. does not fully specify the behaviour -- for example, does insertWith' f k v map forces the result of (f old new) when the key is already present? Possibility 2. says 'yes', possibility 1. does not say anything. Maybe the 1. should be "the structure is strict in the value field and also all the functions are strict in the value argument".

Yes. Option 1 would both talk about any value passed to a function and to any value stored in the map. Option 1 is really as if Haskell was a strict language. -- Johan

Gracjan Polak

31 Oct 31 Oct

2:35 p.m.

New subject: RFC: Should Data.IntMap.Strict be value-strict in the function arguments or the map itself

Johan Tibell writes:

...

There are two subtly different ways these APIs can be strict:

1. All the functions are strict in the value argument, or 2. The structure is strict in the value field.

The problem that should be addressed by strict API is accumulation of unevaluated thunks that user of the API has no control over. Therefore strict API should solve stack-overflow issues and memory leaks. Taking that thought further...

...

The difference shows up in functions like insertWith. For example, given

insertWith (\ old _new -> old + 1) k v someMap

should 'v' be evaluated is the key 'k' is already in the map.

Here user has the control of thunks by, for example, doing:

...

insertWith (\ old new -> new `seq` (old + 1)) k v someMap

So the user has full control if v should be evaluated or not. There is no need to force v always and seems limiting application scenarios.

...

If we go with option 2 it is more difficult for the user to reason about if a given value passed to the API is evaluated or not...

Which IMHO is not a problem. For me always problem was the data that was already in the Map, but in unevaluated form. I vote for option 2 as it is the minimal option that solves my problem. (I have no comments about keys) -- Gracjan

Johan Tibell

3 Nov 3 Nov

7:43 p.m.

Hi all, After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that * it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed. Milan, what are your current thoughts? * It's easier to be too lazy by mistake if there are a few functions that are less strict than the remaining ones. Cheers, Johan

Edward Kmett

8:34 p.m.

I've finally come around to liking the slightly stricter implementation. If someone _really_ needs the slightly lazy version they can usually check to see if the map is null by hand, before calling in to the library, but in the converse situation, you can't tell ghc "no, really, that argument has already been seq'd so don't bother." Now I have to hunt for this behavior in my _own_ libraries. Blech. -Edward On Thu, Nov 3, 2011 at 3:43 PM, Johan Tibell wrote:

...

Hi all,

After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that

* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.

Milan, what are your current thoughts?

* It's easier to be too lazy by mistake if there are a few functions that are less strict than the remaining ones.

Cheers, Johan

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Simon Meier

4 Nov 4 Nov

9:31 a.m.

I also prefer the fully strict version. From an application programmer perspective, it makes the following reasoning valid: If my Map is in WHNF, then all computations that it depends on have been performed. This makes hunting for space leaks a LOT easier in my opinion. This options also allows applications like stateful servers to easily extend the full strictness to their whole state, such that WHNF on their state implies that there are no dangling computations left. I imagine that, in many cases, this is a good implementation decision for such applications. best regards, Simon 2011/11/3 Edward Kmett :

...

I've finally come around to liking the slightly stricter implementation. If someone _really_ needs the slightly lazy version they can usually check to see if the map is null by hand, before calling in to the library, but in the converse situation, you can't tell ghc "no, really, that argument has already been seq'd so don't bother." Now I have to hunt for this behavior in my _own_ libraries. Blech. -Edward

On Thu, Nov 3, 2011 at 3:43 PM, Johan Tibell wrote:

...
Hi all,

After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that

* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.

Milan, what are your current thoughts?

* It's easier to be too lazy by mistake if there are a few functions that are less strict than the remaining ones.

Cheers, Johan

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Milan Straka

4:14 p.m.

Hi Johan,

...

After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that

* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.

Milan, what are your current thoughts?

I agree with going with option one. It is a simple rule and it is consistent with the method-are-strict-in-keys behaviour. Cheers, Milan

Johan Tibell

4:17 p.m.

On Fri, Nov 4, 2011 at 9:14 AM, Milan Straka wrote:

...

Hi Johan,

...
After thinking about this some more I think we should go with option one (the most strict one). My arguments in favor are that

* it's easier to explain and remember, * it's harder to be too lazy by mistake*, * Simon M thinks it'll help GHC to avoid some redundant seq:s, and * if we can specialize the structure in the future to use monomorphic representations we can pass the value unboxed.

Milan, what are your current thoughts?

I agree with going with option one. It is a simple rule and it is consistent with the method-are-strict-in-keys behaviour.

I will go ahead and make the changes as soon as I find some time. -- Johan

4994

Age (days ago)

5002

Last active (days ago)

List overview

Download

17 comments

6 participants

participants (6)

Edward Kmett
Edward Z. Yang
Gracjan Polak
Johan Tibell
Milan Straka
Simon Meier