Containers and strictness continued

Milan Straka

9 Jul 2010 9 Jul '10

9:56 a.m.

Hi all, for my work on the containers I need to settle some questions about strictness. Thank you very much for your opinions. Discussion ends: 23. July 2010. 1) Strictness of keys and values in the datatype. Currently we have the following strictness flags in the datatypes: - IntMap: !key value - IntSet: !key - Map: !key value - Set: key I vote for changing Set to store keys also strict. Storing values as nonstrict probably makes sense. 2) Strictness of keys and values in the method definitions. The IntMap.lookup (lookup k t = ... seq k ...) evaluates the given key even if it is not needed (when searching empty tree). Some methods are more carefull. I vote for all IntMap, IntSet, Map and Set methods that are given a key value (insert, delete, member, ...) to be strict in the keys. This a) would be consistent with 1) b) would be a bit more efficient (~5% in the IntMap.lookup case) 3) Strict folds Currently there are no strict folds. I vote for adding strict folds (fold', foldWithKey' when appropriate) to all Map, Set, IntMap, IntSet. Cheers, Milan Straka PS: I will be unavailable and not reading mail till 19. July 2010.

Show replies by date

Felipe Lessa

9 Jul 9 Jul

11:08 a.m.

On Fri, Jul 9, 2010 at 6:56 AM, Milan Straka wrote:

...

Hi all,

for my work on the containers I need to settle some questions about strictness. Thank you very much for your opinions.

Thanks for looking into this! =)

...

Discussion ends: 23. July 2010.

1) Strictness of keys and values in the datatype.

Currently we have the following strictness flags in the datatypes: - IntMap: !key value - IntSet: !key - Map: !key value - Set: key

I vote for changing Set to store keys also strict.

Storing values as nonstrict probably makes sense.

+1 for strict Set keys

...

2) Strictness of keys and values in the method definitions.

The IntMap.lookup (lookup k t = ... seq k ...) evaluates the given key even if it is not needed (when searching empty tree). Some methods are more carefull.

I vote for all IntMap, IntSet, Map and Set methods that are given a key value (insert, delete, member, ...) to be strict in the keys. This a) would be consistent with 1) b) would be a bit more efficient (~5% in the IntMap.lookup case)

+1 for strictness in keys on all functions.

...

3) Strict folds

Currently there are no strict folds.

I vote for adding strict folds (fold', foldWithKey' when appropriate) to all Map, Set, IntMap, IntSet.

Not sure about this one, but it's probably +1 as well. Cheers, -- Felipe.

Johan Tibell

12:44 p.m.

On Fri, Jul 9, 2010 at 11:56 AM, Milan Straka wrote:

...

Hi all,

for my work on the containers I need to settle some questions about strictness. Thank you very much for your opinions.

Discussion ends: 23. July 2010.

1) Strictness of keys and values in the datatype.

Currently we have the following strictness flags in the datatypes: - IntMap: !key value - IntSet: !key - Map: !key value - Set: key

I vote for changing Set to store keys also strict.

Storing values as nonstrict probably makes sense.

2) Strictness of keys and values in the method definitions.

The IntMap.lookup (lookup k t = ... seq k ...) evaluates the given key even if it is not needed (when searching empty tree). Some methods are more carefull.

I vote for all IntMap, IntSet, Map and Set methods that are given a key value (insert, delete, member, ...) to be strict in the keys. This a) would be consistent with 1) b) would be a bit more efficient (~5% in the IntMap.lookup case)

3) Strict folds

Currently there are no strict folds.

I vote for adding strict folds (fold', foldWithKey' when appropriate) to all Map, Set, IntMap, IntSet.

I agree on all points. I've looked into the core generated for lookup, insert, etc and it looks better when the key is strict. It's hard to actually make use of the fact that some function aren't strict in the key, as they are only non-strict in the case of an empty data structure (so they're conditionally non-strict). Johan

Nicolas Pouillard

2:11 p.m.

On Fri, 9 Jul 2010 11:56:25 +0200, Milan Straka wrote:

...

Hi all,

for my work on the containers I need to settle some questions about strictness. Thank you very much for your opinions.

Discussion ends: 23. July 2010.

1) Strictness of keys and values in the datatype. 2) Strictness of keys and values in the method definitions. 3) Strict folds

+1 on all points -- Nicolas Pouillard http://nicolaspouillard.fr

Edward Kmett

2:28 p.m.

+1 across the board You may also want to look at adding monadic inserts/adjusts/etc in the sense that there is currently no way to insertWith and extract any information in one pass. This means a lot of otherwise trivial operations on tries built out of Map/IntMap take twice as long as would be otherwise necessary. Sent from my iPhone On Jul 9, 2010, at 5:56 AM, Milan Straka wrote:

...

Hi all,

for my work on the containers I need to settle some questions about strictness. Thank you very much for your opinions.

Discussion ends: 23. July 2010.

1) Strictness of keys and values in the datatype.

Currently we have the following strictness flags in the datatypes: - IntMap: !key value - IntSet: !key - Map: !key value - Set: key

I vote for changing Set to store keys also strict.

Storing values as nonstrict probably makes sense.

2) Strictness of keys and values in the method definitions.

The IntMap.lookup (lookup k t = ... seq k ...) evaluates the given key even if it is not needed (when searching empty tree). Some methods are more carefull.

I vote for all IntMap, IntSet, Map and Set methods that are given a key value (insert, delete, member, ...) to be strict in the keys. This a) would be consistent with 1) b) would be a bit more efficient (~5% in the IntMap.lookup case)

3) Strict folds

Currently there are no strict folds.

I vote for adding strict folds (fold', foldWithKey' when appropriate) to all Map, Set, IntMap, IntSet.

Cheers, Milan Straka

PS: I will be unavailable and not reading mail till 19. July 2010. _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Claus Reinke

5:22 p.m.

...

You may also want to look at adding monadic inserts/adjusts/etc in the sense that there is currently no way to insertWith and extract any information in one pass.

This means a lot of otherwise trivial operations on tries built out of Map/IntMap take twice as long as would be otherwise necessary.

Would Data.IntMap.{mapAccumWithKey,insertLookupWithKey} help?

...

im fromList [(0,0),(1,1),(2,4),(3,9),(4,16),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100)]

...

let upd k f acc key val = if k==key then (Just val,f val) else (acc,val)

...

mapAccumWithKey (upd 3 $ const 0) Nothing im (Just 9,fromList [(0,0),(1,1),(2,4),(3,0),(4,16),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100)])

...

insertLookupWithKey (const $ (+)) 4 4 im (Just 16,fromList [(0,0),(1,1),(2,4),(3,9),(4,20),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100)])

...

insertLookupWithKey (const $ (+)) 42 4 im (Nothing,fromList [(0,0),(1,1),(2,4),(3,9),(4,16),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100),(42,4)])

Admittedly, I only remember them because I found them rather non-obvious when I needed them.. Also, mapAccumWithKey traverses the whole tree rather than the path to the entry. Claus

Edward Kmett

5:36 p.m.

Sadly, no. I wind up utilizing those operations, but you still need two passes. Inserting into a trie where the first level child node already exists has no way to report if you had to create the node n levels down. The lookup returns Just (some trie) but then you still have to walk it! So you wind up having to lookup and then insert or you wind up not memoizing the size of the trie. Both are highly suboptimal solutions. Sent from my iPhone On Jul 9, 2010, at 1:22 PM, "Claus Reinke" wrote:

...

...
You may also want to look at adding monadic inserts/adjusts/etc in the sense that there is currently no way to insertWith and extract any information in one pass.

This means a lot of otherwise trivial operations on tries built out of Map/IntMap take twice as long as would be otherwise necessary.

Would Data.IntMap.{mapAccumWithKey,insertLookupWithKey} help?

...
im fromList [(0,0),(1,1),(2,4),(3,9),(4,16),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100)]

...
let upd k f acc key val = if k==key then (Just val,f val) else (acc,val)

...
mapAccumWithKey (upd 3 $ const 0) Nothing im (Just 9,fromList [(0,0),(1,1),(2,4),(3,0),(4,16),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100)])

...
insertLookupWithKey (const $ (+)) 4 4 im (Just 16,fromList [(0,0),(1,1),(2,4),(3,9),(4,20),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100)])

...
insertLookupWithKey (const $ (+)) 42 4 im (Nothing,fromList [(0,0),(1,1),(2,4),(3,9),(4,16),(5,25),(6,36),(7,49),(8,64),(9,81),(10,100),(42,4)])

Admittedly, I only remember them because I found them rather non-obvious when I needed them.. Also, mapAccumWithKey traverses the whole tree rather than the path to the entry.

Claus

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Paulo Tanimoto

2:48 p.m.

On Fri, Jul 9, 2010 at 4:56 AM, Milan Straka wrote:

...

Hi all,

for my work on the containers I need to settle some questions about strictness. Thank you very much for your opinions.

Discussion ends: 23. July 2010.

+1 You also have my support for all proposals. Thanks for working on this. Paulo

Claus Reinke

6 p.m.

...

for my work on the containers I need to settle some questions about strictness.

Thanks again for driving this. +1 for consistent strictness in keys +1 for adding strict folds Since I've been wondering about this problem for a while, I would like to add two closely related suggestions: (a) the useability of several higher-order functions in the API can be improved by splitting them into wrapper and worker (the wrapper gets inlined away, leaving an inlined worker for which the functional parameter is fixed, so that the strictness analyser can do a better job; see GHC's implementations of list folds) (b) the only reason not to add strict-in-value versions of (Int)Map has been code duplication; it turns out that it might be possible to use the standard SPECIALIZE pragma to do most of the work for us, using nothing more than FlexibleInstances and MultiParamTypeClasses (no functional dependencies, no type families). The idea would be to provide both IntMap and IntMap', the latter being strict in the keys. There would be one type class abstractly representing IntMap constructors and (pattern match) views, and each of the API functions would be implemented once, in terms of this class. Then each of these abstract implementations would be specialized to both IntMap and IntMap'. Appended below is a concrete example of this abstract idea, a module with both strict (L1) and non-strict (L2) lists, an abstract interface (L) and a single definition of map (mapL) that is specialized for both use cases. For comparison, handcoded versions of mapL1 and mapL2 are also included. This module is small enough that one can compare the core output (-ddump-simpl), and it looks as if the -O2 compiled core for the specialized versions of mapL matches that for the handcoded versions. No manual duplication, just pragmas, same code and API, just different types for the two use cases. (a) could be included directly, (b) is just offered for discussion at the moment. If there are no hidden traps in (b), I would like to see it implemented as well, to reduce the frequent space issues with Maps and large datasets. Note that the combination of (a) and (b) could replace some of the proposed special-purpose strict functions, keeping the API simpler. Claus ----------------- {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE MultiParamTypeClasses #-} module Main(mapL1,mapL2,mapL,main) where data L1 a = N1 | C1 !a (L1 a) deriving Show -- mapL specialised to L1 should be as good as mapL1 mapL1 f N1 = N1 mapL1 f (C1 a l) = C1 (f a) (mapL1 f l) data L2 a = N2 | C2 a (L2 a) deriving Show -- mapL specialised to L2 should be as good as mapL2 mapL2 f N2 = N2 mapL2 f (C2 a l) = C2 (f a) (mapL2 f l) class L l a where -- constructors n :: l a c :: a -> l a -> l a -- view (abstract pattern match) v :: l a -> b -> (a -> l a -> b) -> b mapL :: (L l a,L l b) => (a -> b) -> l a -> l b {-# SPECIALIZE mapL :: (a -> b) -> L1 a -> L1 b #-} {-# SPECIALIZE mapL :: (a -> b) -> L2 a -> L2 b #-} mapL f l = (v l) n (\x xs->c (f x) (mapL f xs)) instance L L1 a where n = N1 c = C1 v N1 n c = n v (C1 a l) n c = c a l instance L L2 a where n = N2 c = C2 v N2 n c = n v (C2 a l) n c = c a l main = do print $ mapL (+1) $ C1 0 $ C1 1 $ C1 2 $ N1 print $ mapL (+1) $ C2 0 $ C2 1 $ C2 2 $ N2 print $ mapL1 (+1) $ C1 0 $ C1 1 $ C1 2 $ N1 print $ mapL2 (+1) $ C2 0 $ C2 1 $ C2 2 $ N2

Claus Reinke

8:18 p.m.

...

The idea would be to provide both IntMap and IntMap', the latter being strict in the keys. There would be one

strict in the values, of course.. So instead of choosing between function' and function (doubling the API size) or between Data.IntMap.strict and Data.IntMap.nonstrict, clients would simply choose between IntMap' and IntMap (even a conversion could be offered - just another specialisation of map).

...

This module is small enough that one can compare the core output (-ddump-simpl), and it looks as if the -O2 compiled core for the specialized versions of mapL matches that for the handcoded versions. No manual duplication, just pragmas, same code and API, just different types for the two use cases.

Since I don't have much practice reading core, it would be good if those of you with more core experience could check this assertion; also on whether there is any reason to expect difficulties extending this to Data.{IntMap,Map}.

...

mapL :: (L l a,L l b) => (a -> b) -> l a -> l b {-# SPECIALIZE mapL :: (a -> b) -> L1 a -> L1 b #-} {-# SPECIALIZE mapL :: (a -> b) -> L2 a -> L2 b #-}

If only the type-specialized maps would be exported (with map instead of mapL and IntMap'/IntMap instead of L1/L2), neither the class nor the language pragmas would affect client code, so this could be a mostly compatible extension. I'm not sure how to do that without writing more code, though - any suggestions? Claus

Edward Kmett

8:29 p.m.

On Fri, Jul 9, 2010 at 4:18 PM, Claus Reinke wrote:

...

The idea would be to provide both IntMap and IntMap',

...
the latter being strict in the keys. There would be one

strict in the values, of course..

So instead of choosing between function' and function (doubling the API size) or between Data.IntMap.strict and Data.IntMap.nonstrict, clients would simply choose between IntMap' and IntMap (even a conversion could be offered - just another specialisation of map). This module is small enough that one can compare

the core output (-ddump-simpl), and it looks as if

...

the -O2 compiled core for the specialized versions of mapL matches that for the handcoded versions. No manual duplication, just pragmas, same code and API, just different types for the two use cases.

Since I don't have much practice reading core, it would

...

be good if those of you with more core experience could check this assertion; also on whether there is any reason to expect difficulties extending this to Data.{IntMap,Map}.

Honestly I think the code duplication is more likely to yield robust, fast code. It can be hard to make anything where the entire API goes through a huge dictionary generate well-performing core. foldl', foldr' and their ilk are common enough that folks are pretty comfortable with that convention. mapL :: (L l a,L l b) => (a -> b) -> l a -> l b

...

{-# SPECIALIZE mapL :: (a -> b) -> L1 a -> L1 b #-} {-# SPECIALIZE mapL :: (a -> b) -> L2 a -> L2 b #-}

If only the type-specialized maps would be exported

...

(with map instead of mapL and IntMap'/IntMap instead of L1/L2), neither the class nor the language pragmas would affect client code, so this could be a mostly compatible extension. I'm not sure how to do that without writing more code, though - any suggestions?

Providing an entire IntMap' seems fairly drastic. Now you wind up with a whole host of interoperability scenarios. It strikes me that the cure is worse than the disease. -Edward Kmett

5475

Age (days ago)

5475

Last active (days ago)

List overview

Download

10 comments

7 participants

participants (7)

Claus Reinke
Edward Kmett
Felipe Lessa
Johan Tibell
Milan Straka
Nicolas Pouillard
Paulo Tanimoto