containers: to be or not to be (strict, in this case)?

[addressing libraries@ as the maintainer of containers] summary: could we please have equal support for all container operations, parameterised by element strictness? We are now starting to see issues with container strictness almost as often as we used to see the foldl/foldl' issue (the most recent example on cafe even used foldl' to accumulate an IntMap, unfortunately running into the non-strict nature of unionWith). As with foldl' itself, this is only partially a learning from experience issue: foldl' wasn't always available, and containers is rather inconsistent about supporting (element-)strict operations (Data.Map offers insertWith', Data.IntMap doesn't, neither offers strict unionWith; Data.Set is completely ambivalent about element strictness, depending on whether or not comparison is used). For the case of Maps, a partial workaround is known, namely to tie the availability of keys to evaluation of values. But this only works if key and value are supplied from the outside - which is not the case for the *With* family of functions (the supplied operation is applied to the old value, from within the Map, and the new value, from outside - there is no leverage to apply strictness), nor for map. Another workaround is to define your own strict insertWith', then to avoid the non-strict parts of the API: insertWith' op (k,v) m = maybe (IM.insert k v m) (\old->((IM.insert k) $! (v `op` old)) m) (IM.lookup k m) Apart from costing an extra lookup, that cannot be the intended way to use the API. If possible, I'd like to see both element-strict and element-non-strict containers supported, with otherwise the same APIs, and without a separate strict-containers package. The obvious disadvantage of code duplication (Data.Map has two definitions of insertWith) could perhaps be avoided, by parameterising the code over element strictness, as demonstrated here for Data.IntMap.insertWith: insertWithKey :: (Key -> a -> a -> a) -> Key -> a -> IntMap a -> IntMap a insertWithKey = insertWithKeyS ($) insertWithKey' :: (Key -> a -> a -> a) -> Key -> a -> IntMap a -> IntMap a insertWithKey' = insertWithKeyS ($!) type Strictness c a = (a -> c a) -> (a -> c a) -- constructor transformers insertWithKeyS :: Strictness IntMap a -> (Key -> a -> a -> a) -> Key -> a -> IntMap a -> IntMap a insertWithKeyS ($) f k x t = case t of Bin p m l r | nomatch k p m -> join k (Tip k $ x) p t | zero k m -> Bin p m (insertWithKeyS ($) f k x l) r | otherwise -> Bin p m l (insertWithKeyS ($) f k x r) Tip ky y | k==ky -> Tip k $ (f k x y) | otherwise -> join k (Tip k $ x) ky t Nil -> Tip k $ x The idea being to abstract over every application of the container constructors to the element type, then to supply either strict or non-strict application. Ultimately, one might prefer a type-constructor-based abstraction instead, to gain additional performance, but this should be at least as good as the current situation (perhaps with an INLINE on the parameterised versions), without the duplication. Btw, this variant is slightly stricter than Data.Map.insertWith', applying the strictness not only to the function passed in, but to all IntMap construction from the element type (seemed more consistent that way..). Claus

One can always turn a strict data structure into a non-strict one by introducing lazyness explicitly like data Box a = Box a deriving(Eq,Ord,..) then having Set (Box Int) for a set of lazy ints. It might make sense to make the strict implementation the default and emulate the lazy one with boxes like above. It would give us both with very little code duplication. John -- John Meacham - ⑆repetae.net⑆john⑈

On Wed, 2009-03-04 at 22:55 +0000, Claus Reinke wrote:
[addressing libraries@ as the maintainer of containers]
summary: could we please have equal support for all container operations, parameterised by element strictness?
We are now starting to see issues with container strictness almost as often as we used to see the foldl/foldl' issue (the most recent example on cafe even used foldl' to accumulate an IntMap, unfortunately running into the non-strict nature of unionWith).
As with foldl' itself, this is only partially a learning from experience issue: foldl' wasn't always available, and containers is rather inconsistent about supporting (element-)strict operations (Data.Map offers insertWith', Data.IntMap doesn't, neither offers strict unionWith; Data.Set is completely ambivalent about element strictness, depending on whether or not comparison is used).
For the case of Maps, a partial workaround is known, namely to tie the availability of keys to evaluation of values. But this only works if key and value are supplied from the outside - which is not the case for the *With* family of functions (the supplied operation is applied to the old value, from within the Map, and the new value, from outside - there is no leverage to apply strictness), nor for map.
Another workaround is to define your own strict insertWith', then to avoid the non-strict parts of the API:
insertWith' op (k,v) m = maybe (IM.insert k v m) (\old->((IM.insert k) $! (v `op` old)) m) (IM.lookup k m)
Apart from costing an extra lookup, that cannot be the intended way to use the API.
If possible, I'd like to see both element-strict and element-non-strict containers supported, with otherwise the same APIs, and without a separate strict-containers package. The obvious disadvantage of code duplication (Data.Map has two definitions of insertWith) could perhaps be avoided, by parameterising the code over element strictness, as demonstrated here for Data.IntMap.insertWith:
insertWithKey :: (Key -> a -> a -> a) -> Key -> a -> IntMap a -> IntMap a insertWithKey = insertWithKeyS ($)
insertWithKey' :: (Key -> a -> a -> a) -> Key -> a -> IntMap a -> IntMap a insertWithKey' = insertWithKeyS ($!)
type Strictness c a = (a -> c a) -> (a -> c a) -- constructor transformers
insertWithKeyS :: Strictness IntMap a -> (Key -> a -> a -> a) -> Key -> a -> IntMap a -> IntMap a insertWithKeyS ($) f k x t = case t of Bin p m l r | nomatch k p m -> join k (Tip k $ x) p t | zero k m -> Bin p m (insertWithKeyS ($) f k x l) r | otherwise -> Bin p m l (insertWithKeyS ($) f k x r) Tip ky y | k==ky -> Tip k $ (f k x y) | otherwise -> join k (Tip k $ x) ky t Nil -> Tip k $ x
The idea being to abstract over every application of the container constructors to the element type, then to supply either strict or non-strict application. Ultimately, one might prefer a type-constructor-based abstraction instead, to gain additional performance, but this should be at least as good as the current situation (perhaps with an INLINE on the parameterised versions), without the duplication.
Btw, this variant is slightly stricter than Data.Map.insertWith', applying the strictness not only to the function passed in, but to all IntMap construction from the element type (seemed more consistent that way..).
data MyContainerType elem = ... !elem ... data Box a = Box a type MyLazyContainerType elem = MyContainerType (Box elem)

Claus Reinke wrote:
The idea being to abstract over every application of the container constructors to the element type, then to supply either strict or non-strict application. Ultimately, one might prefer a type-constructor-based abstraction instead, to gain additional performance, but this should be at least as good as the current situation (perhaps with an INLINE on the parameterised versions), without the duplication.
the "Box" approach is attractive but performance-unoptimal... we can try that type-constructor stuff. Is it like: data IntMap_ strictness a = ... --(not including "strictness" values) data Strict data Lazy class Strictness s where ... well, it turns out to have similar performance issues to passing around ($) or ($!), except they'd be resolved in the compiler by SPECIALIZE rather than INLINE stuff (I guess). Anyway, the compiler basically can't optimize away Box at all. And changing the strictness of a Box-based IntMap, would be linear time rather than constant (zero) time for a purely newtype-based solution (if indeed they need to be separate types, rather than just different modules with different implementations of e.g. insertWith for the same data-type). -Isaac

On Wed, Mar 04, 2009 at 07:45:18PM -0500, Isaac Dupree wrote:
Claus Reinke wrote:
The idea being to abstract over every application of the container constructors to the element type, then to supply either strict or non-strict application. Ultimately, one might prefer a type-constructor-based abstraction instead, to gain additional performance, but this should be at least as good as the current situation (perhaps with an INLINE on the parameterised versions), without the duplication.
the "Box" approach is attractive but performance-unoptimal...
we can try that type-constructor stuff. Is it like:
data IntMap_ strictness a = ... --(not including "strictness" values) data Strict data Lazy class Strictness s where ...
well, it turns out to have similar performance issues to passing around ($) or ($!), except they'd be resolved in the compiler by SPECIALIZE rather than INLINE stuff (I guess).
Anyway, the compiler basically can't optimize away Box at all.
And changing the strictness of a Box-based IntMap, would be linear time rather than constant (zero) time for a purely newtype-based solution (if indeed they need to be separate types, rather than just different modules with different implementations of e.g. insertWith for the same data-type).
Hmm.. would it make sense to want to use both lazy and strict operations on the same data type? It seems like it could be useful which argues against making them distinct types. Actually, how many of the standard operations do we need strict versions of? most are strictness agnostic to begin with, it seems that just providing insert' alternatives for those ones that matter would be enough and leaving the implementation of the strict versions to the internals of the library. parameterizing by the application function and pervasive inlining like you said seems like a reasonable way to go about it. for Data.Set it seems like we would only need alternate versions of not more than a half dozen or so functions. Since only a few actually add elements to the set and the other operations are strictness preserving (as in, a union of strict sets is a strict set). John -- John Meacham - ⑆repetae.net⑆john⑈
participants (4)
-
Claus Reinke
-
Derek Elkins
-
Isaac Dupree
-
John Meacham