
Hi all, After trying to document the strictness properties of the new Data.Map.Strict module, which is meant to provides a value strict version of the Data.Map API, I feel less sure about our prior decision to make all functions strict in value arguments (even though I was the main proponent). The documentation I ended up with in Data.Map.Strict was: Strictness properties ===================== This module satisfies the following strictness properties: 1. Key and value arguments are evaluated to WHNF; 2. Keys and values are evaluated to WHNF before they are stored in the map. Here are some examples that illustrate the first property: insertWith (\ old new -> old) k undefined m == undefined delete undefined m == undefined Here are some examples that illustrate the second property: map (\ v -> undefined) m == undefined -- m is not empty mapKeys (\ k -> undefined) m == undefined -- m is not empty More than one person said the distinction between (1) and (2) isn't clear and someone else felt that the extra (1) is unnecessary (for values). We definitely want (2): it's the property that ensures that the map never contains any thunks (if we also guarantee that the map is also spine strict). This property eliminates the space leaks that people sometimes run into. This property is the raison d'etre for the Data.Map.Strict module. I would like to settle the strictness properties of Data.Map.Strict once and for all. I would prefer to find a principled reason for the resulting decision or, if I can't have that, I want to lay out all the pros and cons and then make a decision. I think this issue is important, as we're likely to see more modules that help the users get a hold on evaluation order in the future. For reference, here are the arguments I used to argue for having functions be strict in value arguments last time we discussed this: ## Ease of reasoning about evaluation order I felt that being consistently strict would it make it easier to reason about the space usage/evaluation order of code that use the API. Without the extra strictness there are degenerate cases that can catch people off guard. For example, insertWith (\ old new -> if old == maxValue then old else new + 1) k v m is lazy in 'v' because there's a rare case (old == maxValue) where 'new' isn't evaluated. Another potential benefit: in f :: Int -> Int -> Int -> Int f x y z | x == y = z | otherwise = 1 you can rely on == being strict in both arguments and avoid putting redundant bang on x and y (assuming that you want them strict). I was hoping that making all function value strict would enable similar reasoning where Data.Map.Strict is used. ## Constant factor performance improvements We already have (1) for keys in Data.Map. It's an important optimization as it allows us to keep the key unboxed in loops. Consider: lookup :: Ord k => k -> Map k a -> Maybe a lookup = go where go !_ Tip = Nothing go k (Bin _ kx x l r) = case compare k kx of LT -> go k l GT -> go k r EQ -> Just x {-# INLINABLE lookup #-} This function is strict in the key argument, even though the first case alternative doesn't use it. This allows us to compile the inner loop to go :: Int# -> Map Int Int -> Maybe Int go k# m = case m of Tip -> Nothing (Bin _ kx x l r) -> case kx of I# kx# -> case <# k# kx# of True -> go k# l ... when lookup is specialized for some unboxable key type (here: Int). This is a performance win. There's a small (or even tiny) potential performance win in having strict value arguments. Given insertWith :: Ord k => (v -> v -> v) -> k -> v -> Map k v -> Map k v insertWith = go where go _ !k def Tip = def `seq` Bin 1 k def Tip Tip go f k def (Bin sz ky y l r) = case compare k ky of LT -> balanceL ky y (go f k x l) r GT -> balanceR ky y l (go f k x r) EQ -> let x = f x y in x `seq` Bin sz k x l r {-# INLINEABLE insertWith #-} and some call site ...let v = ... in v `seq` insertWith (+) k v m... v will end up being evaluated twice, once at the call site and once inside insertWith. If insertWith was strict in the value argument the evaluation at the call site could perhaps be avoided (Simons?). A second, perhaps even more theoretical, possible optimization we could do is this: If we could specialize the runtime representation of Map to have unboxed keys and value (using e.g. associated data types), then we could write more allocation free functions. For example: findWithDefault :: Ord k => k -> v -> Map k v -> v findWithDefault = go where go !_ !def Tip = def go k def (Bin _ kx x l r) = case compare k kx of LT -> go k def l GT -> go k def r EQ -> x {-# INLINABLE findWithDefault #-} specialized for Ints (and thus Int#) could (using a w/w transformation) take def as an Int# and return it (or the found) value as an Int#. We don't really know how to do this well in practice so this remains a theoretical benefit, for now. Thoughts? Cheers, Johan