Various folds in the containers package

Hi all, currently we have the following folds in the containers package: Map: fold, foldWithKey, foldrWithKey, foldlWithKey Set: fold IntMap; fold, foldWithKey IntSet: fold If the direction (right, left) is not specified, the fold is the right fold. So currently we have no left folds (except for the Map) and also Map and IntMap are not consistent. There are several issues I am thinking about: a) we could left and right folds. For the Set and IntSet it is simple -- we can add foldr and foldl, and deprecated fold with the message 'use foldr instead'. For the Map and IntMap it is not so simple -- should we add both foldr, foldl, foldlWithKey and foldrWithKey? b) we could use the Foldable instances more. We can implement foldr and foldl efficiently and let the users to use those instead of providing eg Set.foldr, Set.foldl. This does not cover the *WithKey variant, as the Foldable instances over Map and IntMap does not utilize the key. c) strict folds. If we add also strict folds, we would have a total of _four_ folds in Set and IntSet and __eight__ folds in Map and IntMap. The Foldable module exports foldl' and foldr', which use foldr, foldl respectively, but needs some intermediate memory (it creates a thunk for every node of the structure). Apart from that, this does not cover the *WithKey variants as in b). Personally I would add foldr, foldl, foldlWithKey, foldrWithKey and specialised foldr, foldl implementations in the Foldable classes. I am not sure about the strict folds -- I used an equivalent of the foldr' from the Foldable module in GHC and it was reasonable fast. Any oppinions? Cheers, Milan

On Wed, Jun 15, 2011 at 11:15 AM, Milan Straka
a) we could left and right folds. For the Set and IntSet it is simple -- we can add foldr and foldl, and deprecated fold with the message 'use foldr instead'.
I think we should have both left and right folds for ordered containers.
For the Map and IntMap it is not so simple -- should we add both foldr, foldl, foldlWithKey and foldrWithKey?
I think so.
b) we could use the Foldable instances more. We can implement foldr and foldl efficiently and let the users to use those instead of providing eg Set.foldr, Set.foldl.
This does not cover the *WithKey variant, as the Foldable instances over Map and IntMap does not utilize the key.
I think we should provide Foldable instances, but singe e.g. foldl' isn't a method of the Foldable class we cannot do so as efficiently as we'd like. I looked into this before and the core for foldl' in Foldable is quite poor.
c) strict folds. If we add also strict folds, we would have a total of _four_ folds in Set and IntSet and __eight__ folds in Map and IntMap.
The Foldable module exports foldl' and foldr', which use foldr, foldl respectively, but needs some intermediate memory (it creates a thunk for every node of the structure). Apart from that, this does not cover the *WithKey variants as in b).
I think we definitely need strict folds. These are by far the most useful folds in my experience (foldr is mostly used to implement toList).
Personally I would add foldr, foldl, foldlWithKey, foldrWithKey and specialised foldr, foldl implementations in the Foldable classes. I am not sure about the strict folds -- I used an equivalent of the foldr' from the Foldable module in GHC and it was reasonable fast.
Any opinions?
I think we need both lazy/strict, left/right, with/without key folds. The duplication is quite annoying but they all provide necessary functionality. The duplication touches on a bigger issue in the design of our container libraries. We currently fail at separating traversal from "consumption" efficiently. Ideally we'd have two functions: preorder :: Map k v -> [(k,v)] postorder :: Map k v -> [(k,v)] and use the list folds to work on these. However, build/foldr fusion doesn't fuse this List.foldl' f (preorder m) which makes the separation less viable, for performance reasons. This also applies to things like union, intersection, etc. Ideally we could define e.g. intersection using a general intersection function on streams (lists) module Data.List where -- Precondition: the lists must be sorted intersection :: Ord a => [a] -> [a] -> [a] module Data.Set where intersection :: Ord a => Set a -> Set a -> Set a intersection s1 s2 = fromList . List.intersection (preorder s1) (preorder s2) or something similar. In other words, I'd like to achieve the separation of algorithms and data structure that the STL achieves. (Note: Scala also does something similar). Right now there's rampant code duplication in the container library implementation that I'd like to address, but at the moment I don't know how to do so efficiently. Cheers, Johan

On Wed, Jun 15, 2011 at 5:15 AM, Milan Straka
Hi all,
currently we have the following folds in the containers package: Map: fold, foldWithKey, foldrWithKey, foldlWithKey Set: fold IntMap; fold, foldWithKey IntSet: fold
If the direction (right, left) is not specified, the fold is the right fold. So currently we have no left folds (except for the Map) and also Map and IntMap are not consistent.
There are several issues I am thinking about:
a) we could left and right folds. For the Set and IntSet it is simple -- we can add foldr and foldl, and deprecated fold with the message 'use foldr instead'.
For the Map and IntMap it is not so simple -- should we add both foldr, foldl, foldlWithKey and foldrWithKey?
b) we could use the Foldable instances more. We can implement foldr and foldl efficiently and let the users to use those instead of providing eg Set.foldr, Set.foldl.
This does not cover the *WithKey variant, as the Foldable instances over Map and IntMap does not utilize the key.
In the keys package I have a notion of FoldableWithKey class that covers this use case, defined over a large number of data types, but it can't be used by containers as it requires a type family to talk about the container key, and it would be nice to have raw material to link it to. ;) There is also traverseWithKey.
c) strict folds. If we add also strict folds, we would have a total of _four_ folds in Set and IntSet and __eight__ folds in Map and IntMap.
The Foldable module exports foldl' and foldr', which use foldr, foldl respectively, but needs some intermediate memory (it creates a thunk for every node of the structure). Apart from that, this does not cover the *WithKey variants as in b).
Personally I would add foldr, foldl, foldlWithKey, foldrWithKey and specialised foldr, foldl implementations in the Foldable classes. I am not sure about the strict folds -- I used an equivalent of the foldr' from the Foldable module in GHC and it was reasonable fast.
Any oppinions?
Cheers, Milan
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Wed, 2011-06-15 at 11:15 +0200, Milan Straka wrote:
Hi all,
currently we have the following folds in the containers package: Map: fold, foldWithKey, foldrWithKey, foldlWithKey Set: fold IntMap; fold, foldWithKey IntSet: fold
If the direction (right, left) is not specified, the fold is the right fold. So currently we have no left folds (except for the Map) and also Map and IntMap are not consistent.
My recommendation is that we add all four folds: foldr, foldl' foldl, foldr' Don't get confused with thinking about the "direction". Direction is an implementation issue (albeit an important one). As you know, the 'l' and 'r' in foldl and foldr are not direction but the left or right bracketing of the operations. left bracketing: ((0 + x_0) + x_1) + x_2 right bracketing: x_0 + (x_1 + (x_2 + 0)) Then the strictness is whether we reduce to WHNF before applying the operator. So I think those are the appropriate function names and abstract definitions, because they correspond to what we're all used to with lists. The implicit order of the order of the set/map keys, low to high. Now, when it comes to implementation the direction is important. The foldr and foldl' functions are best implemented as forwards/ascending traversals while foldl and foldr' are best implemented as backwards/descending traversals. In particular note that we can write things like: biggest = foldl (\y x -> x) undefined and evaluate it in O(log n) time. We get "early exit" behaviour. So by having all four traversals we cover the use cases where people want folds that start "from the back" (thinking about it operationally).
The Foldable module exports foldl' and foldr', which use foldr, foldl respectively, but needs some intermediate memory (it creates a thunk for every node of the structure). Apart from that, this does not cover the *WithKey variants as in b).
This is unfortunate. For data structures with equal access to all elements (arrays, map/set etc) we can implement foldl' and foldr' directly and more efficiently.
Personally I would add foldr, foldl, foldlWithKey, foldrWithKey and specialised foldr, foldl implementations in the Foldable classes. I am not sure about the strict folds -- I used an equivalent of the foldr' from the Foldable module in GHC and it was reasonable fast.
Any oppinions?
So I suggest all four folds. They are all useful and can all be implemented efficiently. Perhaps you need the withKey variety too. Do provide a Foldable instance, and perhaps later we can lobby for foldl' and foldr' to be made members of the Foldable class rather than fixed definitions in terms of foldl and foldr (bytestring, vector and text could benefit from this too). Duncan

On Wed, Jun 15, 2011 at 8:43 AM, Duncan Coutts wrote: On Wed, 2011-06-15 at 11:15 +0200, Milan Straka wrote: Hi all, currently we have the following folds in the containers package:
Map: fold, foldWithKey, foldrWithKey, foldlWithKey
Set: fold
IntMap; fold, foldWithKey
IntSet: fold If the direction (right, left) is not specified, the fold is the right
fold. So currently we have no left folds (except for the Map) and also
Map and IntMap are not consistent. My recommendation is that we add all four folds: foldr, foldl'
foldl, foldr' Don't get confused with thinking about the "direction". Direction is an
implementation issue (albeit an important one). As you know, the 'l' and
'r' in foldl and foldr are not direction but the left or right
bracketing of the operations. left bracketing: ((0 + x_0) + x_1) + x_2
right bracketing: x_0 + (x_1 + (x_2 + 0)) Then the strictness is whether we reduce to WHNF before applying the
operator. So I think those are the appropriate function names and abstract
definitions, because they correspond to what we're all used to with
lists. The implicit order of the order of the set/map keys, low to high. Now, when it comes to implementation the direction is important. The
foldr and foldl' functions are best implemented as forwards/ascending
traversals while foldl and foldr' are best implemented as
backwards/descending traversals. In particular note that we can write things like: biggest = foldl (\y x -> x) undefined and evaluate it in O(log n) time. We get "early exit" behaviour. So by having all four traversals we cover the use cases where people
want folds that start "from the back" (thinking about it operationally). The Foldable module exports foldl' and foldr', which use foldr, foldl
respectively, but needs some intermediate memory (it creates a thunk
for every node of the structure). Apart from that, this does not
cover the *WithKey variants as in b). This is unfortunate. For data structures with equal access to all
elements (arrays, map/set etc) we can implement foldl' and foldr'
directly and more efficiently. Perhaps you need the withKey variety too. Do provide a Foldable instance, and perhaps later we can lobby for foldl' and foldr' to be
made members of the Foldable class rather than fixed definitions in
terms of foldl and foldr (bytestring, vector and text could benefit from
this too). I vaguely recall a discussion about adding them to the Foldable class with
the current definitions as defaults, but I don't think an actual
libraries@submission was ever made. I would definitely support this
motion.
-Edward

Duncan Coutts
So I suggest all four folds. They are all useful and can all be implemented efficiently.
For certain classes of operation ⓧ, a tree-fold (( _ ⓧ _) ⓧ (_ ⓧ _)) gives better complexity. Is there room for that, or is it too difficult to decide what to do about the unbalanced parts? -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

On Thu, Jun 16, 2011 at 11:39 AM, Jon Fairbairn
Duncan Coutts
writes: So I suggest all four folds. They are all useful and can all be implemented efficiently.
For certain classes of operation ⓧ, a tree-fold
(( _ ⓧ _) ⓧ (_ ⓧ _))
gives better complexity. Is there room for that, or is it too difficult to decide what to do about the unbalanced parts?
I've been thinking about adding monoidal fold to unordered-containers. One interesting property is that you can evaluate branches in parallel. Perhaps containers should have one too. Cheers, Johan

On Thu, Jun 16, 2011 at 6:23 AM, Johan Tibell
On Thu, Jun 16, 2011 at 11:39 AM, Jon Fairbairn
wrote: Duncan Coutts
writes: So I suggest all four folds. They are all useful and can all be implemented efficiently.
For certain classes of operation ⓧ, a tree-fold
(( _ ⓧ _) ⓧ (_ ⓧ _))
gives better complexity. Is there room for that, or is it too difficult to decide what to do about the unbalanced parts?
I've been thinking about adding monoidal fold to unordered-containers. One interesting property is that you can evaluate branches in parallel. Perhaps containers should have one too.
Yes please. Note that monoidal fold + fmap is cata on leaf-labeled trees. -Jan

On Thu, 2011-06-16 at 10:39 +0100, Jon Fairbairn wrote:
Duncan Coutts
writes: So I suggest all four folds. They are all useful and can all be implemented efficiently.
For certain classes of operation ⓧ, a tree-fold
(( _ ⓧ _) ⓧ (_ ⓧ _))
gives better complexity. Is there room for that, or is it too difficult to decide what to do about the unbalanced parts?
That would be covered by the Foldable instance since Foldable provides fold :: (Data.Monoid.Monoid m) => t m -> m which is exactly what you want. So we'd just want to provide a native implementation of it (rather than getting the default defined in terms of foldr). Milan: so that's another good argument in favour of providing a Foldable instance. (And also of having foldl' and foldr' as Foldable class methods) Duncan

So I suggest all four folds. They are all useful and can all be implemented efficiently.
For certain classes of operation ⓧ, a tree-fold
(( _ ⓧ _) ⓧ (_ ⓧ _))
gives better complexity. Is there room for that, or is it too difficult to decide what to do about the unbalanced parts?
That would be covered by the Foldable instance since Foldable provides
fold :: (Data.Monoid.Monoid m) => t m -> m
which is exactly what you want. So we'd just want to provide a native implementation of it (rather than getting the default defined in terms of foldr).
Milan: so that's another good argument in favour of providing a Foldable instance. (And also of having foldl' and foldr' as Foldable class methods)
We actually do provide Foldable instance now, but we define only foldMap. I will add implementation for all other methods to achieve best complexity. Having foldl' and foldr' in Foldable class would be great. If that happens, we would definitely add specialised implementations. Cheers, Milan

On Mon, 2011-06-20 at 18:49 +0200, Milan Straka wrote:
Milan: so that's another good argument in favour of providing a Foldable instance. (And also of having foldl' and foldr' as Foldable class methods)
We actually do provide Foldable instance now, but we define only foldMap. I will add implementation for all other methods to achieve best complexity.
Great.
Having foldl' and foldr' in Foldable class would be great. If that happens, we would definitely add specialised implementations.
I've just sent a new proposal for exactly that. Duncan
participants (6)
-
Duncan Coutts
-
Edward Kmett
-
Jan-Willem Maessen
-
Johan Tibell
-
Jon Fairbairn
-
Milan Straka