
I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in: https://github.com/haskell/containers/issues/135 https://github.com/haskell/containers/issues/135 The motivation behind this proposal is three-fold: * for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is: take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-} This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API. Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this: take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-} I then added the following benchmark to `benchmarks/Set.hs`: diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs index 3a6e8aa..03c99fb 100644 --- a/benchmarks/Set.hs +++ b/benchmarks/Set.hs @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems Here is the performance on my machine when implementing `take` via the public API: benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated) … and the performance improved by 61% from using the internal API: benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated) … and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets. I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary. I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities. The deadline for discussion is two weeks.

I would prefer that the Ord constraint be retained in the type signature,
even if not used in the implementation. Sets and Maps conceptually do not
have an ordering; the Ord constraint indicates in which order one is
sequencing the values.
-- Dan Burton
On Mon, Mar 7, 2016 at 4:14 PM, Gabriel Gonzalez
I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
*diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs* *index 3a6e8aa..03c99fb 100644* *--- a/benchmarks/Set.hs* *+++ b/benchmarks/Set.hs* @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Sets and maps don't inherently have orderings, but Set and Map do. I think
you could still make an argument for retaining the constraint, but it's a
thin one. It is possible to imagine that there could be some *other*
Ord-based representation of sets and maps for which having the ordering
directly available would lead to more efficient splits. Retaining the
constraint could then be seen as forward compatibility with such a
hypothetical reimplementation.
On Mar 7, 2016 7:26 PM, "Dan Burton"
I would prefer that the Ord constraint be retained in the type signature, even if not used in the implementation. Sets and Maps conceptually do not have an ordering; the Ord constraint indicates in which order one is sequencing the values.
-- Dan Burton
On Mon, Mar 7, 2016 at 4:14 PM, Gabriel Gonzalez
wrote: I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
*diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs* *index 3a6e8aa..03c99fb 100644* *--- a/benchmarks/Set.hs* *+++ b/benchmarks/Set.hs* @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

I'm fine with keeping the constraint for forwards compatibility reasons
On Mon, Mar 7, 2016 at 4:33 PM David Feuer
Sets and maps don't inherently have orderings, but Set and Map do. I think you could still make an argument for retaining the constraint, but it's a thin one. It is possible to imagine that there could be some *other* Ord-based representation of sets and maps for which having the ordering directly available would lead to more efficient splits. Retaining the constraint could then be seen as forward compatibility with such a hypothetical reimplementation. On Mar 7, 2016 7:26 PM, "Dan Burton"
wrote: I would prefer that the Ord constraint be retained in the type signature, even if not used in the implementation. Sets and Maps conceptually do not have an ordering; the Ord constraint indicates in which order one is sequencing the values.
-- Dan Burton
On Mon, Mar 7, 2016 at 4:14 PM, Gabriel Gonzalez
wrote: I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
*diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs* *index 3a6e8aa..03c99fb 100644* *--- a/benchmarks/Set.hs* *+++ b/benchmarks/Set.hs* @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On 2016-03-08 at 01:26:12 +0100, Dan Burton wrote:
I would prefer that the Ord constraint be retained in the type signature, even if not used in the implementation.
Just for the record: You'll have to actively suppress the resulting warning if the Ord dictionary is not used by the implementation, as -Wredundant-constraints is part of -Wall starting with GHC 8.0 -- hvr

On Tue, 8 Mar 2016, Herbert Valerio Riedel wrote:
On 2016-03-08 at 01:26:12 +0100, Dan Burton wrote:
I would prefer that the Ord constraint be retained in the type signature, even if not used in the implementation.
Just for the record: You'll have to actively suppress the resulting warning if the Ord dictionary is not used by the implementation, as -Wredundant-constraints is part of -Wall starting with GHC 8.0
It seems to have changed: https://ghc.haskell.org/trac/ghc/ticket/10635#comment:7 But I had those cases in mind when I asked for removing -Wredundant-constraints from -Wall.

On 2016-03-08 at 09:58:00 +0100, Henning Thielemann wrote:
On 2016-03-08 at 01:26:12 +0100, Dan Burton wrote:
I would prefer that the Ord constraint be retained in the type signature, even if not used in the implementation.
Just for the record: You'll have to actively suppress the resulting warning if the Ord dictionary is not used by the implementation, as -Wredundant-constraints is part of -Wall starting with GHC 8.0
It seems to have changed: https://ghc.haskell.org/trac/ghc/ticket/10635#comment:7
Indeed, see also https://ghc.haskell.org/ticket/11370#comment:32 which was the result of a lengthy debate... Sadly, aspects in the warning design-space such as how to classify/default warnings in combination with the 3-rls-policy tend to drain the living will out of its participants (paraphrasing Simon Marlow)...
But I had those cases in mind when I asked for removing -Wredundant-constraints from -Wall.

On 03/08/2016 02:26 AM, Dan Burton wrote:
I would prefer that the Ord constraint be retained in the type signature, even if not used in the implementation. Sets and Maps conceptually do not have an ordering; the Ord constraint indicates in which order one is sequencing the values.
I don't quite understand the essence of your argument, but what you're proposing (annotating every function that deals with Sets or Maps with an Ord constraint) sounds like data type contexts[1], which was more or less universally admitted to be a bad idea. Could you explain in more detail why you want that? [1]: https://wiki.haskell.org/Data_declaration_with_constraint Roman

On 03/08/2016 01:14 AM, Gabriel Gonzalez wrote:
I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135 https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
+1 It also seems like a more idiomatic Haskell API to have "take", "drop", "splitAt" operations rather than[1] "lookupIndex/elemAt" + "split". [1] I realize the latter aren't going away. Regards,

Good idea, +1. On 03/08/2016 02:14 AM, Gabriel Gonzalez wrote:
I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
*diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs* *index 3a6e8aa..03c99fb 100644* *--- a/benchmarks/Set.hs* *+++ b/benchmarks/Set.hs* @@ -31,6 +31,7 @@main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

+1 on adding the methods, but I'd really rather see it done without
incurring spurious constraints that they don't need.
We just went through and cleaned up a few similar unused and unusable
constraints in base on various array operations. This seems to beg us to do
the same later, and we don't bother to wastefully pass in Ord constraints
on any other combinators in Data.Set or Data.Map, so why start now?
-Edward
On Mon, Mar 7, 2016 at 7:14 PM, Gabriel Gonzalez
I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
*diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs* *index 3a6e8aa..03c99fb 100644* *--- a/benchmarks/Set.hs* *+++ b/benchmarks/Set.hs* @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Agreed. I was just playing devil's advocate.
On Mar 8, 2016 4:14 PM, "Edward Kmett"
+1 on adding the methods, but I'd really rather see it done without incurring spurious constraints that they don't need.
We just went through and cleaned up a few similar unused and unusable constraints in base on various array operations. This seems to beg us to do the same later, and we don't bother to wastefully pass in Ord constraints on any other combinators in Data.Set or Data.Map, so why start now?
-Edward
On Mon, Mar 7, 2016 at 7:14 PM, Gabriel Gonzalez
wrote: I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in:
https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
*diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs* *index 3a6e8aa..03c99fb 100644* *--- a/benchmarks/Set.hs* *+++ b/benchmarks/Set.hs* @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

Alright, so no constraints then, since my reading of this is that nobody is really in favor of them at this point and several people are against.
On Mar 8, 2016, at 1:18 PM, David Feuer
wrote: Agreed. I was just playing devil's advocate.
On Mar 8, 2016 4:14 PM, "Edward Kmett"
mailto:ekmett@gmail.com> wrote: +1 on adding the methods, but I'd really rather see it done without incurring spurious constraints that they don't need. We just went through and cleaned up a few similar unused and unusable constraints in base on various array operations. This seems to beg us to do the same later, and we don't bother to wastefully pass in Ord constraints on any other combinators in Data.Set or Data.Map, so why start now?
-Edward
On Mon, Mar 7, 2016 at 7:14 PM, Gabriel Gonzalez
mailto:gabriel439@gmail.com> wrote: I would like to propose adding `take`/`drop`/`splitAt` to both `Data.Map` and `Data.Set` as originally requested in: https://github.com/haskell/containers/issues/135 https://github.com/haskell/containers/issues/135
The motivation behind this proposal is three-fold:
* for convenience - these functions are commonly used to implement pagination or previews of maps/sets * for type accuracy - the public API impose an unnecessary `Ord` constraint * for efficiency - these can be implemented more efficiently using the internal API
Currently the only way you can implement this functionality via the public API is to use `lookupIndex`/`elemAt` + `split`. For example, one way to implement `Data.Set.take` is:
take :: Ord a => Int -> Set a -> Set a take n m | n < 0 = empty | size m <= n = m | otherwise = lt where (lt, _) = split k m k = elemAt n m {-# INLINE take #-}
This implementation incurs an unnecessary `Ord` constraint due to a roundabout way of computing `take`: this extracts the element at the given index and then works backwards from the element’s value to partition the set using O(log N) comparisons. We could eliminate all of the comparisons by using the internal API.
Intuitively, we expect that the performance of `Data.Set.take` would benefit from avoiding those unnecessary comparisons and also avoiding traversing the `Set`’s spine twice. So I tested that hypothesis by implementing `take` via the internal API like this:
take :: Int -> Set a -> Set a take n0 s0 = go s0 n0 where go s@(Bin sz x l r) n = if sz <= n then s else let sl = size l in if n <= sl then go l n else link x l (go r $! n - sl) go Tip _ = Tip {-# INLINE take #-}
I then added the following benchmark to `benchmarks/Set.hs`:
diff --git a/benchmarks/Set.hs b/benchmarks/Set.hs index 3a6e8aa..03c99fb 100644 --- a/benchmarks/Set.hs +++ b/benchmarks/Set.hs @@ -31,6 +31,7 @@ main = do , bench "union" $ whnf (S.union s_even) s_odd , bench "difference" $ whnf (S.difference s) s_even , bench "intersection" $ whnf (S.intersection s) s_even + , bench "take" $ whnf (S.take (2^11)) s , bench "fromList" $ whnf S.fromList elems , bench "fromList-desc" $ whnf S.fromList (reverse elems) , bench "fromAscList" $ whnf S.fromAscList elems
Here is the performance on my machine when implementing `take` via the public API:
benchmarking take time 272.8 ns (266.7 ns .. 278.1 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 266.3 ns (261.8 ns .. 270.8 ns) std dev 15.44 ns (13.26 ns .. 18.95 ns) variance introduced by outliers: 75% (severely inflated)
… and the performance improved by 61% from using the internal API:
benchmarking take time 169.2 ns (166.1 ns .. 172.6 ns) 0.997 R² (0.996 R² .. 0.998 R²) mean 172.1 ns (169.4 ns .. 175.4 ns) std dev 10.68 ns (8.420 ns .. 15.34 ns) variance introduced by outliers: 78% (severely inflated)
… and I’m guessing (but haven’t tested) that the performance gap would only increase the more expensive the comparison function gets.
I haven’t performed comparative performance testing for `drop`/`splitAt` nor have I tested `Map` (because the benchmarks take a while for me to build and run) but I can perform those additional comparisons upon requests if people feel they are necessary.
I haven’t yet written up a full patch since the maintainer asked me to first run this proposal by the libraries mailing list to assess whether it would be wise to expand the `containers` API to include these utilities.
The deadline for discussion is two weeks.
_______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
_______________________________________________ Libraries mailing list Libraries@haskell.org mailto:Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
participants (8)
-
Bardur Arantsson
-
Dan Burton
-
David Feuer
-
Edward Kmett
-
Gabriel Gonzalez
-
Henning Thielemann
-
Herbert Valerio Riedel
-
Roman Cheplyaka