Proposal: add 'findLess' and variants to containers

Hello all, == Overview == I have several times needed functions to find items less than a given key in a container. So I propose we add these to the containers library: -- | /O(log n)/. -- Find the largest element in the set that is @<@ the query. findLess :: Ord a => a -> Set a -> Maybe a -- | /O(log n)/. -- Find the largest element in the set that is @<=@ the query. findLessEqual :: Ord a => a -> Set a -> Maybe a -- | /O(log n)/. -- Find the smallest element in the set that is @>@ the query. findGreater :: Ord a => a -> Set a -> Maybe a -- | /O(log n)/. -- Find the smallest element in the set that is @>=@ the query. findGreaterEqual :: Ord a => a -> Set a -> Maybe a And for Maps: findLess, findLessEqual, findGreater, findGreaterEqual :: Ord a => a -> Map a b -> Maybe (a,b) As well as the equivalent functions for IntSet and IntMap. == Previous proposal and related functions == There was a previous proposal to add such functions about two years ago [1]. That proposal died due to concerns over adding related 'delete' and 'update' functions, and that the API would become too large. I think the 'find' functions should be added regardless of whether we also add 'update' and 'delete' variants. On the other hand, for consitency it would be better to also add 'deleteLess', etc. We should perhaps mimic the findMin/findMax functions. The only issue with that idea is that findMin does not return a Maybe, but rather fails for an empty container. I think that not returning its result in a Maybe would make the findLess useless in practice, since there is no such easy precondition for findLess. As for the API size: This proposal would add a lot of functions to Data.Map and Data.Set (especially when we also include 'update' and 'delete' variants). However, the mental burden of these new functions is not so large. Library users already know what 'find', 'update' and 'delete' mean. Moreover, there is a nice symmetry between the 4 different conditions, which means that you can almost think of them as one thing. == Bikeshedding / naming == There are several possible names for these functions: 1. findLess / findLessEqual / findGreater / findGreaterEqual 2. findMaxLT / findMaxLE / findMinGT / findMinGE 3. findLower / findFloor / findHigher / findCeiling I personally prefer either 1 or 2. To me option 3 (which Java uses) is very confusing. == Implementation == These functions can be implemented in terms of split, but it is a pain (and error prone) to do so: findLess a m = findMax' (fst (split a m)) findLessEqual a m = case splitLookup a s of (_,Just b,_) -> Just (a,b) (m',Nothing,_) -> findMax' m' findGreater a m = findMin' (snd (split a m)) findGreaterEqual a m = case splitLookup a m of (_,Just b,_) -> Just (a,b) (_,Nothing,m') -> findMin' m' where findMin' = fmap fst . minViewWithKey findMax' = fmap fst . maxViewWithKey This is really something that shouldn't have to be done by users of the library. And we might also be able to come up with a more efficient implementation. == Discussion period == Discussion deadline: 2 weeks from now; Sunday, 26 February 2012. Twan [1] http://www.haskell.org/pipermail/libraries/2010-February/013060.html continued here: http://www.haskell.org/pipermail/libraries/2010-March/013062.html

Hi, First, thanks for writing such a detailed proposal. I don't have strong feelings one way or the other. Like always I'm worried about API growth. How about performance? Can these functions be implemented efficiently outside the library? Since I don't feel strongly either way I'm fine with this if Milan and the community are. Cheers, Johan

Thanks for making this proposal, Twan. I've actually been meaning to make
the same proposal for the last few weeks. Before I understood the proposal
process (thanks again to Milan for filling me in) I made a pull request on
GitHub with an implementation of a `successor` function on IntMap (very
similar to your `findGreaterEqual`).
https://github.com/haskell/containers/pull/8
It shows that a more efficient implementation is available, which avoids
`splitLookup` and therefore `union`. (Disclaimer: I haven't run any
benchmarks to show that my internal implementation is actually faster.)
As I mention in that pull, these functions are useful for very practical
things like consistent hashing, etc.
Mike Craig
On Thu, Feb 16, 2012 at 2:37 PM, Johan Tibell
Hi,
First, thanks for writing such a detailed proposal.
I don't have strong feelings one way or the other. Like always I'm worried about API growth. How about performance? Can these functions be implemented efficiently outside the library?
Since I don't feel strongly either way I'm fine with this if Milan and the community are.
Cheers, Johan
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On 16/02/12 20:37, Johan Tibell wrote:
Hi,
First, thanks for writing such a detailed proposal.
I don't have strong feelings one way or the other. Like always I'm worried about API growth. How about performance? Can these functions be implemented efficiently outside the library?
I also did some performance tests, comparing an implementation that uses only the exposed API (split,findMin) to one that uses the constructors. The benchmark code is attached, it is mostly copy-pasted from benchmarks/Map.hs. Here is a general idea of the results (ghc 7.0.4 on windows 7) : benchmarking GE split absent (based on exposed library) mean: 298.6235 us, lb 295.8455 us, ub 301.4016 us, ci 0.950 std dev: 14.43998 us, lb 12.51990 us, ub 19.21979 us, ci 0.950 benchmarking GE custom absent mean: 45.04386 us, lb 44.62368 us, ub 45.42206 us, ci 0.950 std dev: 2.075708 us, lb 1.874807 us, ub 2.362928 us, ci 0.950 The results on other benchmarks are similar: A custom implementation is 4-6 times faster. I haven't yet compared the performance for Set yet (which I expect will be similar), nor for IntMap (where I have no clue). Besides the question of whether these functions can be implemented efficiently outside the library, I think you should also ask whether they can be implemented in a way that is obviously correct. I argued in the proposal that this is not the case, which is another argument for adding them. Twan

On 2012-02-17 22:21, Twan van Laarhoven wrote:
On 16/02/12 20:37, Johan Tibell wrote:
Hi,
First, thanks for writing such a detailed proposal.
I don't have strong feelings one way or the other. Like always I'm worried about API growth. How about performance? Can these functions be implemented efficiently outside the library?
I also did some performance tests, comparing an implementation that uses only the exposed API (split,findMin) to one that uses the constructors. The benchmark code is attached, it is mostly copy-pasted from benchmarks/Map.hs. Here is a general idea of the results (ghc 7.0.4 on windows 7) :
I haven't yet compared the performance for Set yet (which I expect will be similar), nor for IntMap (where I have no clue).
Here are the results for IntMap: benchmarking GE split present mean: 313.7273 us, lb 312.2985 us, ub 314.8706 us, ci 0.950 std dev: 7.010637 us, lb 4.061641 us, ub 9.947760 us, ci 0.950 benchmarking GE def present mean: 197.7439 us, lb 195.8572 us, ub 199.6307 us, ci 0.950 std dev: 9.474155 us, lb 9.388414 us, ub 9.482203 us, ci 0.950 benchmarking GE split absent mean: 625.5818 us, lb 621.1370 us, ub 630.5826 us, ci 0.950 std dev: 24.49309 us, lb 21.45281 us, ub 26.63424 us, ci 0.950 benchmarking GE Craig absent mean: 239.5751 us, lb 237.8357 us, ub 241.3148 us, ci 0.950 std dev: 9.259975 us, lb 7.455007 us, ub 11.33443 us, ci 0.950 benchmarking GE def absent mean: 186.1479 us, lb 184.3935 us, ub 187.7270 us, ci 0.950 std dev: 8.600419 us, lb 8.080284 us, ub 8.809609 us, ci 0.950 As you can see, when the key is present in the map, a specialized version ("def") takes 2/3 as long as one built on top of split. But when the key is not in the map, the difference becomes larger, around 1/3. For findGreater (as opposed to findGreaterEqual), we are essentially always in the 'not found in the map' case. By the way, in all cases, both for Map and for IntMap, passing a default argument is better than returning nothing and then later unpacking that again. What I mean is: go _ Tip = Nothing go k (Bin _ kx x l r) = case compare k kx of LT -> case go k l of Nothing -> Just (kx,x) ret -> ret GT -> go k r EQ -> Just (kx,x) vs. go def _ Tip = def go def k (Bin _ kx x l r) = case compare k kx of LT -> go (Just (kx,x)) k l GT -> go def k r EQ -> Just (kx,x) If you want to include the attached code in the library, you should use Map_FindGE.findGreater{Equal}3 and IntMap_FindGE.findGreater{Equal}3, those are the most efficient versions. Twan

Hi, Thanks for the benchmark. In light of this and your other arguments I think we should add the functions. Lets settle on the names. I currently dislike that we have 'lookup' and 'findDefault' (which is lookup with a default value.) We should agree on a principle for when we use the two words. -- Johan

Hi,
Thanks for the benchmark. In light of this and your other arguments I think we should add the functions. Lets settle on the names. I currently dislike that we have 'lookup' and 'findDefault' (which is lookup with a default value.) We should agree on a principle for when we use the two words.
We should definitely settle on the names. At some point I thought maybe we could add just these two functions: predecessor :: Ord a => a -> Set a -> Maybe a successor :: Ord a => a -> Set a -> Maybe a These functions work like findLess, findGreater, i.e., they return a strict predecessor or successor of a given key. The key does not need to exist in the set/map. Are there any good use cases for findLessEqual/findGreaterEqual, instead of calling lookup and if it fails, call predecessor/successor? Cheers, Milan

On 2012-02-18 00:51, Johan Tibell wrote:
Lets settle on the names. I currently dislike that we have 'lookup' and 'findDefault' (which is lookup with a default value.) We should agree on a principle for when we use the two words.
With regards to 'lookup' vs 'find': the current convention seems to be that lookup* returns 'Maybe a', while find* returns 'a'. Compare for example 'lookupIndex' and 'findIndex'. That means that the proposed functions should be called 'lookupLess', etc. On 2012-02-18 11:46, Milan Straka wrote:
At some point I thought maybe we could add just these two functions:
predecessor :: Ord a => a -> Set a -> Maybe a successor :: Ord a => a -> Set a -> Maybe a
These functions work like findLess, findGreater, i.e., they return a strict predecessor or successor of a given key. The key does not need to exist in the set/map.
The terms successor and predecessor make some sense for Set, but less for a Map. IMO, a name like findGreater makes the intent much clearer.
Are there any good use cases for findLessEqual/findGreaterEqual, instead of calling lookup and if it fails, call predecessor/successor?
Such a two-tier lookup would be two times as slow, and also inconvenient for the user of the library. With this argument we should also get rid of, for example, one of 'isSubmapOf' and 'isProperSubmapOf'. Here is a simple use case: suppose that you have some boxes, Set Size, and want to find the smallest box that will fit a certain item. The natural thing to do would be 'findGreaterEqual itemSize boxes'. If the sizes are integers, you could do 'findGreater (itemSize-1) boxes'. But that makes the intent of the code less clear, and makes it easier to introduce bugs. This trick also fails when sizes are not integers. Twan

On Sat, 18 Feb 2012, Twan van Laarhoven wrote:
On 2012-02-18 00:51, Johan Tibell wrote:
Lets settle on the names. I currently dislike that we have 'lookup' and 'findDefault' (which is lookup with a default value.) We should agree on a principle for when we use the two words.
With regards to 'lookup' vs 'find': the current convention seems to be that lookup* returns 'Maybe a', while find* returns 'a'. Compare for example 'lookupIndex' and 'findIndex'.
Are there any good use cases for findLessEqual/findGreaterEqual, instead of calling lookup and if it fails, call predecessor/successor?
These findLessEqual/findGreaterEqual remind me on the search for a string with the maximum common prefix. That is, if I have a (set :: Set String) and I want to find the member of 'set', that has the longest common prefix with a string 'str', then I know that either (findLessEqual str set) or (findGreaterEqual str set) is the wanted string.

On 12/02/12 14:14, Twan van Laarhoven wrote:
Discussion deadline: 2 weeks from now; Sunday, 26 February 2012.
The deadline for discussion has passed. There were no explicit votes for or against, but 4 reactions that count as positive, and 1 as unconvinced. The only issue seems to be the names:
Thanks for the benchmark. In light of this and your other arguments I think we should add the functions. Lets settle on the names.
I have made a case for /lookup(Less|Greater)(Equal)?/, based on the fact that lookup* functions return "Maybe x" values, while find* functions return values without a maybe wrapper. But I don't really care that much. Flipping a coin between "find" and "lookup" will also work for me :) Twan

On Fri, Mar 2, 2012 at 10:52 AM, Twan van Laarhoven
I have made a case for /lookup(Less|Greater)(Equal)?/**, based on the fact that lookup* functions return "Maybe x" values, while find* functions return values without a maybe wrapper. But I don't really care that much. Flipping a coin between "find" and "lookup" will also work for me :)
I'd say lookup.

I'm generally +1, but I still don't really like the names "lookupLess"
or "findLess". It just doesn't invoke the right association. Maybe
something that shows the relation to findMin/Max? E.g., something
more along the lines of "lookupMaxBelow", "lookupMinAbove".
On 2 March 2012 18:52, Twan van Laarhoven
On 12/02/12 14:14, Twan van Laarhoven wrote:
Discussion deadline: 2 weeks from now; Sunday, 26 February 2012.
The deadline for discussion has passed. There were no explicit votes for or against, but 4 reactions that count as positive, and 1 as unconvinced.
The only issue seems to be the names:
Thanks for the benchmark. In light of this and your other arguments I think we should add the functions. Lets settle on the names.
I have made a case for /lookup(Less|Greater)(Equal)?/, based on the fact that lookup* functions return "Maybe x" values, while find* functions return values without a maybe wrapper. But I don't really care that much. Flipping a coin between "find" and "lookup" will also work for me :)
Twan
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- Push the envelope. Watch it bend.

2012/3/3 Thomas Schilling
I'm generally +1, but I still don't really like the names "lookupLess" or "findLess". It just doesn't invoke the right association. Maybe something that shows the relation to findMin/Max? E.g., something more along the lines of "lookupMaxBelow", "lookupMinAbove".
+1 for lookupMaxBelow and lookupMinAbove (and lookupMaxBelowEqual / lookupMinAboveEqual). David

On 3/2/12 7:43 PM, David Waern wrote:
2012/3/3 Thomas Schilling
: I'm generally +1, but I still don't really like the names "lookupLess" or "findLess". It just doesn't invoke the right association. Maybe something that shows the relation to findMin/Max? E.g., something more along the lines of "lookupMaxBelow", "lookupMinAbove".
+1 for lookupMaxBelow and lookupMinAbove (and lookupMaxBelowEqual / lookupMinAboveEqual).
If we go with min/max terminology rather than lt/le/gt/ge, might I suggest rather: lookupMaxNotAbove / lookupMinNotBelow ? "Below equal" and "above equal" is mixing metaphors. -- Live well, ~wren

Hi,
On 12/02/12 14:14, Twan van Laarhoven wrote:
Discussion deadline: 2 weeks from now; Sunday, 26 February 2012.
The deadline for discussion has passed. There were no explicit votes for or against, but 4 reactions that count as positive, and 1 as unconvinced.
The only issue seems to be the names:
Thanks for the benchmark. In light of this and your other arguments I think we should add the functions. Lets settle on the names.
I have made a case for /lookup(Less|Greater)(Equal)?/, based on the fact that lookup* functions return "Maybe x" values, while find* functions return values without a maybe wrapper. But I don't really care that much. Flipping a coin between "find" and "lookup" will also work for me :)
I would go with lookupLT, lookupLE, lookupGE, lookupGT The names are short, and LT/GT are universally known from the Ordering type. Cheers, Milan

On Sat, 3 Mar 2012, Milan Straka wrote:
Hi,
On 12/02/12 14:14, Twan van Laarhoven wrote:
Discussion deadline: 2 weeks from now; Sunday, 26 February 2012.
The deadline for discussion has passed. There were no explicit votes for or against, but 4 reactions that count as positive, and 1 as unconvinced.
The only issue seems to be the names:
Thanks for the benchmark. In light of this and your other arguments I think we should add the functions. Lets settle on the names.
I have made a case for /lookup(Less|Greater)(Equal)?/, based on the fact that lookup* functions return "Maybe x" values, while find* functions return values without a maybe wrapper. But I don't really care that much. Flipping a coin between "find" and "lookup" will also work for me :)
I would go with
lookupLT, lookupLE, lookupGE, lookupGT
The names are short, and LT/GT are universally known from the Ordering type.
+1 for same reason. I would not like the negation in "NotAbove".

On 3/3/12 4:28 AM, Henning Thielemann wrote:
On Sat, 3 Mar 2012, Milan Straka wrote:
I would go with
lookupLT, lookupLE, lookupGE, lookupGT
The names are short, and LT/GT are universally known from the Ordering type.
+1 for same reason.
This is my preferred naming scheme too.
I would not like the negation in "NotAbove".
I merely mentioned it because at least it's consistent/coherent, as opposed to mixing the comparator terminology with the extrema terminology. -- Live well, ~wren
participants (9)
-
David Waern
-
Henning Thielemann
-
Isaac Dupree
-
Johan Tibell
-
Michael Craig
-
Milan Straka
-
Thomas Schilling
-
Twan van Laarhoven
-
wren ng thornton