Proposed additions to Data.Map: lookupFloor and lookupCeiling

Hi, here are two quick, modest additions to Data.Map. As Map is exported abstractly, lookupFlor can't be defined outside the library except via splitLookup and findMax. On the other hand, these implementations should be nearly the same cost as a regular lookup. Best, Leon -- | /O(log n)/. Find the greatest key that is smaller or equal -- to the given target key. This \"floor\" key is returned, along with it's -- associated value, as @'Just' (key, value)@. If the map does not have -- a key that is smaller or equal to the target key, then this function -- returns 'Nothing'. lookupFloor :: Ord k => k -> Map k a -> Maybe (k,a) lookupFloor = go Tip where go s k t = case t of Tip -> case s of Tip -> Nothing Bin _ ka a _ _ -> Just (ka,a) Bin _ ka a l r -> case compare ka k of LT -> go t k r GT -> go s k l EQ -> Just (ka,a) -- | /O(log n)/. Find the smallest key that is greater or equal -- to the given target key. This \"ceiling\" key is returned, along with it's -- associated value, as @'Just' (key, value)@. If the map does not have -- a key that is greater or equal to the target key, then this function -- returns 'Nothing'. lookupCeiling :: Ord k => k -> Map k a -> Maybe (k,a) lookupCeiling = go Tip where go s k t = case t of Tip -> case s of Tip -> Nothing Bin _ ka a _ _ -> Just (ka,a) Bin _ ka a l r -> case compare ka k of LT -> go s k r GT -> go t k l EQ -> Just (ka,a)

-- | /O(log n)/. Find the greatest key that is smaller or equal -- to the given target key. This \"floor\" key is returned,
-- | /O(log n)/. Find the smallest key that is greater or equal -- to the given target key. This \"ceiling\" key is returned,
It seems more intuitive to reverse the naming. The key parameter to lookupFloor is actually the ceiling, and the key returned is the greatest key less than or equal to the ceiling. Since there is no floor involved here, I would call the function lookupCeiling. Regards, Sean

On Mon, Mar 1, 2010 at 2:39 AM, Sean Leather
It seems more intuitive to reverse the naming. The key parameter to lookupFloor is actually the ceiling, and the key returned is the greatest key less than or equal to the ceiling. Since there is no floor involved here, I would call the function lookupCeiling.
The greatest number less than something would be the floor. :-) For example: ghci> let map = Map.fromList [ (x, ()) | x <- [0,10..1000] ] ghci> lookupFloor 75 map Just (70, ()) ghci> floor 7.5 7 ghci> lookupCeiling 115 map Just (120,()) ghci> ceiling 11.5 12 But honestly I almost invariably get the definitions of floor and ceiling wrong unless I stop to think for a few seconds. I've trained myself to do that. ;-) Maybe a few examples in the haddocks would be in order. That's easier to comprehend, if you know approximately what it should do. Also, it occurs to me that maybe a combined function :: (Ord k) => k -> Map k v -> (Maybe k v, Maybe k v) would be useful in addition to or in place of lookupFloor and lookupCeiling. But I'm not sure. Best, Leon

On Mon, Mar 1, 2010 at 09:40, Leon Smith wrote:
On Mon, Mar 1, 2010 at 2:39 AM, Sean Leather wrote:
It seems more intuitive to reverse the naming. The key parameter to lookupFloor is actually the ceiling, and the key returned is the greatest key less than or equal to the ceiling. Since there is no floor involved here, I would call the function lookupCeiling.
The greatest number less than something would be the floor. :-)
So, what I didn't say (but was thinking) was that I would find these renamings more intuitive: lookupFloor --> lookupWithCeiling lookupCeiling --> lookupWithFloor These tell me that the key is the ceiling or floor value. For example:
ghci> let map = Map.fromList [ (x, ()) | x <- [0,10..1000] ] ghci> lookupFloor 75 map Just (70, ()) ghci> floor 7.5 7 ghci> lookupCeiling 115 map Just (120,()) ghci> ceiling 11.5 12
I understand where you're coming from. You're mapping the set of keys to the set of integers and applying the (e.g.) floor function to the key within that set. Alternative (though verbose) names for your functions might be: lookupFloor --> applyFloorAndLookup lookupCeiling --> applyCeilingAndLookup In contrast, I looked at the parameter as being the lower or upper bound. So, perhaps an even better naming (from my point of view) would be: lookupFloor --> lookupWithUpperBound or lookupUpperBound or lookupUB or ... lookupCeiling --> lookupWithLowerBound or lookupLowerBound or lookupLB or ... Something like this might also remove any confusion about floor and ceiling. But honestly I almost invariably get the definitions of floor and
ceiling wrong unless I stop to think for a few seconds. I've trained myself to do that. ;-)
With your current naming scheme, I think I would constantly choose the opposite function. But maybe that's just me. Maybe a few examples in the haddocks would
be in order. That's easier to comprehend, if you know approximately what it should do.
Whatever is the end result, I think you should definitely have some examples in the documentation. Also, it occurs to me that maybe a combined function :: (Ord k) => k
-> Map k v -> (Maybe k v, Maybe k v) would be useful in addition to or in place of lookupFloor and lookupCeiling. But I'm not sure.
I don't know either. Regards, Sean

On Mon, Mar 1, 2010 at 2:34 AM, Sean Leather
On Mon, Mar 1, 2010 at 09:40, Leon Smith wrote:
On Mon, Mar 1, 2010 at 2:39 AM, Sean Leather wrote:
It seems more intuitive to reverse the naming. The key parameter to lookupFloor is actually the ceiling, and the key returned is the greatest key less than or equal to the ceiling. Since there is no floor involved here, I would call the function lookupCeiling.
The greatest number less than something would be the floor. :-)
So, what I didn't say (but was thinking) was that I would find these renamings more intuitive:
lookupFloor --> lookupWithCeiling lookupCeiling --> lookupWithFloor
I have these implemented as lookupBelow and lookupAbove. Perhaps
findBelow and findAbove would be more consistent with findMin findMax.
Mine are implemented as splitLookup followed by (toDescList,
toAscList). The caller can decide how many they want above and below.
It would be nice to have some discussion of performance or maybe
benchmarks to justify this addition over the more flexible
(toDescList, toAscList) implementation.
As an aside, the Data.Map implementation of split has never been
useful for me because it bizarrely omits the key at the given point.
I've never come across a situation where that would be useful.
splitLookup at least gives it to you, but separately as a Maybe rather
than as the first element of the 'above' map, which is how I always
turn out to want it.
Oh well, too late to change those functions I suppose.
The customary definition of a range is (

Sean Leather wrote:
-- | /O(log n)/. Find the greatest key that is smaller or equal -- to the given target key. This \"floor\" key is returned,
-- | /O(log n)/. Find the smallest key that is greater or equal -- to the given target key. This \"ceiling\" key is returned,
It seems more intuitive to reverse the naming. The key parameter to lookupFloor is actually the ceiling, and the key returned is the greatest key less than or equal to the ceiling. Since there is no floor involved here, I would call the function lookupCeiling.
As another point of reference, the Java TreeMap class provides [1]: floorKey(K key) Returns the greatest key less than or equal to the given key, or null if there is no such key. ceilingKey(K key) Returns the least key greater than or equal to the given key, or null if there is no such key. higherKey(K key) Returns the least key strictly greater than the given key, or null if there is no such key. lowerKey(K key) Returns the greatest key strictly less than the given key, or null if there is no such key. My proposal would be to just call things as they are, and add the functions findGreater, findLess, findGreaterEqual and findLessEqual. Twan [1] http://java.sun.com/javase/6/docs/api/java/util/TreeMap.html

Twan van Laarhoven wrote:
My proposal would be to just call things as they are, and add the functions findGreater, findLess, findGreaterEqual and findLessEqual.
I vote for that proposal. I hate adding functions to our libraries. But if we are adding these, we probably also need to add: deleteGreater, deleteLess, deleteGreaterEqual, deleteLessEqual Those common operations are also much more expensive when done from outside the library. Wait - horrors! I hadn't looked at the library lately. We now have, for each of Min and Max: find, delete, deleteFind, update, updateWithKey, view, viewWithKey For consistency, we would need to add each of those seven for each of the new comparisons - 28 new functions, for a total of 42! We have finally discovered the Answer to the Ultimate Question of Life, the Universe, and Everything. Why did someone have to add all of those functions to begin with? Look what bloat it's causing. Here are some ideas of what to do about it: 1. Define all 42 of these functions in a separate module. (This has the pleasant advantage of giving us a new bikeshed to paint to name this new module. What fun!) 2. Get rid of all of the functions except just one that is general enough to provide all of the others as a special case. Add copious documentation, with many clear examples, so that anyone can get whatever they need without having to think too much. The general function might have a type something like: data Extremality = Min | Max | Greater | Less | GreaterEqual | LessEqual Extremality -> (k -> a -> Maybe a) -> Map k a -> Maybe ((k, a), Map k a) 2'. As a variation on (2), we could, in addition to the general function, leave in the most commonly used functions in their simple form. (Which ones? Woohoo, more bikesheds!) 3. Use type trickery to construct a single interface that provides each of the possibilities. So that you could write something like update greater withKey and get the function you want. This one is the most fun to write, even though we probably won't end up using it. Regards, Yitz

On Mon, Mar 1, 2010 at 5:34 AM, Sean Leather
I understand where you're coming from. You're mapping the set of keys to the set of integers and applying the (e.g.) floor function to the key within that set. Alternative (though verbose) names for your functions might be:
lookupFloor --> applyFloorAndLookup lookupCeiling --> applyCeilingAndLookup
In contrast, I looked at the parameter as being the lower or upper bound. So, perhaps an even better naming (from my point of view) would be:
lookupFloor --> lookupWithUpperBound or lookupUpperBound or lookupUB or ... lookupCeiling --> lookupWithLowerBound or lookupLowerBound or lookupLB or ...
Ok, I think I now see where you are coming from, but I don't agree.
We are trying to find the relationship between many keys in a map to
one target key, so there is only one direction in which the standard
definitions of "floor" and "ceiling" make sense.
On Mon, Mar 1, 2010 at 9:24 PM, Twan van Laarhoven
As another point of reference, the Java TreeMap class provides [1]:
floorKey(K key) Returns the greatest key less than or equal to the given key, or null if there is no such key. ceilingKey(K key) Returns the least key greater than or equal to the given key, or null if there is no such key. higherKey(K key) Returns the least key strictly greater than the given key, or null if there is no such key. lowerKey(K key) Returns the greatest key strictly less than the given key, or null if there is no such key.
This is interesting, because I don't recall ever seeing these functions with "floor" and "ceiling" in their name. So this would be a case of convergent evolution.
My proposal would be to just call things as they are, and add the functions findGreater, findLess, findGreaterEqual and findLessEqual.
I find these names acceptable, although I would honestly prefer
something like "findFloor" or "lookupFloor"
On Tue, Mar 2, 2010 at 9:56 AM, Yitzchak Gale
Twan van Laarhoven wrote:
My proposal would be to just call things as they are, and add the functions findGreater, findLess, findGreaterEqual and findLessEqual.
I vote for that proposal.
I hate adding functions to our libraries. But if we are adding these, we probably also need to add:
deleteGreater, deleteLess, deleteGreaterEqual, deleteLessEqual
Those common operations are also much more expensive when done from outside the library.
Wait - horrors! I hadn't looked at the library lately. We now have, for each of Min and Max:
find, delete, deleteFind, update, updateWithKey, view, viewWithKey
For consistency, we would need to add each of those seven for each of the new comparisons - 28 new functions, for a total of 42! We have finally discovered the Answer to the Ultimate Question of Life, the Universe, and Everything.
Why did someone have to add all of those functions to begin with? Look what bloat it's causing.
Of course, this is the balancing act that every library writer has to manage. However, I think that it's likely that a simple "findFloor" will be sufficient to reasonably cover all these possibilities. It's just two passes over the tree, one to find the floor key, and a second to do a deletion or whatever. This is better than using splitLookup and findMax to find the floor key, as this combination involves more memory allocation and more complicated data structure transformations. (whereas lookupFloor involves no changes to the data structure, although it might force something that hadn't been forced before) Best, Leon

Twan van Laarhoven wrote:
My proposal would be to just call things as they are, and add the functions findGreater, findLess, findGreaterEqual and findLessEqual.
Leon Smith wrote:
I find these names acceptable, although I would honestly prefer something like "findFloor" or "lookupFloor"
Why?
I think that it's likely that a simple "findFloor" will be sufficient to reasonably cover all these possibilities. It's just two passes over the tree, one to find the floor key, and a second to do a deletion or whatever. This is better than using splitLookup and findMax to find the floor key, as this combination involves more memory allocation and more complicated data structure transformations. (whereas lookupFloor involves no changes to the data structure, although it might force something that hadn't been forced before)
All of those approaches have the same asymptotic behavior. The question is, how many different functions do we need in this library to improve the coefficient in various special cases? Now that I have become aware of the feature creep that is affecting this library, I am opposed to adding these functions. Enough is enough. But here's another idea: put the internal workings into a separate module called Data.Map.Internals that exports everything, and have Data.Map re-export only the stable public API. This way, people are free to upload to Hackage their own creative ways of using Data.Map more efficiently. And we can tighten up the public interface to be leaner and meaner, without depriving people of the ability to tweak down those coefficients when they need to. Regards, Yitz

Leon Smith wrote:
On Mon, Mar 1, 2010 at 5:34 AM, Sean Leather
wrote: I understand where you're coming from. You're mapping the set of keys to the set of integers and applying the (e.g.) floor function to the key within that set. Alternative (though verbose) names for your functions might be:
lookupFloor --> applyFloorAndLookup lookupCeiling --> applyCeilingAndLookup
In contrast, I looked at the parameter as being the lower or upper bound. So, perhaps an even better naming (from my point of view) would be:
lookupFloor --> lookupWithUpperBound or lookupUpperBound or lookupUB or ... lookupCeiling --> lookupWithLowerBound or lookupLowerBound or lookupLB or ...
Ok, I think I now see where you are coming from, but I don't agree. We are trying to find the relationship between many keys in a map to one target key, so there is only one direction in which the standard definitions of "floor" and "ceiling" make sense.
What is the floor/ceiling? That is, when we use a notation x = fl(y) or x = cl(y), which of x and y is the floor/ceiling? One way of looking at it is that the y we pass in is the boundary and the x is the closest thing to it. Then, y is the floor/ceiling since it's the boundary. Another way of looking at it is that we pass in something that's no good (because it's not an integer, or there's no value for that key), and we get back the nearest thing that is good. Then, x is the floor/ceiling because its the nearest thing to our no-good intermediate value. Both views make sense in their own way. The latter is the perspective used for taking the integer floor/ceiling of reals. The former surely has some analogues in topology or limit series. Personally, I think it'd give less potential for confusion to avoid either interpretation and just use names like "lookupAbove" to mean getting the (k,v) pair for the least k above the query, and analogously for "lookupBelow". -- Live well, ~wren
participants (6)
-
Evan Laforge
-
Leon Smith
-
Sean Leather
-
Twan van Laarhoven
-
wren ng thornton
-
Yitzchak Gale