
Mitchell, Neil
nubOrd: Seems good. Useful function to have. Shame its not in Data.List, but I understand the reasons for that, and think this is a perfectly sensible choice.
Thanks.
nubWith: Seems bad. A not particularly useful function (other than to write nubOrd), with a fairly confusing name. I have previously used nubWith to mean nubOn (nubOn f = nubBy ((==) `on` f)) with cacheing of the function f. As a name, "with" only means "with additional stuff" - it gives no hint what the additional stuff is.
Were we to keep it, do you have a better naming suggestion? I agree that nubWith is not optimal, but I don't have a great alternative. As you sort of suggest below, maybe nubFilter could work? [The problem starts with the belief of everyone I've ever talked to about it that "nub" itself should have been called "uniq". :-)]
The concept of stop-lists is a little confusing, and then you tell the user they almost certainly want to pass [] every single time - in that case why do they get an option?
One can imagine situations in which you want to accumulate a "stop list" over several runs of the function, but of course nubWith as designed won't let you do that, so that's fail. If they already have a list of things they want pre-stopped, they could pass that in. Imagine a variant of nondescendingSubsequence below in which the accumulator started with 0 instead of the first element, for example. Of course the technical reason the user needs to pass an empty stop list is that we killed the type class that goes with nubWith, so there's no way to know what the empty accumulator of their stop list type actually is.
Also stop-lists have type "b", so stop-unconstrainted-type-variable seems a more appropriate name.
The name "stop list" is a term of art in the algorithm and AI fields, which is why I chose it. I could stick some bib reference in the docs if you thought that would help. In general, the stop list is an accumulator of values that you want to stop after the first one.
The type signature is fairly complex.
I agree. I'd written this off initially for this reason, but other folks on the list seem to be fine with it.
I'm not even convinced that nubWith really is a nub function, and not just some generalised filter - filterState perhaps. In which case you'd want to generalise Maybe b to (Bool,b).
Or to Either b b? Which is preferred in this case?
Can you ever imagine anyone other than nubOrd using nubWith?
Yes. As discussed previously you can get a significant constant factor speedup with nubInt on IntSets if you need it. Also, as discussed previously, you can get a speedup for nubChar by using a mutable array as the stop list. One can also imagine using this in less traditional ways; for example nondescendingSubsequence [] = [] nondescendingSubsequence (e : es) = e : nubWith (\a b-> if (a >= b) then Just a else Nothing) e es
nondescendingSubsequence [1,2,3,2,3,3,4,1,1] [1,2,3,3,3,4]
Is it a genuine utility function people have been crying out for?
IMHO it is a "genuine utility function", whatever that means. It certainly isn't something "people have been crying out for", so if that's the criterion we should omit it. It just seems churlish to hide it, given that it's sitting in there being a perfectly useful building block for things people do cry out for. I'd guess it would be used about as often as "break", which is in there for much the same reason.
It seems perfectly good to include as a local function in Data.Set to be used to implement nubOrd, but I don't see its value as a standalone export from a very commonly used library full of very useful stuff.
We could move it to live in Data.Set with nubOrd if folks think that it doesn't belong in Data.List. Or we could move them both into their own module, although that seems silly to me given that I can't really think what else goes in there.
So, in summary I think nubOrd is great, and nubWith isn't nub, and isn't great.
Thanks hugely for the detailed commentary! Please understand that I'm in no way wedded to the idea of getting any of this in; I want to do what works best for everyone, and am grateful for your and everyone's help in figuring out what that is. Bart Massey bart@cs.pdx.edu