
On 16/10/13 17:46, Yitzchak Gale wrote:
On Sun, Oct 13, 2013 at 6:25 PM, Niklas Hambüchen
wrote: ordNub has the same behaviour as nub, while (Set.toList . Set.fromList) doesn't.
Perhaps you mean that they produce exactly the same output when fully evaluated. But I doubt that their behavior is *exactly* the same in every aspect. Given the differences in strictness between sets and lists, can you prove that nub and nubOrd have *exactly* the same strictness properties in every possible case?
Yes. The elements are evaluated in the same order as in `nub`. We can see that by comparing the implementations. If you have values that need not be fully evaluated to use (==) on them, and need not be fully evaluated *in a different way* when using `compare` on them, then of course the strictness properties are different. That is why one function has "Eq a =>" and the other one "Ord a =>" at the front.
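Side by side, the comparison looks roughly like this; the ordNub is a sketch along the lines of the version posted earlier in this thread, and nub' restates the standard Data.List definition (renamed only to avoid the Prelude clash). In both, an element is forced no earlier than by the membership test, and the input is walked left to right:

import qualified Data.Set as Set

-- sketch of the proposed ordNub: the "seen" accumulator is a Set,
-- membership is tested with compare (via Set.member)
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ []     = []
    go s (x:xs)
      | x `Set.member` s = go s xs
      | otherwise        = x : go (Set.insert x s) xs

-- essentially Data.List.nub: the accumulator is a list,
-- membership is tested with (==) (via elem)
nub' :: Eq a => [a] -> [a]
nub' = go []
  where
    go _ []        = []
    go seen (x:xs)
      | x `elem` seen = go seen xs
      | otherwise     = x : go (x : seen) xs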
...ordNub is *always* faster than nub, even for singleton lists.
This is also a claim that I doubt. You say that you tested the case of singletons. How about this:
2 : replicate 100000 1 ++ [3]
Well, you could optimize for that by having nubOrd squeeze runs of equal elements in the input. But then how about something like this:
[3,2,1] ++ take 100000 (cycle [3,2,1]) ++ [4]
Or variations on that, cycling some other fairly small number of elements. Set containers must do extra comparisons, whereas the cost of running down the first few hundred elements of a linked list (as long as the compiler fuses the iteration down to a tight loop in the output code and you remain within the cache) is near zero. How could nubOrd be as fast as nub in these kinds of cases?
I make the following, really cool proposal:

* You add the test cases you just mentioned to the benchmark list in "ordnub.hs"
* run `cabal build`
* run `dist/build/ordnub/ordnub -o report.html`

Then you will see that ordNub is, indeed, faster.
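For reference, the `-o report.html` flag suggests the benchmark is a criterion program, so the added cases could look roughly like this (the input names are mine, and the exact contents of ordnub.hs may well differ):

import Criterion.Main
import Data.List (nub)
import qualified Data.Set as Set

-- ordNub as sketched above
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ []     = []
    go s (x:xs)
      | x `Set.member` s = go s xs
      | otherwise        = x : go (Set.insert x s) xs

-- the two adversarial inputs suggested above (names are illustrative)
manyDups, smallCycle :: [Int]
manyDups   = 2 : replicate 100000 1 ++ [3]
smallCycle = [3,2,1] ++ take 100000 (cycle [3,2,1]) ++ [4]

main :: IO ()
main = defaultMain
  [ bgroup "manyDups"
      [ bench "nub"    (nf nub    manyDups)
      , bench "ordNub" (nf ordNub manyDups)
      ]
  , bgroup "smallCycle"
      [ bench "nub"    (nf nub    smallCycle)
      , bench "ordNub" (nf ordNub smallCycle)
      ]
  ]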
(as long as the compiler fuses the iteration down to a tight loop in the output code and you remain within the cache)
Even though a list is conceptually simpler than a set, a list node and a set node are not that different (as I explained in the last email). Both hold an element and pointers to further nodes. The only difference is that the Set node is optimised with strictness and unboxing, and the list node is not.
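Roughly (the Tree type below just mirrors the Bin constructor that containers uses for Data.Set internally; the field layout may vary between versions, and the names are illustrative):

-- an illustrative cons cell, mirroring the built-in list type:
-- every field is lazy (a pointer to a possibly unevaluated value)
data List a = Nil
            | Cons a (List a)

-- an illustrative node mirroring Data.Set's internal Bin constructor:
-- the size is a strict, unboxed Int; the element and subtrees are strict
data Tree a = Tip
            | Bin {-# UNPACK #-} !Int !a !(Tree a) !(Tree a)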
The only advantage of nubOrd is for very large lists where the complexity advantage overtakes the excellent constants of nub to provide better speed.
Sorry, I think this is absolutely invented. What "excellent constants" do you mean? How large are "very large lists"? I find lists of length 1000 a very common case. An `n log2 n` algorithm is already 100 times faster on those than an n² one. I do not dispute that there exist n² algorithms that are faster than others with better complexity for small inputs. nub is not one of them.
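(The factor of 100 is just the ratio of the two cost models at n = 1000, constants ignored:)

-- ratio of a quadratic cost to a linearithmic one, constants ignored
costRatio :: Double -> Double
costRatio n = (n * n) / (n * logBase 2 n)

-- costRatio 1000 is roughly 100, since logBase 2 1000 is roughly 10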