
Bart Massey
Any suggestions from anyone about how to proceed? Or are we just done at this point? So close...
OK, I've got a new version of nubOrd together. This one is just nubOrd, no nubWith, no nubInt, nothing fancy. There's two variants, nubOrd' and nubOrd''. The latter gets a slight performance win by omitting some work that would preserve the order of output with respect to nub. The basic strategy of nubOrd' is to pull off a prefix l of the input list as long as the current stoplist s, nubSort l, merge l with s to get a new stoplist s', use s in a merge step to filter l, then sort the remainder of l back to the original order. Finally, the resulting l' and s' are prepended to a recursive call on the rest of the list. nubOrd' has roughly the same time complexity as the set-based nubOrd did. This is because each step of nubOrd requires log m time to search the stoplist, and each collection of steps in nubOrd' requires amortized log m time for the sorts. Let's do some side by side comparison: nub nubSort nubOrd nubOrd' nubOrd'' laziness full no full spine spine worst-case O(mn) O(n log n) O(n log m) O(n log m) O(n log m) complexity (m symbols, length n) relative 1x 50x 1.1x 5x 5x runtime with m = 1 absolute 84s 0.14s 0.16s 0.35s 0.21s runtime with m = n = 10^5 dependencies no no Data.Set no no outside (containers) Data.List outputs in yes no yes yes no order of first mention code small tiny smallish wtf wtf-lite complexity This table suggests to me that nubOrd' is viable, assuming spine-laziness is sufficient. It solves the time problems of nub and nubSort, solves the dependency problem of nubOrd, and preserves the order of its outputs. I don't like the 5x slowdown for small m, or the non-laziness on elements, but I guess I am willing to take them to get the other stuff. Maybe someone or the compiler can do some optimization later. The code for nubOrd', in all its hideous glory, is at the end of this post. Comments, corrections and improvements welcome. Bart Massey bart <at> cs.pdx.edu merge :: Ord e => [e] -> [e] -> [e] merge l1 [] = l1 merge [] l2 = l2 merge l1@(e1 : e1s) l2@(e2 : e2s) | e1 < e2 = e1 : merge e1s l2 | e1 > e2 = e2 : merge l1 e2s | otherwise = merge l1 e2s nubOrd' :: Ord e => [e] -> [e] nubOrd' [] = [] nubOrd' (e : es) = e : go [e] es where go _ [] = [] go s l = l1' ++ go s' l2 where (l1, l2) = splitAt (length s) l curl = filterDups $ sort $ zip l1 ([1..] :: [Integer]) s' = merge s (map fst curl) l1' = map fst $ sortBy flipcmp $ stopMerge s curl (a, b) `flipcmp` (c, d) = (b, a) `compare` (d, c) filterDups [] = [] filterDups (s1@(e1, _) : (e2, _) : ss) | e1 == e2 = filterDups (s1 : ss) filterDups (s1 : ss) = s1 : filterDups ss stopMerge _ [] = [] stopMerge [] m2 = m2 stopMerge m1@(e1 : e1s) m2@(s2@(e2, _) : s2s) | e1 < e2 = stopMerge e1s m2 | e1 > e2 = s2 : stopMerge m1 s2s | otherwise = stopMerge m1 s2s