
On Tuesday 27 September 2011, 18:34:01, Yitzchak Gale wrote:
Hi Daniel,
Thanks for clearing up the mystery about the dependence on prime factorization! Very interesting.
at first glance, merging c chunks of length m each is O(c*m); so, since sorting each chunk is O(m*log m), it'd overall be
O(c*m*log m) = O(n*log m) = O(n*log (n/c))
No. In the given use case m is a small constant and c is proportional to n, so the overall complexity is O(n).
Not a contradiction. If m is a constant (> 1), then O(n*log m) = O(n), and sorting a chunk of length m is O(m*log m) = O(1).
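To make the chunk-and-merge idea concrete, here is a minimal sketch (the names chunksOf and sortAlmostBy' are mine, not the actual code from the thread): split the input into chunks of length m, sort each chunk, and lazily merge the sorted chunks. Sorting a chunk costs O(m*log m), which is O(1) for constant m; under the almost-sorted assumption (every element at most about m places from its final position), each chunk interleaves only with its neighbour, so the merging pass is O(n) overall.

```haskell
import Data.List (sortBy)

-- Split a list into chunks of length m (the last chunk may be shorter).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf m xs = let (h, t) = splitAt m xs in h : chunksOf m t

-- Hypothetical sketch of sortAlmostBy: sort each chunk, then merge.
-- The foldr-based merge is always correct (it is just repeated merging
-- of sorted lists); thanks to laziness it runs in O(n) when each chunk
-- interleaves only with its immediate neighbour, and degrades gracefully
-- on arbitrary input.
sortAlmostBy' :: Int -> (a -> a -> Ordering) -> [a] -> [a]
sortAlmostBy' m cmp = foldr merge2 [] . map (sortBy cmp) . chunksOf m
  where
    merge2 as@(a:as') bs@(b:bs')
      | cmp b a == LT = b : merge2 as bs'
      | otherwise     = a : merge2 as' bs
    merge2 [] bs = bs
    merge2 as [] = as
```

This is only a sketch under the stated assumptions, not the implementation being benchmarked below.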
It has larger overhead than sortBy,
That's true. But it's asymptotically better.
Given Dennis' use case, it sounds like m will probably always be < 10. Input size will typically be a few hundred, but could be a few thousand. sortBy would indeed be just fine in those cases.
so if n is small or m is large, sortBy is likely to be better; but if n is large and m is relatively small, sortAlmostBy can be significantly faster (though not dramatically so, excluding extreme cases where you can do much better).
At most m will be on the order of 100 if we need to process all the parts of a large symphony orchestra combined together (a very unlikely use case admittedly), and in that case the input size could be as much as 10^6, or possibly even more for something like the Mahler.
Yeah, but log (10^6) is still a fairly small number, so counting reduction steps, it wouldn't be dramatically better. However, I forgot to take memory usage into account, so in terms of wall-clock (or CPU) time, it could be dramatically better even for not too large n and not too small m.
(Come to think of it, it's pretty amazing that Mahler wrote so many notes and actually got people to play them.)
People do stranger things. People listen to Wagner, for example.
There I believe sortAlmostBy could be quite an improvement.
Certainly. (I've just included Data.List.sort in the benchmark; for these short lists, it takes less than twice as long as sortAlmostBy.)
I'd be interested to hear ideas to make it even better though.
mergeAABy :: (a -> a -> Ordering) -> [[a]] -> [a]
mergeAABy cmp ((x:xs):(y:ys):zss) = foo x xs y ys zss
  where
    foo u us v vs wss = case cmp u v of
        GT -> v : case vs of
                    (b:bs) -> foo u us b bs wss
                    []     -> u : us ++ mergeAABy cmp wss
        _  -> u : case us of
                    (c:cs) -> foo c cs v vs wss
                    []     -> case wss of
                                ((w:ws):kss) -> foo v vs w ws kss
                                _            -> v : vs
mergeAABy _ [xs] = xs
mergeAABy _ _    = []

seems to be a bit faster (I have made the lists longer, replacing 2000 with 5000):

dafis@schwartz:~/Haskell/BeginnersTesting> ./benchAlmost -nis 20 -kis 20
(20,20)      -- print parameters to make sure they're evaluated
284090563    -- print sum of all random Ints to make sure they're evaluated
warming up
estimating clock resolution...
mean is 2.183860 us (320001 iterations)
found 41404 outliers among 319999 samples (12.9%)
  2443 (0.8%) low severe
  38961 (12.2%) high severe
estimating cost of a clock call...
mean is 50.65112 ns (14 iterations)
found 2 outliers among 14 samples (14.3%)
  2 (14.3%) high mild

benchmarking Yitz
collecting 100 samples, 1 iterations each, in estimated 6.537795 s
mean: 67.05025 ms, lb 66.59363 ms, ub 67.62786 ms, ci 0.950
std dev: 2.616184 ms, lb 2.179318 ms, ub 3.100722 ms, ci 0.950

benchmarking Mine
collecting 100 samples, 1 iterations each, in estimated 5.566311 s
mean: 56.91069 ms, lb 56.54846 ms, ub 57.37602 ms, ci 0.950
std dev: 2.100948 ms, lb 1.722233 ms, ub 2.585056 ms, ci 0.950

With n = k = 50:

benchmarking Yitz
collecting 100 samples, 1 iterations each, in estimated 7.366776 s
mean: 75.63603 ms, lb 75.17202 ms, ub 76.20199 ms, ci 0.950
std dev: 2.617970 ms, lb 2.227238 ms, ub 3.040563 ms, ci 0.950

benchmarking Mine
collecting 100 samples, 1 iterations each, in estimated 6.517696 s
mean: 68.00708 ms, lb 67.26089 ms, ub 69.04105 ms, ci 0.950
std dev: 4.463628 ms, lb 3.484169 ms, ub 5.950722 ms, ci 0.950

n = k = 100:

benchmarking Yitz
collecting 100 samples, 1 iterations each, in estimated 9.756303 s
mean: 87.39906 ms, lb 86.90397 ms, ub 87.99232 ms, ci 0.950
std dev: 2.763955 ms, lb 2.382008 ms, ub 3.168909 ms, ci 0.950

benchmarking Mine
collecting 100 samples, 1 iterations each, in estimated 7.505393 s
mean: 77.05801 ms, lb 76.51954 ms, ub 77.74727 ms, ci 0.950
std dev: 3.117756 ms, lb 2.594126 ms, ub 3.769922 ms, ci 0.950
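For anyone who wants to try the merge in isolation: a small standalone module (mergeAABy is reproduced from the message above so this compiles on its own; the example inputs are mine). It merges a list of non-empty sorted chunks, assuming each chunk interleaves only with its immediate neighbour -- the "almost sorted" precondition.

```haskell
-- mergeAABy, reproduced from the message above.
mergeAABy :: (a -> a -> Ordering) -> [[a]] -> [a]
mergeAABy cmp ((x:xs):(y:ys):zss) = foo x xs y ys zss
  where
    foo u us v vs wss = case cmp u v of
        GT -> v : case vs of
                    (b:bs) -> foo u us b bs wss
                    []     -> u : us ++ mergeAABy cmp wss
        _  -> u : case us of
                    (c:cs) -> foo c cs v vs wss
                    []     -> case wss of
                                ((w:ws):kss) -> foo v vs w ws kss
                                _            -> v : vs
mergeAABy _ [xs] = xs
mergeAABy _ _    = []
```

For example, mergeAABy compare [[1,2],[3,4],[5,6],[7,8]] yields [1..8], and overlapping neighbours such as [[1,3],[2,4]] merge to [1,2,3,4].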