
On Tuesday 27 September 2011, 11:32:35, Yitzchak Gale wrote:
I wrote:
Can you guarantee for some value of m that for each note N, only the first m notes following N might end earlier than N? If so, then the following simple algorithm is linear and runs in constant space. You could then use:

sortAlmostBy m (comparing endTime)

... You might be able to do a little better than this.
After running some tests, it seems empirically that you can use m `div` 2 instead of m for lists of even length, which gives you a nice speedup for larger values of m. For lists of prime length, the best you can do seems to be m-1. It's more complicated for lists of non-prime odd length.
I can't immediately see why any of that is true.
Daniel to the rescue perhaps?
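
(The definition of sortAlmostBy isn't included in the quoted text. Purely for reference, here is a minimal sketch of one chunked sort-and-merge that fits the description above; this is an assumption on my part, not necessarily Yitzchak's actual code. chunksOf is from Data.List.Split.)

import Data.List (sortBy)
import Data.List.Split (chunksOf)

-- Sketch: split into chunks of m, sort each chunk, then merge adjacent
-- chunks pairwise. Under the stated precondition, the first m elements
-- of each merge cannot be beaten by anything later, so they can be
-- emitted; the remaining (at most m) elements are carried forward.
sortAlmostBy :: Int -> (a -> a -> Ordering) -> [a] -> [a]
sortAlmostBy m cmp = go . map (sortBy cmp) . chunksOf m
  where
    go (xs:ys:rest) = out ++ go (carry : rest)
      where (out, carry) = splitAt m (mergeBy xs ys)
    go xss          = concat xss

    -- standard stable merge of two sorted lists
    mergeBy xs [] = xs
    mergeBy [] ys = ys
    mergeBy xs@(x:xt) ys@(y:yt)
      | cmp y x == LT = y : mergeBy xs yt
      | otherwise     = x : mergeBy xt ys

For example, sortAlmostBy 2 compare [2,1,4,3,6,5] gives [1,2,3,4,5,6]. Sorting each m-chunk costs O(m log m) and each pairwise merge is O(m), which is where the O(n log m) estimate at the end of this message comes from.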
I have to confess, I'm not even sure what you mean. Okay, m is the maximal number of following notes that may end earlier than the current one. But to which list(s) does the length refer?
test n k = fmap (all isSorted . map (sortAlmostBy n compare)) $ rands k
With what parameters did you call test?

Prelude Almost System.Random> test 12 16
True
(0.12 secs, 178509776 bytes)
Prelude Almost System.Random> test 12 17
False
(0.00 secs, 592904 bytes)

The only big variation in runtimes I find is due to a violation of the preconditions of sortAlmostBy, when all shortcuts. (But most of the runtime is used for generating the lists. If I completely evaluate such a list of lists prior to sorting, then m has a measurable impact on time even when all doesn't shortcut; however, in that situation it's basically monotonic in m, as expected.)
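
(One possible way to "completely evaluate such a list of lists prior to sorting", so that list generation doesn't dominate the measured time. testForced is a hypothetical name, and the definitions of rands, isSorted and sortAlmostBy from this thread are assumed to be in scope.)

import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- fully build the random lists before sorting, so the timing reflects
-- the sort rather than the generation
testForced :: Int -> Int -> IO Bool
testForced n k = do
    xss <- rands k
    _   <- evaluate (force xss)
    return (all isSorted (map (sortAlmostBy n compare) xss))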
isSorted xs = and . zipWith (<=) xs $ drop 1 xs
rands d = fmap (take 100 . map (concat . zipWith (map . (+)) [0,d ..] . chunksOf d) . chunksOf 1000 . randomRs (0, d-1)) newStdGen
You generate 100 lists of length 1000, such that the entries at index k*d <= i < (k+1)*d are between k*d (inclusive) and (k+1)*d (exclusive).

Now, if you split those lists into chunks of length m, the preconditions of sortAlmostBy are only guaranteed to hold if each of the d-long chunks from the generation meets at most two m-long chunks. That is the case if (and, provided the entire list is long enough, only if)

m + gcd m d >= d

If m + gcd m d < d, you will sooner or later encounter a d-chunk meeting three (or more) m-chunks. The probability that the first element of that d-chunk is larger than the last is nearly 1/2 [(d-1)/2d], so it doesn't take many such situations until you get a non-sorted list with sortAlmostBy m.

So, if running time is monotonic in m, then under the assumption that the preconditions of sortAlmostBy hold, the optimal choice of m is

min { m \in N : m + gcd m d >= d }

For even d, that's d/2; for d prime, it's d-1; in general, it's (1 - 1/p)*d, where p is the smallest prime factor of d.

I haven't analysed your merging function, but at first glance, merging c chunks of length m each is O(c*m); so, since sorting each chunk is O(m*log m), overall it'd be

O(c*m*log m) = O(n*log m) = O(n*log (n/c))

It has larger overhead than sortBy, so if n is small or m is large, sortBy is likely to be better, but if n is large and m relatively small, sortAlmostBy can be significantly (but not dramatically, excluding extreme cases where you can do much better) faster.
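
(A tiny helper to make that bound concrete; minSafeM is a made-up name, not something from the thread.)

-- smallest m for which every d-long block meets at most two m-chunks,
-- i.e. the minimal m that guarantees the precondition for these inputs
minSafeM :: Int -> Int
minSafeM d = head [m | m <- [1 ..], m + gcd m d >= d]

For example, minSafeM 12 == 6, minSafeM 13 == 12 and minSafeM 15 == 10, matching d/2 for even d, d-1 for prime d, and (1 - 1/p)*d in general.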