
I wrote:
After running some tests, it seems empirically that you can use m `div` 2 instead of m for lists of even length, which gives you a nice speedup for larger values of m. For lists of prime length, the best you can do seems to be m-1. It's more complicated for lists of non-prime odd length. I can't immediately see why any of that is true.
Well, obviously, that weird behavior has to do with the regularity of my test data. If I randomize more, the results become more believable. It seems that m-1 or m-2 work empirically, but that could just be because the probability of a collision is extremely low. The result for the more regularly shaped test data, where the results have an interesting dependence on prime factorization, is still fascinating. I hope Daniel will comment. :) Here is the more randomized test data (sorry, I really should be using QuickCheck for this instead of doing it by hand): test n k = fmap (all isSorted . map (sortAlmostBy n compare)) $ rands2 k isSorted xs = and . zipWith (<=) xs $ drop 1 xs rands1 d = do cs <- replicateM (2000 `div` d) $ randomRS (0, d-1) fmap concat $ zipWithM mkChunk (scanl (+) 0 cs) cs where mkChunk x y = replicateM y $ randomRS (x, x+y-1) rands2 d = fmap (evalState . replicateM 100 $ rands1 d) newStdGen