
On Tuesday 25 May 2010 17:30:25, Markus Läll wrote:
Here is an ugly one:
<moved below>
Besides the long names, could this be done somehow shorter?
In particular, it could be done faster. In fact, empirically, this seems to be rather O(n^f(n)), where f is an increasing function with values > 1: (as usual, all code compiled with -O2, performance characteristics can be very different if interpreted or compiled without optimisations) --------------------------------------------------------------------- Prelude WithoutBiggest> testDec remLargest2 100000 1 (0.09 secs, 15961808 bytes) Prelude WithoutBiggest> testDec remLargest2 200000 1 (0.23 secs, 29848872 bytes) Prelude WithoutBiggest> testDec remLargest2 400000 1 (0.68 secs, 59199984 bytes) Prelude WithoutBiggest> testDec remLargest2 800000 1 (2.39 secs, 118963628 bytes) Prelude WithoutBiggest> testDec remLargest2 1600000 1 (8.89 secs, 237443716 bytes) Prelude WithoutBiggest> testAsc remLargest2 100000 99999 (0.09 secs, 15426136 bytes) Prelude WithoutBiggest> testAsc remLargest2 200000 199999 (0.24 secs, 29851428 bytes) Prelude WithoutBiggest> testAsc remLargest2 400000 399999 (0.69 secs, 59200792 bytes) Prelude WithoutBiggest> testAsc remLargest2 800000 799999 (2.38 secs, 118437128 bytes) Prelude WithoutBiggest> testAsc remLargest2 1600000 1599999 (8.95 secs, 236917064 bytes) Prelude WithoutBiggest> testBounce remLargest2 100000 -25501 (0.10 secs, 18231012 bytes) Prelude WithoutBiggest> testBounce remLargest2 200000 29107 (0.28 secs, 32470584 bytes) Prelude WithoutBiggest> testBounce remLargest2 400000 -21646 (0.82 secs, 64440016 bytes) Prelude WithoutBiggest> testBounce remLargest2 800000 -29987 (2.83 secs, 128922424 bytes) Prelude WithoutBiggest> testBounce remLargest2 1600000 -6415 (10.62 secs, 256836988 bytes) --------------------------------------------------------------------- Whereas remLargest shows more or less linear behaviour -------------------------------------------------------------------- Prelude WithoutBiggest> testDec remLargest 100000 1 (0.03 secs, 6817632 bytes) Prelude WithoutBiggest> testDec remLargest 200000 1 (0.06 secs, 13109604 bytes) Prelude WithoutBiggest> testDec remLargest 1600000 1 (0.60 secs, 102758712 bytes) Prelude WithoutBiggest> testAsc remLargest 100000 99999 (0.02 secs, 9715556 bytes) Prelude WithoutBiggest> testAsc remLargest 200000 199999 (0.03 secs, 19286292 bytes) Prelude WithoutBiggest> testAsc remLargest 1600000 1599999 (0.23 secs, 154411292 bytes) Prelude WithoutBiggest> testBounce remLargest 100000 -25501 (0.03 secs, 10369436 bytes) Prelude WithoutBiggest> testBounce remLargest 200000 29107 (0.08 secs, 17137072 bytes) Prelude WithoutBiggest> testBounce remLargest 1600000 -6415 (0.65 secs, 123919124 bytes) --------------------------------------------------------------------- and is only a little slower (except for ascending lists, where it's more than a little) than a faster implementation of your algorithm: --------------------------------------------------------------------- Prelude WithoutBiggest> testDec remLargest3 1600000 1 (0.52 secs, 174594652 bytes) Prelude WithoutBiggest> testAsc remLargest3 1600000 1599999 (0.08 secs, 115863632 bytes) Prelude WithoutBiggest> testBounce remLargest3 1600000 -6415 (0.60 secs, 193472764 bytes) --------------------------------------------------------------------- Except for ascending lists, the order-changing algorithm is *much* faster, however: Prelude WithoutBiggest> testDec withoutLargest 1600000 1 (0.16 secs, 116646100 bytes) Prelude WithoutBiggest> testAsc withoutLargest 1600000 1599999 (0.12 secs, 116352076 bytes) Prelude WithoutBiggest> testBounce withoutLargest 1600000 -6415 (0.21 secs, 134971508 bytes)
The idea of it is to carry the maximum in 'previous' and compare it with every element when recursing the list. When recursion reaches the end, it starts to return, and on every step it tells the previous step if there was something bigger down it's road of recursion or not. This way every step knows if to drop its 'current' element -- this drop happens only once.
So the steps it takes are defenitely 2n, because it rolls out, and then has to return all the way -- even to get the first element (because for it theres the question: "drop it or not?").
But these steps are comparatively expensive. And you must be careful not to be too strict, so you can start bulding the result before you've reached the end.
The order-not-retaining functions thus far are faster, taking only n steps.
Well, 'far' is relative.
The performance of Daniels remLargest depends on the order of elements in the list: best case is if the list is grows like (=<), so there's no use of an accumulator and concatenation: worst case is when the list is composed of strictly descending lists -- then the time it takes is 3n (n for traversing the list, n for reversing all sublists and n for concatenating).
Actually, there's no big difference between strictly descending lists and sufficiently bouncy lists. And it can be made a tiny bit faster by using go post mx (y:ys) | mx < y = mx : foldl (flip (:)) (go [] y ys) post instead of "mx : reverse post ++ go [] y ys" Let's look a bit at your code now.
remLargest2 [] = [] remLargest2 (li:st) = if something_bigger_in_tail then (li:result) else result where ismax [] previous = ([], False) ismax (current:rest) previous = case (current_is_bigger_than_previous, but_something_even_bigger_in_tail) of (True, True) -> (current:newRest, True) (False, True) -> (current:newRest, True) (False, False) -> (current:newRest, False) (True, False) -> ( newRest, True) -- current -- is the biggest, lets leave it out
That 'case' is bad. Pattern matching on both Bools at once means you can't do anything until the second is available. Since that comes from another call to ismax, which again pattern matches on two Bools, you must wait until the recursion has reached the end of the list. In particular, you get nothing (but a heap-exhaustion) for infinite lists (the lazier variant produces the initial segment before the maximum fine and only hangs when that is reached). When the end is reached, the recursion is unwound, thunk after thunk is converted to a list-node (except for the maximum). I think the space behaviour is responsible for the worse than linear running time.
where f = ismax rest current_is_bigger_than_previous = current > previous (newRest, but_something_even_bigger_in_tail) = if current_is_bigger_than_previous then f current else f previous (result, something_bigger_in_tail) = ismax st li
Another problem with the code is the wrong argument order of ismax, it's better to have the (usually small) argument which will often remain the same in the recursive call first, the (usually larger) list which changes in every call second. So let's make the algorithm lazier: remLargest3 :: Ord a => [a] -> [a] remLargest3 [] = [] remLargest3 (li:st) | somethingBigger = li : result | otherwise = result where (somethingBigger,result) = ismax li st ismax pre (cur:rest) -- We can immediately know whether the current element -- is larger than the previous maximum. If that is the case, -- we can immediately signal to keep the previous elements. | pre < cur = let (evenBigger,newRest) = ismax cur rest in (True, if evenBigger then cur:newRest else newRest) -- Otherwise, we know that the current element is kept. | otherwise = let (evenBigger,newRest) = ismax pre rest in (evenBigger,cur:newRest) ismax _ [] = (False,[]) We now use every bit of information as soon as it is available. For ascending lists, that means that we only traverse the list once, building the result list from front to back - very fast. For decreasing lists, we also build the result list while traversing the list from front to back, although we don't know yet that we don't prepend another item to it. On the way back to the front, we only need to pass False and have no other work until we finally see that we must drop the original head. For bouncy lists, we can deliver chunks whenever we find a new maximum until we find the global maximum, when the traversal pattern becomes that of a decreasing list. A nice variant passing functions instead of Bools, it's equally fast for decreasing and bouncy lists, but for reasons I don't see yet, it's slower for ascending lists: remLargest4 :: Ord a => [a] -> [a] remLargest4 [] = [] remLargest4 (li:st) = li `f` result where ign _ xs = xs (f,result) = ismax li st ismax pre (cur:rest) | pre < cur = ((:),let (g,more) = ismax cur rest in g cur more) | otherwise = let (g,more) = ismax pre rest in (g,cur:more) ismax _ [] = (ign,[])
Cool problem ;-)
Indeed 8-)
And we should do tests!