Re: [Haskell-cafe] Ultra-newbie Question

18 Sep 2010

      On Saturday 18 September 2010 19:44:38, Jake McArthur wrote:
...
On 09/18/2010 02:51 AM, Christopher Tauss wrote:
...
I am trying to write a function that takes a list and returns the last
n elements.
This may just be for the sake of learning, in which case this is fine,
but usually, needing to do this would be a sign that you are using lists
improperly (since this is a O(n) time operation).
By which he meant O(length list), not the number of elements you're asking 
for.
...
...
Let's call the function n_lastn and, given a list  [1,2,3,4,5], I
would like
n_lastn 3 = [3,4,5]
n_lastn n = reverse . take n . reverse
Which is the most elegant definition, but it's an O(length list) space 
operation (as are all others proposed so far). That will be a problem for 
long lists (consider n_lastn 10 [1 .. 10^8]). You can implement it as an 
O(n) [the number of elements you want] *space* operation, provided that the 
list is generated lazily and not referred to by anything else, but the code 
is decidedly less nice:

The first idea would be to keep a window of n elements and move it to the 
end of the list:

n_lastn n xs =
    case splitAt n xs of
      (ys,[]) -> ys   -- the list contains at most n elements, yay
      (ys,zs) -> loop ys zs
    where
      loop window [] = window
      loop window (v:vs) = loop (tail window ++ [v]) vs

The space behaviour is now fine (if compiled with optimisations), but 
unfortunately, the time complexity has become O(n*length list) because the 
(++)-calls are left-nested:

Suppose n = 4,

loop (1:2:3:4:[]) [5 .. end]
~> loop ((2:3:4:[]) ++ [5]) [6 .. end]
~> loop (2:((3:4:[]) ++ [5])) [6 .. end]
~> loop (((3:4:[]) ++ [5]) ++ [6]) [7 .. end]

The strictness analyser has been smart enough to transform the call to tail 
into a strict pattern match on window, as if we'd written
   loop (_:twindow) (v:vs) = loop (twindow ++ [v]) vs
so the first few tails go fast, but later, we have to go through more 
layers to get at the head of the window to discard it

~> loop ((3:((4:[]) ++ [5])) ++ [6]) [7 .. end]
~> loop (3:(((4:[]) ++ [5]) ++ [6])) [7 .. end]
-- finally!
~> loop ((((4:[]) ++ [5]) ++ [6]) ++ [7]) [8 .. end]
-- bubble the 4 through four layers of parentheses:
~> loop(((4:([] ++ [5])) ++ [6]) ++ [7]) [8 .. end]
~> loop ((4:(([] ++ [5]) ++ [6])) ++ [7]) [8 .. end]
~> loop (4:((([] ++ [5]) ++ [6]) ++ [7])) [8 .. end]
~> loop (((([] ++ [5]) ++ [6]) ++ [7]) ++ [8]) [9 .. end]
-- phew
-- form now on, it's uniform, on each step we have to match an expression

(((([] ++ [a]) ++ [b]) ++ [c]) ++ [d])

against (_:rest)
1. check whether ((([] ++ [a]) ++ [b]) ++ [c]) is empty, for that,
2. check whether (([] ++ [a]) ++ [b]) is empty, for that,
3. check whether ([] ++ [a]) is empty, for that,
4. check whether [] is empty, it is, hence [] ++ [a] = [a],
5. check whether [a] is empty, it's not, it's (a:[]), hence
6. (...) ++ [b] = a:([] ++ [b]), so 2's not empty, and
7. (...) ++ [c] = a:(([] ++ [b]) ++ [c]), so 1's not empty and
8. (...) ++ [d] = a:((([] ++ [b]) ++ [c]) ++ [d])
9. at last, a can be dropped and we get to 
   loop (((([] ++ [b]) ++ [c]) ++ [d]) ++ [e]) remainingList

Blech!

Appending to the end of a list is bad if it leads to left-nested 
parentheses (it's okay if the parentheses are right-nested).
So we might be tempted to keep the window reversed and cons each element to 
the front, dropping the last. No good, removing an element from the end is 
O(length window) too.

One possibility to fix it is to use a 'list-like' type with O(1) appending 
at the end and dropping from the front.
Data.Sequence is such a type,

import qualified Data.Sequence as Seq
import Data.Foldable (toList)
import Data.Sequence (ViewL(..), (|>))

n_lastn' :: Int -> [a] -> [a]
n_lastn' k _
    | k <= 0    = []
n_lastn' k xs =
    case splitAt k xs of
      (ys,[]) -> ys
      (ys,zs) -> go (Seq.fromList ys) zs
    where
      go acc [] = toList acc
      go acc (v:vs) = case Seq.viewl acc of
                        _ :< keep -> go (keep |> v) vs
                        _ -> error "teh impossible jus hapnd"

fixes space and time behaviour. But the constant factors for Sequence are 
larger than those for lists, so we can beat it with lists:

n_lastn :: Int -> [a] -> [a]
n_lastn k _
    | k <= 0  = []
n_lastn k xs =
    case splitAt k xs of
      (ys,[]) -> ys
      (ys,zs) -> go k (reverse ys) zs
    where
      m = k-1
      go _ acc [] = reverse $ take k acc
      go 0 acc (v:vs) = case splitAt m acc of
                          (keep,release) -> release `seq` go k (v:keep) vs
      go h acc (v:vs) = go (h-1) (v:acc) vs

Every k steps, we perform the O(k) operation of removing the last (k+1) 
elements from a (2k)-element list, making the removal from the end an 
amortized O(1) operation.
You can trade some space for speed and clip the window in larger intervals, 
say every (3*k) or every (10*k) steps.

Re: [Haskell-cafe] Ultra-newbie Question

Daniel Fischer