Computing lazy and strict list operations at the same time

newer
Re: [Haskell-cafe] Optimizing a...

C Rodrigues

19 Jun 2006 19 Jun '06

3:24 p.m.

Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function? There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list. init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = [] last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once? Calling reverse traverses the list again. I couldn't think of a way to do it, but I couldn't figure out why it would be impossible.

Show replies by date

Robert Dockins

19 Jun 19 Jun

3:59 p.m.

On Jun 19, 2006, at 11:24 AM, C Rodrigues wrote:

...

Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function?

There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list.

init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = []

last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x

Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once? Calling reverse traverses the list again. I couldn't think of a way to do it, but I couldn't figure out why it would be impossible.

initlast :: [a] -> ([a],a) initlast (x:xs) = f x xs id where f x (y:ys) g = f y ys (g . (x:)) f x [] g = (g [],x) Its within the letter, if maybe not the spirit of the rules. The accumulated function could arguably be considered to be traversing the list again. FYI, the technique is a fairly well known one for overcoming the quadratic behavior of repeated (++). Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Jerzy Karczmarczuk

4:21 p.m.

...

On Jun 19, 2006, at 11:24 AM, C Rodrigues wrote:

...
Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function?

There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list.

init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = []

last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x

Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once?

After Robert Dockins, Jon Fairbarn and Duncan Coutts, one more, nothing original...: initlast (x:xs) = inl x xs where inl x (a:as) = (x:q,y) where (q,y) = inl a as inl x [] = ([],x) Such tricks become your second nature, when you take the solution (lazy) of the "repmin" problem by Richard Bird, you put it under your pillow, and sleep for one week with your head close to it. Jerzy Karczmarczuk

Joel Reymont

4:25 p.m.

Where's the solution and what is the repmin problem? On Jun 19, 2006, at 5:21 PM, Jerzy Karczmarczuk wrote:

...

Such tricks become your second nature, when you take the solution (lazy) of the "repmin" problem by Richard Bird, you put it under your pillow, and sleep for one week with your head close to it.

-- http://wagerlabs.com/

jerzy.karczmarczuk＠info.unicaen.fr

5:54 p.m.

New subject: Computing lazy and strict list operations at the same time

Joel Reymont writes:

...

Where's the solution and what is the repmin problem?

On Jun 19, 2006, at 5:21 PM, Jerzy Karczmarczuk wrote:

...
Such tricks become your second nature, when you take the solution (lazy) of the "repmin" problem by Richard Bird, you put it under your pillow, and sleep for one week with your head close to it.

Well, the Functionalist True Lazy Church considers this to be a part of the Holy Scriptures... R.S. Bird. Using circular programs to eliminate multiple traversals of data. Acta Informatica, 21, pp. 239--250, 1984. Traverse a binary tree ONCE, and replace all the elements by the minimum of all leaves (i.e., construct a new tree, topologically equivalent, but with all leaf nodes being the minimum value within the original source. A one pass algorithm postpones the binding of an argument until the minimum is found... data Tree a = L a | B (Tree a) (Tree a) rpMin :: (Tree Int, Int) -> (Tree Int, Int) rpMin (L a, m) = (L m, a) rpMin (B l r, m) = let (l', ml) = rpMin (l, m) (r', mr) = rpMin (r, m) in (B l' r', ml `min` mr) replaceMin :: Tree Int -> Tree Int replaceMin t = let (t', m) = rpMin (t, m) in t' Google, your not-so-humble friend will find you some dozen references... For example, Levent Erkök: http://www.cse.ogi.edu/PacSoft/projects/rmb/repMin.html Jerzy Karczmarczuk

Jon Fairbairn

4:03 p.m.

On 2006-06-19 at 15:24-0000 "C Rodrigues" wrote:

...

Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function?

There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list.

init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = []

last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x

Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once? Calling reverse traverses the list again. I couldn't think of a way to do it, but I couldn't figure out why it would be impossible.

il [] = error "foo" il [x] = ([], x) il (x:xs) = cof x (il xs) where cof x ~(a,b) = (x:a, b) -- ! Should do it, I think. -- Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk

Duncan Coutts

4:50 p.m.

New subject: Computing lazy and strict list operations at the same time

On Mon, 2006-06-19 at 17:03 +0100, Jon Fairbairn wrote:

...

On 2006-06-19 at 15:24-0000 "C Rodrigues" wrote:

...
Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function?

There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list.

init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = []

last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x

Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once? Calling reverse traverses the list again. I couldn't think of a way to do it, but I couldn't figure out why it would be impossible.

il [] = error "foo" il [x] = ([], x) il (x:xs) = cof x (il xs) where cof x ~(a,b) = (x:a, b) -- !

...

From a quick test, it looks like none of our suggested solutions actually work in constant space.

main = interact $ \s -> case il s of (xs, x) -> let l = length xs in l `seq` show (l,x) using ghc: ghc -O foo.hs -o foo ./foo +RTS -M10m -RTS < 50mb.data using runhugs: runhugs foo.hs < 50mb.data in both cases and for each of the three solutions we've suggested the prog runs out of heap space where the spec asked for constant heap use. So what's wrong? In my test I was trying to follow my advice that we should consume the init before consuming the last element. Was that wrong? Is there another way of consuming the result of initlast that will work in constant space? Note that by changing discarding the x we do get constant space use: main = interact $ \s -> case il s of (xs, x) -> let l = length xs in l `seq` show l -- rather than 'show (l,x)' Why does holding onto 'x' retain 'xs' (or the initial input or some other structure with linear space use)? Duncan

Robert Dockins

5:28 p.m.

On Jun 19, 2006, at 12:50 PM, Duncan Coutts wrote:

...

On Mon, 2006-06-19 at 17:03 +0100, Jon Fairbairn wrote:

...
On 2006-06-19 at 15:24-0000 "C Rodrigues" wrote:

...
Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function?

There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list.

init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = []

last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x

Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once? Calling reverse traverses the list again. I couldn't think of a way to do it, but I couldn't figure out why it would be impossible.

il [] = error "foo" il [x] = ([], x) il (x:xs) = cof x (il xs) where cof x ~(a,b) = (x:a, b) -- !

...
From a quick test, it looks like none of our suggested solutions actually work in constant space.

...

main = interact $ \s -> case il s of (xs, x) -> let l = length xs in l `seq` show (l,x)

using ghc: ghc -O foo.hs -o foo ./foo +RTS -M10m -RTS < 50mb.data

using runhugs: runhugs foo.hs < 50mb.data

in both cases and for each of the three solutions we've suggested the prog runs out of heap space where the spec asked for constant heap use.

Actually, the OP asked for constant stack space, which is quite different and much easier to achieve.

...

So what's wrong? In my test I was trying to follow my advice that we should consume the init before consuming the last element. Was that wrong? Is there another way of consuming the result of initlast that will work in constant space?

That is, nonetheless, an interesting question.

...

Note that by changing discarding the x we do get constant space use: main = interact $ \s -> case il s of (xs, x) -> let l = length xs in l `seq` show l -- rather than 'show (l,x)'

Why does holding onto 'x' retain 'xs' (or the initial input or some other structure with linear space use)?

Duncan

Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Andrew Pimlott

14 Jul 14 Jul

8:10 p.m.

On Mon, Jun 19, 2006 at 05:50:13PM +0100, Duncan Coutts wrote:

...

On Mon, 2006-06-19 at 17:03 +0100, Jon Fairbairn wrote:

...
il [] = error "foo" il [x] = ([], x) il (x:xs) = cof x (il xs) where cof x ~(a,b) = (x:a, b) -- !

From a quick test, it looks like none of our suggested solutions actually work in constant space.

main = interact $ \s -> case il s of (xs, x) -> let l = length xs in l `seq` show (l,x)

I was hoping to have enlightenment served to me, but since nobody has explained this, I took a crack at it. I still can't explain it, but I got some data that maybe somebody else will understand. My code: initlast :: [a] -> ([a], a) initlast [x] = ([], x) initlast (x:xs) = let (init, last) = initlast xs in (x:init, {-# SCC "last" #-} last) lenshow n (_:xs) last = let n1 = n+1 in n1 `seq` lenshow n1 xs last lenshow n [] last = show (n,last) main = interact $ \s -> case initlast s of (xs, x) -> lenshow 0 xs x lenshow is just "show (length xs, x)", written out so I can tweak it later. This exhibits the runaway space usage with a large input that Duncan described. If you throw away "last" in lenshow and just "show n", it runs in constant space. It seems that the reference to "last" that I annotated as a cost center is holding a chain of trivial thunks--trivial because "last" is just being copied from the result of the recursive call to initlast. I thought maybe I could get rid of them by returning an unboxed tuple from initlast, but this turned out to make no difference. Profiling gave a couple hints. Retainer set profiling (-hr) showed the retainer set holding all the memory was {} I think this confirms that last holding a chain of thunks. I'm still surprised that ghc doesn't see that they're trivial. It feels like it should be an easy optimization. Constructor and type profiling (-hd and -hy) both show the memory held by stg_ap_1_upd_info. I don't know what that means. Most frustrating, I can't find any work around: No matter how I tried to write initlast, it had the same leak when consumed this way. (NB: functional implementations only need apply.) Granted, I can't think of any good reason to code in this style, but it's hard for me to accept that it should be impossible. Finally, here is a (silly) version that doesn't leak: initlast :: [a] -> ([a], [a]) initlast [x] = ([], [x]) initlast (x:xs) = let (init, last) = initlast xs in (x:init, undefined:last) lenshow n (_:xs) (_:ls) = let n1 = n+1 in n1 `seq` lenshow n1 xs ls lenshow n [] [last] = show (n,last) This is the first case I recall in which adding more constructors makes a space leak go away, because it gives me something to force (vis "_:ls). With the original implementation, there was no way to "partly force" last, one thunk at a time. Have others used this technique? Andrew

Andrew Pimlott

27 Sep 27 Sep

3:21 a.m.

This is a follow-up to a thread from June-July[1]. The question was how to write the function initlast :: [a] -> ([a], a) initlast xs = (init xs, last xs) so that it can be consumed in fixed space: main = print $ case initlast [0..1000000000] of (init, last) -> (length init, last) Attempts were along the lines of initlast :: [a] -> ([a], a) initlast [x] = ([], x) initlast (x:xs) = let (init, last) = initlast xs in (x:init, last) I seemed obvious to me at first (and for a long while) that ghc should force both computations in parallel; but finally at the hackathon (thanks to Simon Marlow) I realized I was expecting magic: The elements of the pair are simply independent thunks, and there's no way to "partly force" the second (ie, last) without forcing it all the way. Simon Peyton Jones graciously offered that it is "embarrassing" that we can't write this in Haskell, so to make him less embarrassed (and despite my adamance on the mailing list that the implementation be functional), I wrote an imperative version with the desired space behavior. Borrowing the insight that unsafePerform and unsafeInterleave can be thought of as hooks into the evaluator, this shows more or less what I would wish for ghc to do automatically. initlastST :: [a] -> ([a], a) initlastST xs = runST (m xs) where m xs = do r <- newSTRef undefined init <- init' r xs last <- unsafeInterleaveST (readSTRef r) return (init, last) init' r [x] = do writeSTRef r x return [] init' r (x:xs) = do writeSTRef r (last xs) liftM (x:) (unsafeInterleaveST (init' r xs)) Andrew [1] http://haskell.org/pipermail/haskell-cafe/2006-June/016171.html http://haskell.org/pipermail/haskell-cafe/2006-July/016709.html

Brandon Moore

7:38 a.m.

Andrew Pimlott wrote:

...

This is a follow-up to a thread from June-July[1]. The question was how to write the function

initlast :: [a] -> ([a], a) initlast xs = (init xs, last xs)

so that it can be consumed in fixed space:

main = print $ case initlast [0..1000000000] of (init, last) -> (length init, last)

Attempts were along the lines of

initlast :: [a] -> ([a], a) initlast [x] = ([], x) initlast (x:xs) = let (init, last) = initlast xs in (x:init, last)

I seemed obvious to me at first (and for a long while) that ghc should force both computations in parallel; but finally at the hackathon (thanks to Simon Marlow) I realized I was expecting magic: The elements of the pair are simply independent thunks, and there's no way to "partly force" the second (ie, last) without forcing it all the way. According to the stuff about "selector thunks", it seems this should work

initlast [x] = ([],[x]) initlast (x:xs) = let ~(init,last) = initlast xs in (x:init, last) It does, at least when I build with -ddump-simpl. Other builds, I get a program that overflows. Attached is a heap profile for a run with the main (10M rather than 100M as above - that just takes too long) main = print $ case initlast [0..100000000] of (init, last) -> (length init, last) Brandon

Claus Reinke

2:30 p.m.

New subject: Computing lazy and strict list operations at thesame time

...

This is a follow-up to a thread from June-July[1]. The question was how to write the function

initlast :: [a] -> ([a], a) initlast xs = (init xs, last xs)

so that it can be consumed in fixed space:

main = print $ case initlast [0..1000000000] of (init, last) -> (length init, last)

if space is the main issue, you could try to avoid the sharing (so that each part of the computation unfolds and throws away its own copy of the input list): initlast :: (()->[a]) -> ([a], a) initlast xs = (init (xs ()), last (xs ())) main = print $ case initlast (\()->[0..1000000000]) of (init, last) -> (length init, last) hth, claus ps. it would occasionally be nice to be able to undo sharing instead of having to avoid it, by asking for an independent, equally unevaluated, copy of some expression. that would make it easy to write initlast with your original type. pps. compiling with -O recreates the space problem? (ghc-6.5)

Brandon Moore

3:59 p.m.

Andrew Pimlott wrote:

...

This is a follow-up to a thread from June-July[1]. The question was how to write the function

initlast :: [a] -> ([a], a) initlast xs = (init xs, last xs)

so that it can be consumed in fixed space:

main = print $ case initlast [0..1000000000] of (init, last) -> (length init, last)

Attempts were along the lines of

initlast :: [a] -> ([a], a) initlast [x] = ([], x) initlast (x:xs) = let (init, last) = initlast xs in (x:init, last)

I seemed obvious to me at first (and for a long while) that ghc should force both computations in parallel; but finally at the hackathon (thanks to Simon Marlow) I realized I was expecting magic: The elements of the pair are simply independent thunks, and there's no way to "partly force" the second (ie, last) without forcing it all the way. According to the stuff about "selector thunks", it seems this should work

initlast [x] = ([],[x]) initlast (x:xs) = let ~(init,last) = initlast xs in (x:init, last) Sometimes it does compile to a program that runs in constant space, sometimes it doesn't! I've sent a message to the list with a heap profile of a run on 10M numbers, but it's being held for moderation because it's too big. Brandon

Duncan Coutts

19 Jun 19 Jun

4:06 p.m.

New subject: Computing lazy and strict list operations at the same time

On Mon, 2006-06-19 at 15:24 +0000, C Rodrigues wrote:

...

Here's a puzzle I haven't been able to solve. Is it possible to write the initlast function?

There are functions "init" and "last" that take constant stack space and traverse the list at most once. You can think of traversing the list as deconstructing all the (:) [] constructors in list.

init (x:xs) = init' x xs where init' x (y:ys) = x:init' y ys init' _ [] = []

last (x:xs) = last' x xs where last' _ (y:ys) = last' y ys last' x [] = x

Now, is there a way to write initlast :: [a] -> ([a], a) that returns the result of init and the result of last, takes constant stack space, and traverses the list only once? Calling reverse traverses the list again. I couldn't think of a way to do it, but I couldn't figure out why it would be impossible.

initlast :: [a] -> ([a],a) initlast [x] = ([], x) initlast (x:xs) = (x : xs', x') where (xs', x') = initlast xs It depends how you use it I think. If you look at the last element immediately then you'll get a linear collection of thunks for the init. However if you consume the init and then look at the last then I think that will use constant space. Duncan

C Rodrigues

20 Jun 20 Jun

2:58 a.m.

New subject: Computing lazy and strict list operations atthe same time

Ah, thanks for the replies. I like the approach that uses lazy tuples of intermediate values because it has a recognizable similarity to the original two functions.

6852

Age (days ago)

6952

Last active (days ago)

List overview

Download

14 comments

10 participants

participants (10)

Andrew Pimlott
Brandon Moore
C Rodrigues
Claus Reinke
Duncan Coutts
Jerzy Karczmarczuk
jerzy.karczmarczuk＠info.unicaen.fr
Joel Reymont
Jon Fairbairn
Robert Dockins