producing and consuming lists

Jorge Adriano

5 Nov 2002 5 Nov '02

4:04 p.m.

I might have been not very clear in my last mail. I decided to post again, and go straight to the point, with some small examples. Consider the following function streams. streams :: (Int->Bool, Int->Bool)->(Int, Int)->([Int],[Int]) streams (p,q) (x,y) = (xs',ys') where (xs,ys) = streams (p,q) ((x+y),(y-x)) xs' = if p x then x:xs else xs ys' = if q y then y:xs else ys - produces a pair of ('infinite') lists - produced lists are not indepentent (you need to calculate elements of one list to calculate elements of the other) - in each recursive call an element is added to the 1st/2nd list iff it satisfies a given (Int->Bool) function p/q How should one consume (part of) both lists, avoiding space leaks? A particular example of consuming both lists might be writing them to files: main :: IO() main = do let (s1,s2)=stream ... -- stream applied to some arguments (p,q) (x,y) p' = ... q' = ... writeFile "f1.txt" (show$ takeWhile p' s1) writeFile "f2.txt" (show$ takeWhile q' s2) In this example all elements of s2 required to evaluate (takeWhile p' s1) are kept in memory, until the first file is writen. Notice that writing one element from s1 and one from s2 successively might still cause space leaks to arise. Fusing the consuming functions with the producer is a possible, but IMO dirty, way out. If my question doesn't seem to make sense for any reason, please tell me, maybe I am missing something obvious here. Thanks, J.A.

Show replies by date

Tom Pledger

5 Nov 5 Nov

11:09 p.m.

Jorge Adriano writes: : | streams :: (Int->Bool, Int->Bool)->(Int, Int)->([Int],[Int]) | streams (p,q) (x,y) = (xs',ys') | where | (xs,ys) = streams (p,q) ((x+y),(y-x)) | xs' = if p x then x:xs else xs | ys' = if q y then y:xs else ys | | | - produces a pair of ('infinite') lists | - produced lists are not indepentent (you need to calculate elements of one | list to calculate elements of the other) | - in each recursive call an element is added to the 1st/2nd list iff it | satisfies a given (Int->Bool) function p/q | | How should one consume (part of) both lists, avoiding space leaks? I think you have a choice between risking a space leak and repeating evaluation. If you use 'streams' like this let (xs, ys) = streams ... in -- anything which requires the first element of xs while ys may -- still be used in future (or vice versa) and p (or q, respectively) rejects the first trillion numbers it sees, you create a huge trail of 'if' thunks, which can't be garbage collected because you're still holding ys (or xs, respectively). If you do the following let xs = fst (streams ...) ys = snd (streams ...) in ... and somehow suppress Common Subexpression Elimination, I think the garbage collector will remove the trails of thunks, but the x+y and x-y and pair construction and matching will be done up to twice as many times as in the other version. Regards, Tom

Hal Daume III

11:59 p.m.

Hrm. This is interesting. So one option you already considered would be to put the writing inside 'streams', which probably should be disprefered. Have you considered doing something like: streams :: (Int -> Bool, Int -> Bool) -> (Int,Int) -> [(Maybe Int,Maybe Int)] streams (p,q) (x,y) | p x && p y = (Just x , Just y ) : xs | p x = (Just x , Nothing) : xs | p y = (Nothing, Just y ) : xs | otherwise = xs where xs = streams (p,q) ((x+y),(y-x)) With this setup, I think you can write your own writefile function which looks something like: writeTwoFiles f1 f2 (p',q') stream = do h1 <- openFile f1 WriteMode h2 <- openFile f2 WriteMode writeFiles' h1 h2 stream hClose h1 hClose h2 where writeFiles' h1 h2 ((Just x,Just y):xs) | p' x && p' y = do hPutStr h1 $ show x hPutStr h2 $ show y writeFiles' h1 h2 xs | p' x = do hPutSTr h1 $ show x writeFiles' h1 h2 (zip (map fst xs) (repeat Nothing) | p' y = do hPutSTr h2 $ show y writeFiles' h1 h2 (zip (repeat Nothing) (map snd xs)) | otherwise = return () writeFiles' h1 h2 ((Just x,Nothing):xs) = ... where you essentially ignore the nothings. I think, but I'm not sure, that this will allow the old stuff to be garbage collected. In practice, you don't get too much useless junk generated because we don't append the (Nothing,Nothing) pair to the list (erm, "prepend"). But what's more important, I think you only evaluate the same amount of each at any given time, thus allowing GC to gobble up the old stuff. An expert might be able to prove me wrong, though, or you could try this and profile it and see if it works or not :) - Hal -- Hal Daume III "Computer science is no more about computers | hdaume@isi.edu than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume On Tue, 5 Nov 2002, Jorge Adriano wrote:

...

I might have been not very clear in my last mail. I decided to post again, and go straight to the point, with some small examples.

Consider the following function streams.

streams :: (Int->Bool, Int->Bool)->(Int, Int)->([Int],[Int]) streams (p,q) (x,y) = (xs',ys') where (xs,ys) = streams (p,q) ((x+y),(y-x)) xs' = if p x then x:xs else xs ys' = if q y then y:xs else ys

- produces a pair of ('infinite') lists - produced lists are not indepentent (you need to calculate elements of one list to calculate elements of the other) - in each recursive call an element is added to the 1st/2nd list iff it satisfies a given (Int->Bool) function p/q

How should one consume (part of) both lists, avoiding space leaks?

A particular example of consuming both lists might be writing them to files: main :: IO() main = do let (s1,s2)=stream ... -- stream applied to some arguments (p,q) (x,y) p' = ... q' = ... writeFile "f1.txt" (show$ takeWhile p' s1) writeFile "f2.txt" (show$ takeWhile q' s2)

In this example all elements of s2 required to evaluate (takeWhile p' s1) are kept in memory, until the first file is writen. Notice that writing one element from s1 and one from s2 successively might still cause space leaks to arise. Fusing the consuming functions with the producer is a possible, but IMO dirty, way out.

If my question doesn't seem to make sense for any reason, please tell me, maybe I am missing something obvious here. Thanks, J.A.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jorge Adriano

6 Nov 6 Nov

1:01 a.m.

...

I think you have a choice between risking a space leak and repeating evaluation.

Well, sometimes neither is a good option...

...

If you use 'streams' like this

let (xs, ys) = streams ... in -- anything which requires the first element of xs while ys may -- still be used in future (or vice versa)

and p (or q, respectively) rejects the first trillion numbers it sees, you create a huge trail of 'if' thunks, which can't be garbage collected because you're still holding ys (or xs, respectively).

Yeap, exactly Even repeatedly processing one element of xs and one of ys, can still cause memory leaks - it happens evaluation of an elements of one of the lists causes evaluation of lots of elements of the other. One way out in my example, is - *always* adding elements to both lists in each recursive call. - processing lists alternating between them like I just said. streams :: (Int->Bool, Int->Bool)->(Int,Int)->([Maybe Int],[Maybe Int]) streams (p,q) (x,y) = (xs',ys') where (xs,ys) = streams (p,q) ((x+y),(y-x)) xs' = if p x then (Just x:xs) else (Nothing:xs) ys' = if q y then (Just y:xs) else (Nothing:ys) (I could, of course, return a list of pairs in this case) This might not be feaseble in more general cases though.

...

If you do the following

let xs = fst (streams ...) ys = snd (streams ...) in ... and somehow suppress Common Subexpression Elimination, I think the garbage collector will remove the trails of thunks, but the x+y and x-y and pair construction and matching will be done up to twice as many times as in the other version.

Like I said, sometimes that is not really an option... In my example, it doesn't really matter in which order the lists are consumed, I was hoping you could take advantage of that in some way... Thanks, J.A.

Jorge Adriano

1:32 a.m.

...

streams :: (Int -> Bool, Int -> Bool) -> (Int,Int) -> [(Maybe Int,Maybe Int)] streams (p,q) (x,y)

| p x && p y = (Just x , Just y ) : xs | p x = (Just x , Nothing) : xs | p y = (Nothing, Just y ) : xs | otherwise = xs

where xs = streams (p,q) ((x+y),(y-x))

...

With this setup, I think you can write your own writefile function which looks something like: I think, but I'm not sure, that this will allow the old stuff to be garbage collected. In practice, you don't get too much useless junk generated because we don't append the (Nothing,Nothing) pair to the list Nice observation, I missed that one since I didn't paired both lists :) But like I just showed, sometimes paring them may not be a natural approach

Yeap, in fact I thought about it when answering Toms answer. Doesn't seem good enough in more general cases though, Like: streams :: (Int->Bool, Int->Bool)->(Int, Int)->([Int],[Int]) streams (p,q) (x,y) = (xs',ys') where (xs,ys) = streams (p,q) ((x+y),(y-x)) xs' = if p x then x:xs else zs++xs <------- ys' = if q y then y:xs else ys zs = some_other_stream <---------- Or: streams :: (Int->Bool, Int->Bool)->(Int, Int)->([Int],[Int]) streams (p,q) (x,y) = (xs',ys') where (xs,ys) = streams (p,q) ((x+y),(y-x)) xs' = if p x then x:xs else zs -- <------------ ys' = if q y then y:xs else ws -- <----------- zs = ... some other_stream -- <--- ws= ... yet_another_stream -- <--- though...

...

(erm, "prepend"). But what's more important, I think you only evaluate the same amount of each at any given time, thus allowing GC to gobble up the old stuff. Exactly

...

An expert might be able to prove me wrong, though, or you could try this and profile it and see if it works or not :) I haven't tested it yet either but seems to me like it should work, in this particular example.

I was hoping I could somehow take advantage of the fact that the order in which I want to consume the lists doesn't matter. I thought about concurrency but it doesn't seem to work. J.A.

Hal Daume III

4:05 p.m.

Jorge,

...

But like I just showed, sometimes paring them may not be a natural approach though...

Yeah, I understand what you mean. In the examples you give, you could always try to make the appended stuff (the zs, etc) the same length by appending Nothings, but probalby not a general solution. I wonder if, given you original function which returns ([Int],[Int]), you couldn't do something really ugly like using unsafeInterleaveIO on writing the first list, and then write the second list "by hand". Just a thought. - Hal

Jorge Adriano

5:48 p.m.

...

...
But like I just showed, sometimes paring them may not be a natural approach though...

Yeah, I understand what you mean. In the examples you give, you could always try to make the appended stuff (the zs, etc) the same length by appending Nothings, but probalby not a general solution.

Yes maybe, I was hoping I could find a more elegant solution though.

...

I wonder if, given you original function which returns ([Int],[Int]), you couldn't do something really ugly like using unsafeInterleaveIO on writing the first list, and then write the second list "by hand".

Just a thought.

If "consuming" is just writing stuff to files I think Trace (which uses unsafeInterleaveIO) would be just fine, but "consuming" might be more than that. In that case I'd have to fuse the functions that consume each of the streams, with the function that produces them. J.A.

8275

Age (days ago)

8276

Last active (days ago)

List overview

Download

6 comments

3 participants

participants (3)

Hal Daume III
Jorge Adriano
Tom Pledger