what space leak could happen here?

Hi All, I was looking at the definition of intersperse on http://hackage.haskell.org/package/base-4.7.0.1/docs/src/Data-List.html#inte... And found this -- | The 'intersperse' function takes an element and a list and -- \`intersperses\' that element between the elements of the list. -- For example, -- -- > intersperse ',' "abcde" == "a,b,c,d,e" intersperse :: a -> [a] -> [a] intersperse _ [] = [] intersperse sep (x:xs) = x : prependToAll sep xs -- Not exported: -- We want to make every element in the 'intersperse'd list available -- as soon as possible to avoid space leaks. Experiments suggested that -- a separate top-level helper is more efficient than a local worker. prependToAll :: a -> [a] -> [a] prependToAll _ [] = [] prependToAll sep (x:xs) = sep : x : prependToAll sep xs I don't understand why we need to "make every element in the 'intersperse'd list available as soon as possible to avoid space leaks." Could somebody shed some light on what could cause a space leak here? In particular, would this simpler version cause problems? intersperse :: a -> [a] -> [a] intersperse _ [] = [] intersperse _ [x] = [x] intersperse a (x: xs@(y:ys)) = x : a : intersperse a xs Thanks, Dimitri

On Sun, Sep 21, 2014 at 7:05 PM, Dimitri DeFigueiredo < defigueiredo@ucdavis.edu> wrote:
I don't understand why we need to "make every element in the 'intersperse'd list available as soon as possible to avoid space leaks." Could somebody shed some light on what could cause a space leak here? In particular, would this simpler version cause problems?
This is delicate but basically your version needs to force the second element of the list (to check if it's []) before it produce the first element of the result while the base version doesn't, check with "take 1 . intersperse ',' $ 'a' : undefined". The leak can occur when you only need the first element of the result to start producing but the second one is expensive to produce and store, the base version avoid the problem by being maximally lazy. -- Jedaï

Thanks! I don't quite understand the last sentence, though. So, it is a leak just because it needs to evaluate an element that may not be required? I thought a space leak implied we would be changing the "Big O" asymptotic order of space usage. How would the extra evaluation would change that? Thanks again, Dimitri On 21/09/14 13:20, Chaddaï Fouché wrote:
On Sun, Sep 21, 2014 at 7:05 PM, Dimitri DeFigueiredo
mailto:defigueiredo@ucdavis.edu> wrote: I don't understand why we need to "make every element in the 'intersperse'd list available as soon as possible to avoid space leaks." Could somebody shed some light on what could cause a space leak here? In particular, would this simpler version cause problems?
This is delicate but basically your version needs to force the second element of the list (to check if it's []) before it produce the first element of the result while the base version doesn't, check with "take 1 . intersperse ',' $ 'a' : undefined".
The leak can occur when you only need the first element of the result to start producing but the second one is expensive to produce and store, the base version avoid the problem by being maximally lazy. -- Jedaï
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

On Sun, Sep 21, 2014 at 02:12:04PM -0600, Dimitri DeFigueiredo wrote:
Thanks!
I don't quite understand the last sentence, though. So, it is a leak just because it needs to evaluate an element that may not be required? I thought a space leak implied we would be changing the "Big O" asymptotic order of space usage. How would the extra evaluation would change that?
Think of a network socket as the source of your list, and the first character is all you need. If you need to evaluate the second character to get the first, you could be stuck on the network socket for some time waiting for the second one to come. And I think the reason it would break some "Big O"s is that most complexity analysis in Haskell are amortized bounds and it uses the fact that Haskell is lazy and won't evaluate something until we actually need it. -- Phil Xiaojun Hu

On Mon, Sep 22, 2014 at 12:10 AM, Phil Xiaojun Hu
Think of a network socket as the source of your list, and the first character is all you need. If you need to evaluate the second character to get the first, you could be stuck on the network socket for some time waiting for the second one to come.
That's more of a time leak. I think the space leak aspect comes from the fact that you've evaluated a second list element that was not requested, may never be used, and will be held in the heap until all shared consumers of the list go out of scope. (It may of course also be a time leak if computing that second element is expensive.) -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
participants (4)
-
Brandon Allbery
-
Chaddaï Fouché
-
Dimitri DeFigueiredo
-
Phil Xiaojun Hu