trying to understand space leaks....

In Section 2.5 of "Generalizing Monads to Arrows" paper (linkhttp://www.cs.chalmers.se/%7Erjmh/Papers/arrows.ps) John Huges talks about the space leak inherit in monadic implementation of backtracking parsers. newtype Parser s a = P( [s] => Maybe (a, [s])) instance MonadPlus Parser where P a mplus P b = P (\s -> case a s of Just (x, s') -> Just (x, s') Nothing -> b s) The problem (as I understand it) is that to implement the backtracking, the monad plus implementation needs to keep the state (the computation that returns the next token) around in case the parser fails (so that it can be passed to the other side of mplus). I am having hard time to understand... a)what exactly gets saved on the heap between the mplus calls? b)I am assuming the computation to get the next character for parsing to be an "IO Char" type computation, in that case, what would be the size of the heap buffer that is kept around in case the computation result needs to be reused? c) Assuming Pa in the above code reads n tokens from the input stream then fails, how does the run time returns the same token to the P b? Thanks, Daryoush

On Tue, May 26, 2009 at 5:03 PM, Daryoush Mehrtash
newtype Parser s a = P( [s] -> Maybe (a, [s])) (fixed typo)
instance MonadPlus Parser where P a mplus P b = P (\s -> case a s of Just (x, s') -> Just (x, s') Nothing -> b s)
a)what exactly gets saved on the heap between the mplus calls?
Two things: (1) Values in the input stream that "a" parses before failing. Beforehand, it might just be a thunk that generates the list lazily in some fashion. (2) The state of the closure "b"; if parser "a" fails, we need to be able to run "b"; that could use an arbitrary amount of space depending on what data it keeps alive.
b)I am assuming the computation to get the next character for parsing to be an "IO Char" type computation, in that case, what would be the size of the heap buffer that is kept around in case the computation result needs to be reused?
Nope, no IO involved; just look at the types: P :: ([s] -> Maybe (a,[s])) -> Parser s a (Parser s a) is just a function that takes a list of "s", and possibly returns a value of type "a" and another list [s] (of the remaining tokens, one hopes) It's up to the caller of the parsing function to provide the token stream [s] somehow.
c) Assuming Pa in the above code reads n tokens from the input stream then fails, how does the run time returns the same token to the P b?
It just passes the same stream to both. No mutability means no danger :) -- ryan

So long as the [s] is a fixed list (say [1,2,3,4]) there is no space
leak. My understanding was that the space leak only happens if there is
computation involved in building the list of s. Am I correct?
If so, I still don't have any feeling for what needs to be saved on the heap
to be able to back track on computation that needs and IO computation
data. What would be approximate space that an IO (Char) computation
take on the heap, is it few bytes, 100, 1k, ....?
Daryoush
On Tue, May 26, 2009 at 6:11 PM, Ryan Ingram
On Tue, May 26, 2009 at 5:03 PM, Daryoush Mehrtash
wrote: newtype Parser s a = P( [s] -> Maybe (a, [s])) (fixed typo)
instance MonadPlus Parser where P a mplus P b = P (\s -> case a s of Just (x, s') -> Just (x, s') Nothing -> b s)
a)what exactly gets saved on the heap between the mplus calls?
Two things:
(1) Values in the input stream that "a" parses before failing. Beforehand, it might just be a thunk that generates the list lazily in some fashion.
(2) The state of the closure "b"; if parser "a" fails, we need to be able to run "b"; that could use an arbitrary amount of space depending on what data it keeps alive.
b)I am assuming the computation to get the next character for parsing to be an "IO Char" type computation, in that case, what would be the size of the heap buffer that is kept around in case the computation result needs to be reused?
Nope, no IO involved; just look at the types:
P :: ([s] -> Maybe (a,[s])) -> Parser s a
(Parser s a) is just a function that takes a list of "s", and possibly returns a value of type "a" and another list [s] (of the remaining tokens, one hopes)
It's up to the caller of the parsing function to provide the token stream [s] somehow.
c) Assuming Pa in the above code reads n tokens from the input stream then fails, how does the run time returns the same token to the P b?
It just passes the same stream to both. No mutability means no danger :)
-- ryan

There's still the space used by the closure "b".
An example:
expensiveParser :: Parser Char ExpensiveStructure
simple :: Parser Char Int
withExpensive :: ExpensiveStructure -> Parser Char Int
withExpensive _ = mzero -- actually always fails, not using its argument.
example = do
e <- expensiveParser
simple `mplus` withExpensive e
The expensive structure constructed by expensiveParser needs to be
kept in memory throughout the entire parsing of "simple", even though
withExpensive doesn't actually use it and would immediately fail. A
smarter parser could realize that e couldn't actually ever be used and
allow the GC to free it much more quickly.
This example can be made arbitrarily more complicated; withExpensive
could run different things based on the value of "e" that could be
determined to fail quickly, simple might actually do a lot of work,
etc. But during the "mplus" in the monadic parser, we can't free e.
-- ryan
On Wed, May 27, 2009 at 12:49 PM, Daryoush Mehrtash
So long as the [s] is a fixed list (say [1,2,3,4]) there is no space leak. My understanding was that the space leak only happens if there is computation involved in building the list of s. Am I correct?
If so, I still don't have any feeling for what needs to be saved on the heap to be able to back track on computation that needs and IO computation data. What would be approximate space that an IO (Char) computation take on the heap, is it few bytes, 100, 1k, ....?
Daryoush
On Tue, May 26, 2009 at 6:11 PM, Ryan Ingram
wrote: On Tue, May 26, 2009 at 5:03 PM, Daryoush Mehrtash
wrote: newtype Parser s a = P( [s] -> Maybe (a, [s])) (fixed typo)
instance MonadPlus Parser where P a mplus P b = P (\s -> case a s of Just (x, s') -> Just (x, s') Nothing -> b s)
a)what exactly gets saved on the heap between the mplus calls?
Two things:
(1) Values in the input stream that "a" parses before failing. Beforehand, it might just be a thunk that generates the list lazily in some fashion.
(2) The state of the closure "b"; if parser "a" fails, we need to be able to run "b"; that could use an arbitrary amount of space depending on what data it keeps alive.
b)I am assuming the computation to get the next character for parsing to be an "IO Char" type computation, in that case, what would be the size of the heap buffer that is kept around in case the computation result needs to be reused?
Nope, no IO involved; just look at the types:
P :: ([s] -> Maybe (a,[s])) -> Parser s a
(Parser s a) is just a function that takes a list of "s", and possibly returns a value of type "a" and another list [s] (of the remaining tokens, one hopes)
It's up to the caller of the parsing function to provide the token stream [s] somehow.
c) Assuming Pa in the above code reads n tokens from the input stream then fails, how does the run time returns the same token to the P b?
It just passes the same stream to both. No mutability means no danger :)
-- ryan

I (think) I understand the problem. What I don't have any intuition about
is how much space would "Expensive Structure" take if it was basically an
IO Char computation fed into a simple function (say checks for char being
equal to "a"). Is there any way to guess, know the size of the buffer that
is kept in the heap?
thanks,
Daryoush
On Wed, May 27, 2009 at 3:12 PM, Ryan Ingram
There's still the space used by the closure "b".
An example:
expensiveParser :: Parser Char ExpensiveStructure
simple :: Parser Char Int
withExpensive :: ExpensiveStructure -> Parser Char Int withExpensive _ = mzero -- actually always fails, not using its argument.
example = do e <- expensiveParser simple `mplus` withExpensive e
The expensive structure constructed by expensiveParser needs to be kept in memory throughout the entire parsing of "simple", even though withExpensive doesn't actually use it and would immediately fail. A smarter parser could realize that e couldn't actually ever be used and allow the GC to free it much more quickly.
This example can be made arbitrarily more complicated; withExpensive could run different things based on the value of "e" that could be determined to fail quickly, simple might actually do a lot of work, etc. But during the "mplus" in the monadic parser, we can't free e.
-- ryan
On Wed, May 27, 2009 at 12:49 PM, Daryoush Mehrtash
wrote: So long as the [s] is a fixed list (say [1,2,3,4]) there is no space leak. My understanding was that the space leak only happens if there is computation involved in building the list of s. Am I correct?
If so, I still don't have any feeling for what needs to be saved on the heap to be able to back track on computation that needs and IO computation data. What would be approximate space that an IO (Char) computation take on the heap, is it few bytes, 100, 1k, ....?
Daryoush
On Tue, May 26, 2009 at 6:11 PM, Ryan Ingram
wrote: On Tue, May 26, 2009 at 5:03 PM, Daryoush Mehrtash
wrote:
newtype Parser s a = P( [s] -> Maybe (a, [s])) (fixed typo)
instance MonadPlus Parser where P a mplus P b = P (\s -> case a s of Just (x, s') -> Just (x, s') Nothing -> b s)
a)what exactly gets saved on the heap between the mplus calls?
Two things:
(1) Values in the input stream that "a" parses before failing. Beforehand, it might just be a thunk that generates the list lazily in some fashion.
(2) The state of the closure "b"; if parser "a" fails, we need to be able to run "b"; that could use an arbitrary amount of space depending on what data it keeps alive.
b)I am assuming the computation to get the next character for parsing to be an "IO Char" type computation, in that case, what would be the size of the heap buffer that is kept around in case the computation result needs to be reused?
Nope, no IO involved; just look at the types:
P :: ([s] -> Maybe (a,[s])) -> Parser s a
(Parser s a) is just a function that takes a list of "s", and possibly returns a value of type "a" and another list [s] (of the remaining tokens, one hopes)
It's up to the caller of the parsing function to provide the token stream [s] somehow.
c) Assuming Pa in the above code reads n tokens from the input stream then fails, how does the run time returns the same token to the P b?
It just passes the same stream to both. No mutability means no danger :)
-- ryan
participants (2)
-
Daryoush Mehrtash
-
Ryan Ingram