parsec: how to get end location

Suppose I have some parser 'p'. I want to parse it as well as get its span in the text. So I could write

\begin{code}
pWithLocation = do
    loc_start <- getPosition
    pval <- p
    loc_end <- getPosition
    return (pval,loc_start,loc_end)
\end{code}

except that loc_end gives me the location _after_ 'p'. If 'p' has consumed a trailing newline, I see no way to recover the end location of 'p' (it is the last column of the previous line).

So, can I do anything about this without patching parsec?

I use parsec3 if it matters.

--
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain

On Sun, Jun 13, 2010 at 4:17 PM, Roman Cheplyaka wrote:
> Suppose I have some parser 'p'. I want to parse it as well as get its span in the text. So I could write
>
> \begin{code}
> pWithLocation = do
>     loc_start <- getPosition
>     pval <- p
>     loc_end <- getPosition
>     return (pval,loc_start,loc_end)
> \end{code}
>
> except that loc_end gives me the location _after_ 'p'. If 'p' has consumed a trailing newline, I see no way to recover the end location of 'p' (it is the last column of the previous line).
>
> So, can I do anything about this without patching parsec?
>
> I use parsec3 if it matters.

Can you use a parser which doesn't consume leading/trailing whitespace? Or somehow layer the parsers so that the whitespace munching happens outside of parseWithLocation.

Antoine
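A minimal sketch of the layering suggested here, assuming the parser type from Text.Parsec.String. The names located, lexeme, identifier and locatedIdentifier are made up for illustration and are not part of Parsec. The positions are taken around the whitespace-free parser, and the whitespace is skipped only afterwards.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Run a parser and report where it started and where it stopped.
-- The end position is the one just after the last consumed character.
located :: Parser a -> Parser (a, SourcePos, SourcePos)
located p = do
  start <- getPosition
  x     <- p
  end   <- getPosition
  return (x, start, end)

-- | Skip trailing whitespace (including newlines) after the wrapped parser.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- | Hypothetical token parser that consumes no whitespace itself.
identifier :: Parser String
identifier = many1 letter

-- | The whitespace munching happens outside of 'located'.
locatedIdentifier :: Parser (String, SourcePos, SourcePos)
locatedIdentifier = lexeme (located identifier)
```

With this arrangement the end position always lies on the line where the token ended, so subtracting 1 from its column is safe as long as the wrapped parser consumed at least one character.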

* Antoine Latter
> Can you use a parser which doesn't consume leading/trailing whitespace?
> Or somehow layer the parsers so that the whitespace munching happens outside of parseWithLocation.

Of course most parsers don't consume trailing newlines. But I was writing a general function to use in many places in the code which would recover the end location. In most cases it just subtracts 1 from the column number, but what if the column number happens to be 1?

So I got the impression that this way of obtaining the end location is ugly at best. It would be better if there were a function

    parseWithLocation :: Parser a -> Parser (a, SourcePos, SourcePos)

which, I guess, is easy to implement within Parsec without any such hacks.

--
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain

On Mon, Jun 14, 2010 at 12:10 AM, Roman Cheplyaka wrote:
> Of course most parsers don't consume trailing newlines. But I was writing a general function to use in many places in the code which would recover the end location. In most cases it just subtracts 1 from the column number, but what if the column number happens to be 1?

Parsec can handle state, right? You could modify the whitespace parsers so that they record their starting position in some state (in a Maybe). Then modify parseWithLocation to set the stored position to Nothing, parse p, and then use the position from the state if one has been recorded, or the current position otherwise.

Excuse me if this is unclear or confused, it's late :)

David.
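One way this state-based idea could look, as a sketch only. It assumes that every token parser finishes by calling a single whitespace skipper, and that the user state is free to hold a Maybe SourcePos. The names P, whiteSpace, lexeme and word are illustrative and not taken from the thread or from Parsec.

```haskell
import Data.Maybe (fromMaybe)
import Text.Parsec

-- User state: where the most recent whitespace skipper started, if any.
type P = Parsec String (Maybe SourcePos)

-- | Whitespace skipper that records its starting position before skipping.
-- Assumption: every token parser ends by calling this.
whiteSpace :: P ()
whiteSpace = do
  pos <- getPosition
  putState (Just pos)
  spaces

-- | Run a token parser, then the recording whitespace skipper.
lexeme :: P a -> P a
lexeme p = p <* whiteSpace

-- | Clear the recorded position, run p, then prefer the recorded position
-- over the current one as the end of p.
parseWithLocation :: P a -> P (a, SourcePos, SourcePos)
parseWithLocation p = do
  start <- getPosition
  putState Nothing
  x        <- p
  recorded <- getState
  here     <- getPosition
  return (x, start, fromMaybe here recorded)

-- | Hypothetical token: a run of letters followed by optional whitespace.
word :: P String
word = lexeme (many1 letter)
```

A run such as runParser (parseWithLocation word) Nothing "" "foo\nbar" starts with Nothing as the initial state. Because Parsec restores the user state on backtracking, a position recorded inside a failed try branch does not leak out.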

On 6/14/10 0:10, Roman Cheplyaka wrote:
> Of course most parsers don't consume trailing newlines. But I was writing a general function to use in many places in the code which would recover the end location. In most cases it just subtracts 1 from the column number, but what if the column number happens to be 1?

Parsers can be composed of lots of functions, but eventually all the actual consumption of symbols boils down to calls to Text.Parsec.Prim.tokenPrimEx. This is where you need to address your problem: find the places in your code where this function is called (or the derived tokenPrim or token) and intercept there. Hopefully you have defined your own 'satisfy' function and need only change that one.

Once you've found those places, there are multiple ways to solve the problem. For example, you could keep track of the last interesting (=non-whitespace) position. Or you could parse the whitespace in a separate phase.

There is no way to retroactively intercept these calls, which I think is a flaw in the design of Parsec. It would have been nice to have a 'runParsecWith myTokenPrimEx', or better yet to capture the Parsec primitives in a type class.

Groetjes,
Martijn.
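A rough sketch of the "last interesting position" idea, assuming all character consumption in the grammar goes through one custom satisfy-like function. It wraps Parsec's own satisfy rather than calling tokenPrimEx directly. The names trackedSatisfy, located and lineP are hypothetical. Unlike plain getPosition after 'p', the end reported here is the position of the last non-whitespace character itself.

```haskell
import Control.Monad (unless)
import Data.Char (isSpace)
import Data.Maybe (fromMaybe)
import Text.Parsec

-- User state: position of the last non-whitespace character consumed.
type P = Parsec String (Maybe SourcePos)

-- | Like 'satisfy', but remembers where the last non-whitespace
-- character was found.
trackedSatisfy :: (Char -> Bool) -> P Char
trackedSatisfy f = do
  pos <- getPosition
  c   <- satisfy f
  unless (isSpace c) (putState (Just pos))
  return c

-- | Report the start of p and the position of its last non-whitespace
-- character. Falls back to the start when p consumed nothing interesting.
located :: P a -> P (a, SourcePos, SourcePos)
located p = do
  start <- getPosition
  putState Nothing
  x        <- p
  lastSeen <- getState
  return (x, start, fromMaybe start lastSeen)

-- | Hypothetical parser that consumes a whole line, newline included.
lineP :: P String
lineP = many (trackedSatisfy (/= '\n')) <* trackedSatisfy (== '\n')
```

Here located lineP can swallow the trailing newline and still report an end position on the original line, because the newline never updates the stored position.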

Hi Roman

You would need different behaviour for the /lexeme/ parser in Parsec.Token at least - this is the combinator that promotes a parser to also consume trailing whitespace.

I suspect you would have to recode most of the Parsec.Token module - the TokenParser is a parameterized module (in the sense of Sheard and Pasalic [1]); unfortunately /lexeme/ is not one of the parameters, and other combinators are defined using it within the makeTokenParser function which instantiates the parameterized module.

Alternatively you could remake a set of token parsers - this is covered in the Parsec manual [2] - section 2.11 "Advanced: Separate scanners".

[1] http://web.cecs.pdx.edu/~sheard/papers/JfpPearl.ps
[2] http://legacy.cs.uu.nl/daan/download/parsec/parsec.pdf

Best wishes

Stephen
participants (5)
- Antoine Latter
- David Virebayre
- Martijn van Steenbergen
- Roman Cheplyaka
- Stephen Tetley