parser error coordinates transformation from preprocessed to original text

Hi! The parser function in my library is exposed to the user and hides the preprocessing and parsing internally. When a parse error occurs, its coordinates refer to the preprocessed text, but the user of the library deserves to know the coordinates in the original text. The following is my application flow (simplified):

{- User-exposed library function -}
parseLDIF :: String -> Either ParseError LDIF
parseLDIF xs = myParser $ myPreprocessor xs

{- Expects the preprocessed input -}
myParser :: String -> Either ParseError LDIF

{- Removes comments and unwraps lines; no new text is added, only removed -}
myPreprocessor :: String -> String

The approach I am going to implement is to keep a record of what myPreprocessor did to the text (some kind of journal of preprocessing operations). In case of an error, this journal is used to compute the coordinates in the original text:

myPreprocessor :: String -> (String, [PreprocessorOperation])

Still, I have the feeling that I am doing something very wrong, which is the reason for this email. What is the common approach to transforming error coordinates from the preprocessed text back to the original text? Is it common to separate the preprocessing and parsing phases?

Thanks, Rado
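[Editorial note: a minimal sketch of the journal idea, simplified to a per-line map. It assumes the preprocessor works line by line and only drops whole comment lines; preprocessWithMap and toOriginalLine are illustrative names, not part of the library's actual API, and real line unwrapping would need the same bookkeeping.]

import Data.List (isPrefixOf)

-- Returns the preprocessed text together with, for every kept line,
-- the line number it had in the original input.
preprocessWithMap :: String -> (String, [Int])
preprocessWithMap input = (unlines (map snd kept), map fst kept)
  where
    kept      = [ (n, l) | (n, l) <- zip [1 ..] (lines input)
                         , not (isComment l) ]
    isComment = ("#" `isPrefixOf`) . dropWhile (== ' ')

-- Translates a line number reported against the preprocessed text
-- back to the corresponding line in the original text.
toOriginalLine :: [Int] -> Int -> Int
toOriginalLine lineMap n =
  case drop (n - 1) lineMap of
    (orig:_) -> orig
    []       -> n   -- out of range: fall back to the reported line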

Hi Radoslav,
Comment removal is usually done as part of lexing, so during lexing you will have access to the true source position. If you care about line numbering (and have a split between lexing and parsing), you would usually annotate lexemes with their source position. If you are doing macro expansion before lexing and parsing, it is common to have the macro expander record an artificial but hopefully correct-to-the-user line number with a pragma like #line.
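[Editorial note: a small illustration of annotating lexemes with their original source position. Token and lexLine are made-up names, and the tokenization rule (whitespace-separated words, '#' starting a comment to end of line) is deliberately simplistic.]

-- Each surviving lexeme carries the line and column it had in the
-- ORIGINAL text, so later phases never need to translate positions.
data Token = Token
  { tokLine :: Int
  , tokCol  :: Int
  , tokText :: String
  } deriving Show

-- Lexes one original line, dropping the comment instead of a separate
-- preprocessing pass.
lexLine :: Int -> String -> [Token]
lexLine ln = go 1
  where
    go _   []         = []
    go col (' ' : cs) = go (col + 1) cs
    go _   ('#' : _)  = []
    go col cs         = let (w, rest) = break (== ' ') cs
                        in Token ln col w : go (col + length w) rest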

Hi Stephen,
On Sun, Sep 18, 2011 at 6:56 PM, Stephen Tetley
Comment removal is usually done as part of lexing, so during lexing you will have access to the true source position. If you care about line numbering (and have a split between lexing and parsing), you would usually annotate lexemes with their source position.
Thanks! And sorry for the late reply. Your answer means that I should use the common approach of a separate lexer and parser instead of using Parsec as a character-level parser. I will do that anyway, since it will probably improve the performance of the parser. One more question: do you know if there is a lexing technology that can interwork with a Parsec-based parser? By interwork I mean that line number information is preserved across lexing and parsing. Rado

On 20 October 2011 12:49, Radoslav Dorcik
One more question: do you know if there is a lexing technology that can interwork with a Parsec-based parser? By interwork I mean that line number information is preserved across lexing and parsing.
Yes, you can use Alex lexers with Parsec, for instance. The pattern for doing this is covered in section 2.11 "Advanced: Separate scanners" of the Parsec manual. For a full scanner this unfortunately leads to quite a bit of boilerplate, so I usually use Parsec's own tokenizing code via the Token module. http://research.microsoft.com/en-us/um/people/daan/download/parsec/parsec.pd...
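[Editorial note: a hedged sketch of the separate-scanner pattern. The lexer (Alex or hand-written) attaches to each token the SourcePos it had in the original, unpreprocessed text, and the parser consumes the token stream with Parsec's tokenPrim, so ParseError positions refer to the original text. The Tok type and satisfyTok are illustrative.]

import Text.Parsec
import Text.Parsec.Pos (SourcePos)

data Tok = TWord String | TColon
  deriving (Show, Eq)

-- A token paired with its position in the ORIGINAL input.
type PosTok = (SourcePos, Tok)

type TokParser a = Parsec [PosTok] () a

-- Accept any token satisfying the predicate.  Because nextPos returns the
-- position stored with the tokens, error messages report original
-- coordinates rather than preprocessed ones.
satisfyTok :: (Tok -> Bool) -> TokParser Tok
satisfyTok p = tokenPrim showTok nextPos testTok
  where
    showTok (_, t)            = show t
    nextPos _   _ ((p', _):_) = p'
    nextPos pos _ []          = pos
    testTok (_, t)            = if p t then Just t else Nothing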
participants (2)
- Radoslav Dorcik
- Stephen Tetley