Parsec line number off-by-one

Recently I've been playing around with Parsec for a simple parsing project. While I was able to quickly construct my grammar (simplified version attached), getting it working has been a bit tricky. In particular, I am now stuck trying to figure out why Parsec is mis-reporting line numbers. Parsec seems convinced that line 12 of my input (also attached) has a "%" character,

    $ runghc Test.hs
    Left "(unknown)" (line 12, column 1):
    unexpected "%"
    expecting space or atom name

while my file clearly disagrees,

    10 %FLAG ATOM_NAME
    11 %FORMAT(20a4)
    12 C1 H1 C2 H2 C3 H3 C4 H4 C5 C6 C7 C8 N1 C9 H9 C10 H10 C11 H11 C12
    13 H12 C13 H13 C14 C15 N2 C16 C17 C29 H18 C19 H19 C20 H20 C21 H21 C22 H221H222H223
    ...
    18 %FLAG CHARGE
    19 %FORMAT(5E16.8)

The task here is to identify the block of data lines (lines 12-17), ending at the beginning of the next block (starting with "%"). It seems likely that my problem stems from the fact that I use "try" to accomplish this, but this is as far as I can reason.

Any ideas what might cause this sort of off-by-one? Does anyone see a better (i.e. working) way to formulate my grammar? Any and all help would be greatly appreciated. Thanks.

Cheers,
- Ben

Hi Ben,
This is indeed a bug in parsec.
I have written a patch that fixes this. Antoine Latter (parsec's current
maintainer) and I are working on getting the patch into the next parsec
release.
As a workaround until then, you can apply the attached patch manually.
darcs get http://code.haskell.org/parsec3
cd parsec3
darcs apply parsec.dpatch
cabal install
With this patch, the error message is:
Left "(unknown)" (line 18, column 1):
expecting space or atom name
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Roman I. Cheplyaka :: http://ro-che.info/
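The reported position discussed above can also be read off the error value programmatically, which is handy when comparing parsec versions. The following is only a minimal sketch: `letters` is a hypothetical stand-in grammar (the original Test.hs is not shown in the thread) and `failurePos` is an illustrative helper name, while `errorPos`, `sourceLine` and `sourceColumn` are the accessors exported by Text.Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in grammar, not the grammar from Test.hs:
-- one or more letters, each followed by optional whitespace, then end of input.
letters :: Parser String
letters = many1 (letter <* spaces) <* eof

-- Line and column at which parsing failed, or Nothing if the parse succeeded.
failurePos :: String -> Maybe (Int, Int)
failurePos input =
  case parse letters "(unknown)" input of
    Left err ->
      let pos = errorPos err
      in Just (sourceLine pos, sourceColumn pos)
    Right _ -> Nothing
```

In GHCi, `failurePos "a\nb\n%"` should point at the "%" on the third line, since this stand-in grammar uses neither "try" nor "lookAhead".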

Hi,

1. your "lookAhead" is unnecessary, because your items (atomNames) never start with "%".

2. your "try" fails at (line 12, column 1), because the last item (aka atomName) starts consuming "\n" before your eol parser is called.

So rather than calling spaces before every real atom, I would call it after every real atom and after your formatDecl (so before your linesOf parser).

    atomNameBlock = do
        flagDecl "ATOM_NAME"
        formatDecl
        spaces
        atomNames <- many1 atomName
        return $ AtomNames atomNames
      where
        atomName = do
          name <- countBetween 1 4 (alphaNum <|> oneOf "\'+-") <?> "atom name"
          spaces
          return name

Since spaces also consumes "\n", linesOf can just be "many1"!

HTH Christian

On 21.09.2011 05:32, Ben Gamari wrote:

On Wed, 21 Sep 2011 11:27:31 +0200, Christian Maeder wrote:
> Hi,
> 1. your "lookAhead" is unnecessary, because your items (atomNames) never start with "%".

I see.

> 2. your "try" fails in (line 12, column 1), because the last item (aka atomName) starts consuming "\n", before your eol parser is called.

Ahh, this is a good point. For some reason I had it in my head that spaces takes only the ' ' character, not '\n'.

> So rather than calling spaces before every real atom, I would call it after every real atom and after your formatDecl (so before your linesOf parser).

Excellent solution. I appreciate your help. That would have taken me quite a bit of head-banging to find.

Cheers,
- Ben
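For readers without the attachments, Christian's suggestion can be tried out with a self-contained sketch along the following lines. Everything not quoted in the thread is an assumption made for illustration: `countBetween`, `flagDecl` and `formatDecl` are hypothetical reconstructions (the originals live in Test.hs, which is not shown), and the `AtomNames` type and `parseBlock` are invented here so the example compiles on its own.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed result type; the real one in Test.hs is not shown.
newtype AtomNames = AtomNames [String] deriving (Eq, Show)

-- Hypothetical helper: between lo and hi repetitions of p.
countBetween :: Int -> Int -> Parser a -> Parser [a]
countBetween lo hi p = do
  required <- count lo p
  extra <- upTo (hi - lo)
  return (required ++ extra)
  where
    upTo n
      | n <= 0    = return []
      | otherwise = option [] ((:) <$> p <*> upTo (n - 1))

-- Hypothetical: "%FLAG <name>" followed by whitespace (including the newline).
flagDecl :: String -> Parser ()
flagDecl name = string "%FLAG " *> string name *> spaces

-- Hypothetical: "%FORMAT(...)" without consuming what follows it.
formatDecl :: Parser ()
formatDecl = string "%FORMAT(" *> many1 (noneOf ")\n") *> char ')' *> return ()

-- As suggested in the thread: spaces runs after formatDecl and after each atom.
atomNameBlock :: Parser AtomNames
atomNameBlock = do
    flagDecl "ATOM_NAME"
    formatDecl
    spaces
    atomNames <- many1 atomName
    return $ AtomNames atomNames
  where
    atomName = do
      name <- countBetween 1 4 (alphaNum <|> oneOf "'+-") <?> "atom name"
      spaces
      return name

parseBlock :: String -> Either ParseError AtomNames
parseBlock = parse atomNameBlock "(unknown)"
```

Because each atomName consumes its own trailing whitespace, the parser is already positioned at the "%" of the next block when many1 stops, so neither "try" nor "lookAhead" is needed to end the data lines.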
participants (3)
- Ben Gamari
- Christian Maeder
- Roman Cheplyaka