Parsec, comma-separated list with special last element

Hey,

I want to parse a string like this:

    <cell>,<cell>,<lastCell>

All cells but the last are parsed with "cell"; the last one is parsed with "lastCell". So I am trying:

    file = do
      res <- endBy cell (char ',')
      l <- lastCell
      eof
      return res

    cell = many1 (noneOf ",")
    lastCell = many1 (noneOf "\n")

This results in:

    unexpected end of input
    expecting ","

I am assuming this is because the lastCell is consumed by a "cell" parser. lastCell and cell look pretty similar, so the cell parser does not fail when presented with the <lastCell>.

Can I still do this with "endBy" or is there a better combinator?

Thanks!
Nathan

Hi Nathan,

On Sun, Dec 16, 2012 at 05:55:53PM +0100, Nathan Hüsken wrote:
> cell = many1 (noneOf ",")
> lastCell = many1 (noneOf "\n")

I wouldn't define the cells by the character that separates them, but by the characters allowed in a cell. Using attoparsec in applicative style (I haven't used parsec yet), I would write:

    import Control.Applicative ((<$>), (<*>), (*>))
    import qualified Data.Attoparsec.Text as P
    import qualified Data.Attoparsec.Combinator as PC
    import qualified Data.Text as T
    import Data.Char (isAlpha)

    -- A cell is a non-empty run of alphabetic characters.
    cell :: P.Parser T.Text
    cell = P.takeWhile1 isAlpha

    data Row = Row { fstCell :: T.Text, sndCell :: T.Text, lastCell :: T.Text }

    -- A row is three cells separated by commas.
    row :: P.Parser Row
    row = Row <$> cell <*> (P.char ',' *> cell) <*> (P.char ',' *> cell)

    rows :: P.Parser [Row]
    rows = PC.manyTill row P.endOfLine

The applicative style really shines in these cases, because the definition of the parser combines nicely with the creation of the data structures for the parsed data.

Greetings,
Daniel

On Sun, Dec 16, 2012 at 8:55 AM, Nathan Hüsken wrote:
> I am assuming this is because the lastCell is consumed by a "cell" parser.

Yes. Parsec does not backtrack automatically. Once the cell parser has consumed the input that you were hoping lastCell would consume, the parser never goes back to reconsider that part of the input, even if doing so is the only way to make the file parser succeed.
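
To make the point about missing backtracking concrete, here is a minimal sketch. It is not from the thread: the names `cellThenComma`, `noBacktrack` and `withBacktrack` are made up for illustration, and it assumes the parsec 3 interface exported by `Text.Parsec`. It shows that `<|>` does not try its second alternative once the first has failed after consuming input, unless the first is wrapped in `try`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same definition as in the original question.
cell :: Parser String
cell = many1 (noneOf ",")

-- Hypothetical helper: a cell that must be followed by a comma.
-- On input without a comma it fails only AFTER the cell was consumed.
cellThenComma :: Parser String
cellThenComma = cell <* char ','

-- Without try: the first branch has already consumed input when it
-- fails, so the second branch is never attempted.
noBacktrack :: Parser String
noBacktrack = cellThenComma <|> many1 anyChar

-- With try: a failure of the first branch rewinds the input, so the
-- second branch gets to run from the original position.
withBacktrack :: Parser String
withBacktrack = try cellThenComma <|> many1 anyChar
```

Running `parse noBacktrack "" "xyz"` should give a `Left` (end of input where a comma was expected), while `parse withBacktrack "" "xyz"` should give `Right "xyz"`.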
> lastCell and cell look pretty similar, so the cell parser does not fail when presented with the <lastCell>.
> Can I still do this with "endBy" or is there a better combinator?

Using endBy is not the problem. There are many different solutions:

1. Factor out the common prefix of cell and lastCell, and restructure the file parser to avoid committing to either cell or lastCell until the next input symbol is one which definitively identifies which alternative the parser is looking at.

2. Replace cell and lastCell with a single parser that matches either one. Parse out a list of cellOrLastCell results and then do some post-processing to treat the last one specially.

3. Use the "try" combinator. You apply this combinator to a parser, and get back a parser which consumes no input if it fails. When using this combinator, you should consider whether this will have an unacceptable impact on the performance of your parser. (Performance is one of the reasons Parsec does not just backtrack automatically.)

-Karl
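
Options 1 and 2 from the list above could look roughly like the following. This is only a sketch under assumptions, not code Karl posted: the names `fileFactored` and `filePost` are invented, the result type is changed to return the last cell alongside the others, and the last cell is assumed to have the same shape as an ordinary cell (no commas), which is narrower than the `lastCell` in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same definition as in the original question.
cell :: Parser String
cell = many1 (noneOf ",")

-- Option 1 (sketch): parse the common prefix first, then let the next
-- symbol decide. A comma means "ordinary cell, keep going"; anything
-- else means this was the last cell, and end of input must follow.
fileFactored :: Parser ([String], String)
fileFactored = do
  c    <- cell
  more <- optionMaybe (char ',')
  case more of
    Just _  -> do
      (cs, l) <- fileFactored
      return (c : cs, l)
    Nothing -> do
      eof
      return ([], c)

-- Option 2 (sketch): parse every field with one parser, then split off
-- the last element afterwards. sepBy1 returns a non-empty list, so
-- init and last cannot fail here.
filePost :: Parser ([String], String)
filePost = do
  cells <- cell `sepBy1` char ','
  eof
  return (init cells, last cells)
```

Neither version needs `try`, because no parser fails after consuming input: `char ','` either matches or fails on the very first character it looks at.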

On 12/17/2012 07:45 AM, Karl Voelker wrote: (...)
> Using endBy is not the problem. There are many different solutions:
> 1. Factor out the common prefix of cell and lastCell, and restructure the file parser to avoid committing to either cell or lastCell until the next input symbol is one which definitively identifies which alternative the parser is looking at.
> 2. Replace cell and lastCell with a single parser that matches either one. Parse out a list of cellOrLastCell results and then do some post-processing to treat the last one specially.
> 3. Use the "try" combinator. You apply this combinator to a parser, and get back a parser which consumes no input if it fails. When using this combinator, you should consider whether this will have an unacceptable impact on the performance of your parser. (Performance is one of the reasons Parsec does not just backtrack automatically.)

OK, I would choose 3, because performance is not an issue and it seems to be the simplest.

Still ... I cannot do it with "endBy", can I? Doing it like this:

    file = do
      res <- many $ try (do {c <- cell; char ','; return c})
      l <- lastCell
      eof
      return res

    cell = many1 (noneOf ",")
    lastCell = many1 (noneOf "\n")

works. But I wonder whether I could have used a clearer combinator from parsec.

Regards,
Nathan

I tried

    file = do
      res <- endBy (try cell) (char ',')
      l <- lastCell
      eof
      return res

    cell = many1 (noneOf ",")
    lastCell = many1 (noneOf "\n")

But that does not work because cell succeeds on the last cell. I can replace the endBy with

    many (try $ do {a <- cell; string ","; return a})

Nathan

On 12/18/2012 08:21 AM, Karl Voelker wrote:
> On Mon, Dec 17, 2012 at 2:28 AM, Nathan Hüsken wrote:
>> Still ... I can not do it with "endBy", can I?
>
> I think you can. What have you tried with endBy that didn't work?
>
> -Karl
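
The thread ends without a version that keeps `endBy`. One possible way to do it, offered only as a sketch and not necessarily what Karl had in mind, is to make the element parser itself refuse the last cell by peeking for the separator with `lookAhead`. The tuple result type is an assumption added here so that the last cell is visible in the output; `cell` and `lastCell` are as in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

cell :: Parser String
cell = many1 (noneOf ",")

lastCell :: Parser String
lastCell = many1 (noneOf "\n")

-- The element parser accepts a cell only when a comma follows it.
-- lookAhead checks for the comma without consuming it, so endBy can
-- still consume the separator itself. try rewinds the input when the
-- check fails, which leaves the final cell for lastCell.
file :: Parser ([String], String)
file = do
  res <- endBy (try (cell <* lookAhead (char ','))) (char ',')
  l   <- lastCell
  eof
  return (res, l)
```

Compared with `endBy (try cell) (char ',')`, the difference is where the failure happens: there, `try cell` succeeds on the last cell and the separator then fails after input has been consumed; here, the failure happens inside the `try`, so it is undone.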
Participants (3): Daniel Trstenjak, Karl Voelker, Nathan Hüsken