writing many1Till combinator for Parsec

Hi, I'm writing a parser where I find myself in need to use manyTill combinator with an additional constraint that there should be at least one meaningful element before the trailer (say, a word ended with a period: 'abc.'). Here is what I have: many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do p1 <- p ps <- manyTill p end return (p1:ps) The problem here is that I want to catch and report a case when 'p1' could be actually parsed by 'end' ('..' is not a word ended by a period). Generally 'p' and 'end' can parse the same subset of strings. Another version a had was: many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do try (end >> (unexpected "sequence terminator")) <|> (do { p1 <- p; ps <- manyTill p end; return (p1:ps) }) Here there are two disadvantages: 1) I don't like hardcoding "sequence terminator" here; 2) the error output should say that only 'p' parser is expected, while it says (technically correct) that either 'p' or 'end' is expected: Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "1" parse error at (line 1, column 1): unexpected "1" expecting "." or letter (What I want here is to say "expecting letter") Any suggestions? Thanks! -- Vlad Skvortsov, vss@73rus.com, http://vss.73rus.com

Am Samstag 12 Dezember 2009 01:58:11 schrieb Vlad Skvortsov:
Hi,
I'm writing a parser where I find myself in need to use manyTill combinator with an additional constraint that there should be at least one meaningful element before the trailer (say, a word ended with a period: 'abc.').
Here is what I have:
many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do notFollowedBy end p1 <- p ps <- manyTill p end return (p1:ps)
You want a nonempty sequence of 'p's which aren't 'end's, followed by an 'end'. So you could do for example a) many1Till p end = do ps <- manyTill p end guard (not $ null ps) return ps , i.e. check whether the result of manyTill is a nonempty list after the fact, or check whether manyTill p end will return an empty list before manyTill is run, like b) many1Till p end = do notFollowedBy end manyTill p end (i.e. many1Till p end = notFollowedBy end >> manyTill p end).
The problem here is that I want to catch and report a case when 'p1' could be actually parsed by 'end' ('..' is not a word ended by a period). Generally 'p' and 'end' can parse the same subset of strings.
Another version a had was:
many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do try (end >> (unexpected "sequence terminator")) <|> (do { p1 <- p; ps <- manyTill p end; return (p1:ps) })
Here there are two disadvantages:
1) I don't like hardcoding "sequence terminator" here; 2) the error output should say that only 'p' parser is expected, while it says (technically correct) that either 'p' or 'end' is expected:
Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "1" parse error at (line 1, column 1): unexpected "1" expecting "." or letter
(What I want here is to say "expecting letter")
For that, you need the slightly clumsier c) many1Till p end = do notFollowedBy end p1 <- p ps <- manyTill p end return (p1:ps)
Any suggestions?
Thanks!

Daniel Fischer wrote:
You want a nonempty sequence of 'p's which aren't 'end's, followed by an 'end'. So you could do for example a) many1Till p end = do ps <- manyTill p end guard (not $ null ps) return ps
, i.e. check whether the result of manyTill is a nonempty list after the fact, or check whether manyTill p end will return an empty list before manyTill is run, like b) many1Till p end = do notFollowedBy end manyTill p end
(i.e. many1Till p end = notFollowedBy end >> manyTill p end).
Thanks Daniel! I missed out on 'guard' and somewhy was under impression that 'notFollowedBy' can only deal with Chars.
many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do try (end >> (unexpected "sequence terminator")) <|> (do { p1 <- p; ps <- manyTill p end; return (p1:ps) })
Here there are two disadvantages:
1) I don't like hardcoding "sequence terminator" here; 2) the error output should say that only 'p' parser is expected, while it says (technically correct) that either 'p' or 'end' is expected:
Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "1" parse error at (line 1, column 1): unexpected "1" expecting "." or letter
(What I want here is to say "expecting letter")
For that, you need the slightly clumsier c) many1Till p end = do notFollowedBy end p1 <- p ps <- manyTill p end return (p1:ps)
Yep, that works but still provides incorrect diagnostics when fed with an empty string: Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "" parse error at (line 1, column 1): unexpected end of input expecting "." or letter It's not a showstopper, but I'd still like to understand how to make it provide better error messages. Thanks! -- Vlad Skvortsov, vss@73rus.com, http://vss.73rus.com

Am Montag 14 Dezember 2009 23:35:13 schrieb Vlad Skvortsov:
Thanks Daniel! I missed out on 'guard' and somewhy was under impression that 'notFollowedBy' can only deal with Chars.
That's correct, its type is notFollowedBy :: Show tok => GenParser tok st tok -> GenParser tok st () I didn't lookup the type. However, you can easily generalize it: nFB p = ((try p >> pzero) <|> return ()) > "" Be aware, however, that the use of try may have adverse effects on performance.
many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do try (end >> (unexpected "sequence terminator")) <|> (do { p1 <- p; ps <- manyTill p end; return (p1:ps) })
Here there are two disadvantages:
1) I don't like hardcoding "sequence terminator" here;
Understandable, but I haven't a better idea either.
2) the error output should say that only 'p' parser is expected, while it says (technically correct) that either 'p' or 'end' is expected:
Remove expectations of end by labelling the test with an empty string: many1Till p end = ((try end >> unexpected "whatever") > "") <|> do { p1 <- p; ps <- manyTill p end; return (p1:ps) }
Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "1" parse error at (line 1, column 1): unexpected "1" expecting "." or letter
(What I want here is to say "expecting letter")
For that, you need the slightly clumsier c) many1Till p end = do notFollowedBy end
make that nFB end or better ((try end >> unexpected "whatever") > "")
p1 <- p ps <- manyTill p end return (p1:ps)
Yep, that works but still provides incorrect diagnostics when fed with an empty string:
Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "" parse error at (line 1, column 1): unexpected end of input expecting "." or letter
It's not a showstopper, but I'd still like to understand how to make it provide better error messages.
Thanks!

Thanks Daniel! Just for the record, here is what I ended up with: many1Till :: Show end => Parser a -> Parser end -> Parser [a] many1Till p end = do notFollowedBy' end p1 <- p ps <- manyTill p end return (p1:ps) {- Workaround for an overly restrictive type of notFollowedBy. See also: http://www.haskell.org/pipermail/haskell-cafe/2005-November/012318.html -} notFollowedBy' :: Show a => GenParser tok st a -> GenParser tok st () notFollowedBy' p = try $ join $ do a <- try p return (unexpected (show a)) <|> return (return ()) Daniel Fischer wrote:
Am Montag 14 Dezember 2009 23:35:13 schrieb Vlad Skvortsov:
Thanks Daniel! I missed out on 'guard' and somewhy was under impression that 'notFollowedBy' can only deal with Chars.
That's correct, its type is
notFollowedBy :: Show tok => GenParser tok st tok -> GenParser tok st ()
I didn't lookup the type. However, you can easily generalize it:
nFB p = ((try p >> pzero) <|> return ()) > ""
Be aware, however, that the use of try may have adverse effects on performance.
many1Till :: Parser a -> Parser end -> Parser [a] many1Till p end = do try (end >> (unexpected "sequence terminator")) <|> (do { p1 <- p; ps <- manyTill p end; return (p1:ps) })
Here there are two disadvantages:
1) I don't like hardcoding "sequence terminator" here;
Understandable, but I haven't a better idea either.
2) the error output should say that only 'p' parser is expected, while it says (technically correct) that either 'p' or 'end' is expected:
Remove expectations of end by labelling the test with an empty string:
many1Till p end = ((try end >> unexpected "whatever") > "") <|> do { p1 <- p; ps <- manyTill p end; return (p1:ps) }
Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "1" parse error at (line 1, column 1): unexpected "1" expecting "." or letter
(What I want here is to say "expecting letter")
For that, you need the slightly clumsier c) many1Till p end = do notFollowedBy end
make that nFB end or better ((try end >> unexpected "whatever") > "")
p1 <- p ps <- manyTill p end return (p1:ps)
Yep, that works but still provides incorrect diagnostics when fed with an empty string:
Prelude Main Text.ParserCombinators.Parsec> parseTest (many1Till letter (char '.')) "" parse error at (line 1, column 1): unexpected end of input expecting "." or letter
It's not a showstopper, but I'd still like to understand how to make it provide better error messages.
Thanks!
-- Vlad Skvortsov, vss@73rus.com, http://vss.73rus.com
participants (2)
-
Daniel Fischer
-
Vlad Skvortsov