
I want to write a parser that can read a file with this format: the file has sections which are demarcated by keywords. Keywords always begin with two forward slashes and consist of letters, digits, and underscores. The text can be anything, including special characters. For instance:
//keyword some text and more text //another_keyword and) some { more text //ya_keyword $$ -- text
I'm not sure how to write a parser that considers anything but a double slash to be a valid part of the text.
Thanks, Mike

2009/4/17 Michael P Mossey
I'm not sure how to write a parser that considers anything but a double slash to be a valid part of the text.
Maybe you can use a combination of 'many', 'noneOf' or 'manyTill'?
Cheers, Thu

Here's what I've got so far.
    -- Text is considered everything up to //. However, the problem
    -- is that this consumes the //.
    parseText = manyTill anyChar (try (string "//"))
    -- Because the // is already consumed, parseKeyword just grabs
    -- the available letters.
    parseKeyword :: Parser String
    parseKeyword = many1 letter
    -- Test function.
    parseSome = do
      t1 <- parseText
      k1 <- parseKeyword
      t2 <- parseText
      return (t1,k1,t2)
On "some text//keyword more text//" this gives
    ("some text","keyword"," more text")
On "some text//keyword more text" this gives the error "expecting //".
I wonder how I can get the manyTill to be happy with eof before finding the //? I tried
    parseText = manyTill anyChar (try (string "//") <|> eof)
but got a type error.
minh thu wrote:
Maybe you can use a combination of 'many', 'noneOf' or 'manyTill' ?
Cheers, Thu

You can use 'notFollowedBy' (probably with 'many1' and 'try').
Something like (untested):
notFollowedBy (try $ string "//")
Thu
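A minimal sketch of the 'notFollowedBy' idea, assuming the Parsec 3 API (Text.Parsec). The names textChar and sectionText are made up for illustration and do not appear in the thread:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One character of section text: any character that does not begin a "//".
-- notFollowedBy consumes no input, whether it succeeds or fails.
textChar :: Parser Char
textChar = notFollowedBy (try (string "//")) >> anyChar

-- Section text runs up to (but not including) the next "//",
-- or to the end of input if there is no further "//".
sectionText :: Parser String
sectionText = many textChar
```

With this definition, `parse sectionText "" "a/b//kw"` should give `Right "a/b"`: the single slash is kept as text and the "//" is left in the input for a keyword parser.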
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

My confusion is that text is by definition followed by // or eof.
minh thu wrote:
You can use 'notFollowedBy' (probably with 'many1' and 'try'). Something like (untested):
notFollowedBy (try $ string "//")
Thu

2009/04/17 Michael Mossey wrote:
I wonder how I can get the manyTill to be happy with eof before finding the //? I tried
    parseText = manyTill anyChar (try (string "//") <|> eof)
but got a type error.
2009/04/17 minh thu wrote:
You can use 'notFollowedBy' [...]
You get a type error because `string "//"` parses to a `String` while `eof` parses to a `()`. Instead you might use:
    parseText = manyTill anyChar (try (string "//" >> return ()) <|> eof)
-- Jason Dusek
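The type fix can also be sketched with `void` from Control.Monad, which discards a result in the same way as `>> return ()`. This assumes the Parsec 3 API; endOfText is a hypothetical helper name, not something from the thread:

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Both alternatives of <|> must have the same result type, so the String
-- produced by string "//" is discarded to match eof, which returns ().
endOfText :: Parser ()
endOfText = void (try (string "//")) <|> eof

-- Text up to the next "//" (which is consumed here) or the end of input.
parseText :: Parser String
parseText = manyTill anyChar endOfText
```

For example, `parse parseText "" "some text//keyword more text"` should give `Right "some text"`, and `parse parseText "" " more text"` should give `Right " more text"` because eof is now an accepted terminator.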

Ah.. I think I get it... in the function manyTill, the second argument type doesn't matter.. doesn't have to match the first argument type. Here's what I have so far. It works, but it's a bit weird to consume the // as part of the text rather than the keyword. That happens because the try( string "//" ), which is part of the end arg to manyTill, consumes the // when it succeeds. But maybe it is the most natural way to express the problem.
    parseKeyword :: Parser String
    parseKeyword = many1 (alphaNum <|> char '_')
    parseText :: Parser String
    parseText = manyTill anyChar ((try (string "//") >> return ()) <|> eof)
    parsePair :: Parser (String,String)
    parsePair = do
      k <- parseKeyword
      t <- parseText
      return (k,t)
    parseFile :: Parser [(String,String)]
    parseFile = do
      _ <- parseText -- to skip any text at beginning and 'sync up'
      p <- many parsePair
      return p

Michael Mossey wrote:
Here's what I have so far. It works, but it's a bit weird to consume the // as part of the text rather than the keyword. That happens because the try( string "//" ), which is part of the end arg to manyTill, consumes the // when it succeeds. But maybe it is the most natural way to express the problem.
use lookAhead!
parseKeyword :: Parser String
parseKeyword = many1 (alphaNum <|> char '_')
    parseKeyword = string "//" >> many1 (alphaNum <|> char '_')
parseText :: Parser String
parseText = manyTill anyChar ((try (string "//") >> return ()) <|> eof)
    parseText = manyTill anyChar $ (lookAhead (try $ string "//") >> return ()) <|> eof
(untested)
C.
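Putting the lookAhead suggestion together with the earlier parsePair and parseFile gives roughly the following. This is only a sketch, assuming the Parsec 3 API; the use of `void` in place of `>> return ()` is a substitution made here, not part of the original suggestion:

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- A keyword now owns its leading "//".
parseKeyword :: Parser String
parseKeyword = string "//" >> many1 (alphaNum <|> char '_')

-- Text stops in front of the next "//" (lookAhead leaves it unconsumed)
-- or at the end of input.
parseText :: Parser String
parseText = manyTill anyChar (void (lookAhead (try (string "//"))) <|> eof)

parsePair :: Parser (String, String)
parsePair = do
  k <- parseKeyword
  t <- parseText
  return (k, t)

-- Skip any text before the first keyword, then read (keyword, text) pairs.
parseFile :: Parser [(String, String)]
parseFile = parseText >> many parsePair
```

Note that the text of each section keeps its surrounding whitespace, so `parse parseFile "" "//keyword some text //ya_keyword $$ -- text"` should give `Right [("keyword"," some text "),("ya_keyword"," $$ -- text")]`. A "//" that is not followed by a keyword character still makes parsePair fail after consuming input.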

I've just about got this parser working, but wondering about something. Turns out I need "try" inside the "lookahead" here.
    parseText :: Parser String
    parseText = manyTill anyChar $ lookAhead (try (string "//"))
Without try, if I give it an input with a single slash, like
    "some/text"
It stops with the error "unexpected t; expecting //"
I'm curious why that happens when lookAhead is used with manyTill like this. I was under the impression that if the end parser given to manyTill failed, then manyTill would just continue with the main parser. Apparently there are two ways to fail: in some contexts, failing means that manyTill will just continue. In other contexts, such as the one above, there is some sense in which 'string' demands the entire string to be present. Can anyone explain?
Thanks, Mike

On Saturday 18 April 2009 01:33:44, Michael P Mossey wrote:
I'm curious why that happens when lookAhead is used with manyTill like this. I was under the impression that if the end parser given to manyTill failed, then manyTill would just continue with the main parser. Apparently there are two ways to fail: in some contexts, failing means that manyTill will just continue. In other contexts, such as the one above, there is some sense in which 'string' demands the entire string to be present. Can anyone explain?
Looking at the source:
    manyTill :: GenParser tok st a -> GenParser tok st end -> GenParser tok st [a]
    manyTill p end = scan
      where
        scan = do{ end; return [] }
               <|>
               do{ x <- p; xs <- scan; return (x:xs) }
If end fails after consuming some input, manyTill p end fails.
    lookAhead :: GenParser tok st a -> GenParser tok st a
    lookAhead p = do{ state <- getParserState
                    ; x <- p
                    ; setParserState state
                    ; return x
                    }
lookAhead fails if p fails, but if p fails, the state is not reset, so if p fails after consuming some input, like in your example "some/text", where lookAhead (string "//") consumes the slash and fails because the second expected slash is missing, that is not put back and since something is consumed, the second branch of scan in manyTill isn't tried.
You could also have
    keyword = try $ do
      string "//"
      kw <- many1 keywordChar
      return (Keyword kw)
    parseText = manyTill anyChar (lookAhead keyword)
Seems cleaner to have the slashes in keyword.
Cheers, Daniel
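The difference between failing with and without consumed input can be checked with a small sketch, assuming the Parsec 3 API and that `string` still consumes input on a partial match, as described above. The names withoutTry, withTry and succeeds are hypothetical, introduced only for this illustration:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- End parser without try: on "/t" it consumes the first slash and then
-- fails, so manyTill never falls back to its anyChar branch.
withoutTry :: Parser String
withoutTry = manyTill anyChar (lookAhead (string "//"))

-- End parser with try: a partial match is rewound, the failure counts as
-- consuming nothing, and manyTill carries on with anyChar.
withTry :: Parser String
withTry = manyTill anyChar (lookAhead (try (string "//")))

-- True when the parser succeeds on the given input.
succeeds :: Parser String -> String -> Bool
succeeds p input = either (const False) (const True) (parse p "" input)
```

Under those assumptions, `succeeds withTry "some/text//kw"` should be True while `succeeds withoutTry "some/text//kw"` should be False. Both succeed on "some text//kw", where no lone slash appears before the "//".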
participants (6)
- Christian Maeder
- Daniel Fischer
- Jason Dusek
- Michael Mossey
- Michael P Mossey
- minh thu