[Parsec] A combinator to match between M and N times?

Stephane Bortzmeyer

29 Aug 2006 29 Aug '06

1:05 p.m.

Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec. Much to my surprise, it seems quite difficult to write it myself and, until now, I failed (the best result I had was with the "option" combinator, which unfortunately requires a dummy value, returned when the parser fails). Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-)

Show replies by date

Tomasz Zielonka

29 Aug 29 Aug

1:11 p.m.

New subject: [Parsec] A combinator to match between M and N times?

On Tue, Aug 29, 2006 at 03:05:39PM +0200, Stephane Bortzmeyer wrote:

...

Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec.

...

Much to my surprise, it seems quite difficult to write it myself and, until now, I failed (the best result I had was with the "option" combinator, which unfortunately requires a dummy value, returned when the parser fails).

How about this? countBetween m n p = do xs <- count m p ys <- count (n - m) $ option Nothing $ do y <- p return (Just y) return (xs ++ catMaybes ys) Assuming n >= m.

...

Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-)

No liftM, as requested :-) Best regards Tomasz

Henning Thielemann

1:31 p.m.

New subject: [Parsec] A combinator to match between M and N times?

On Tue, 29 Aug 2006, Tomasz Zielonka wrote:

...

How about this?

countBetween m n p = do xs <- count m p ys <- count (n - m) $ option Nothing $ do y <- p return (Just y) return (xs ++ catMaybes ys)

Assuming n >= m.

...
Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-)

No liftM, as requested :-)

Although, one is tempted to write liftM Just p instead of do y <- p return (Just y) :-)

Stephane Bortzmeyer

1:38 p.m.

New subject: [Parsec] A combinator to match between M and N times?

On Tue, Aug 29, 2006 at 03:11:09PM +0200, Tomasz Zielonka wrote a message of 28 lines which said:

...

How about this?

It works fine, many thanks. Here is the version rewritten according to my taste: import Text.ParserCombinators.Parsec import Data.Maybe (catMaybes) countBetween m n p | n < m = error "First bound must be lower or equal than second bound" | otherwise = do xs <- count m p ys <- count (n - m) ((option Nothing) (do y <- p return (Just y))) return (xs ++ catMaybes ys)

Robert Dockins

2:35 p.m.

New subject: [Parsec] A combinator to match between M and N times?

On Aug 29, 2006, at 9:11 AM, Tomasz Zielonka wrote:

...

On Tue, Aug 29, 2006 at 03:05:39PM +0200, Stephane Bortzmeyer wrote:

...
Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec.

...
Much to my surprise, it seems quite difficult to write it myself and, until now, I failed (the best result I had was with the "option" combinator, which unfortunately requires a dummy value, returned when the parser fails).

How about this?

countBetween m n p = do xs <- count m p ys <- count (n - m) $ option Nothing $ do y <- p return (Just y) return (xs ++ catMaybes ys)

Assuming n >= m.

...
Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-)

No liftM, as requested :-)

Here's an interesting puzzle. For a moment, consider parsec only wrt its language-recognition capabilities. Then, we expect the count combinator to factor, count x p >> count y p === count (x+y) p where === mean "accepts the same set of strings". I somehow intuitively expect the countBetween combinator to factor in a similar way also, but it doesn't (at least, none of the posted versions do)! Note the output of: parser1 = countBetween 3 7 (char 'a') >> eof parser2 = countBetween 2 3 (char 'a') >> countBetween 1 4 (char 'a')

...

...
eof

main = do print $ parse parser1 "" "aaa" print $ parse parser2 "" "aaa" OK. What's happening is that the greedy nature of the combinator breaks things because parsec doesn't do backtracking by default. I'd expect to be able to insert 'try' in the right places to make it work. However, after playing around for a few minutes, I can't figure out any combination that does it. Is it possible to write this combinator so that it factors in this way?

...

Best regards Tomasz

Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Chris Kuklewicz

2:39 p.m.

New subject: [Parsec] A combinator to match between M and N times?

Robert Dockins wrote:

...

On Aug 29, 2006, at 9:11 AM, Tomasz Zielonka wrote:

...
On Tue, Aug 29, 2006 at 03:05:39PM +0200, Stephane Bortzmeyer wrote:

...
Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec.

...
Much to my surprise, it seems quite difficult to write it myself and, until now, I failed (the best result I had was with the "option" combinator, which unfortunately requires a dummy value, returned when the parser fails).

How about this?

countBetween m n p = do xs <- count m p ys <- count (n - m) $ option Nothing $ do y <- p return (Just y) return (xs ++ catMaybes ys)

Assuming n >= m.

...
Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-)

No liftM, as requested :-)

Here's an interesting puzzle. For a moment, consider parsec only wrt its language-recognition capabilities.

Then, we expect the count combinator to factor,

count x p >> count y p === count (x+y) p

where === mean "accepts the same set of strings".

I somehow intuitively expect the countBetween combinator to factor in a similar way also, but it doesn't (at least, none of the posted versions do)! Note the output of:

parser1 = countBetween 3 7 (char 'a') >> eof parser2 = countBetween 2 3 (char 'a') >> countBetween 1 4 (char 'a') >> eof

main = do print $ parse parser1 "" "aaa" print $ parse parser2 "" "aaa"

OK. What's happening is that the greedy nature of the combinator breaks things because parsec doesn't do backtracking by default. I'd expect to be able to insert 'try' in the right places to make it work. However, after playing around for a few minutes, I can't figure out any combination that does it. Is it possible to write this combinator so that it factors in this way?

My regex-parsec part of TextRegexLazy implements Greedy,Lazy,and Possessive semantics for regular expressions using Parsec. It is not obvious at first how to insert <|> and 'try'. You have to use a continuation style. The above example could be simply done, however, as: count 2 (char 'a') choice [count 1 (char 'a') >> countBetween 1 4 (char 'a') ,countBetween 1 4 (char 'a') ] This can be automated. A not-maximally efficient version would be: cb m n p cont | m<=n = do xs <- count m p let rep 0 = return xs rep i = do ys <- count i p return (xs++ys) choice [ try (rep i >>= cont) | i <- [(n-m),(n-m)-1 .. 0] ] test = cb 2 3 (string "ab") (\xs -> cb 1 4 (string "ab") (\ys -> return (xs,ys))) go = runParser test () "" "abababac" Where go now returns Right (["ab","ab"],["ab"])

Chris Kuklewicz

3:40 p.m.

New subject: Greedy Lazy Possessive

How to use parsec to match optional things. Lets consider regex style matching of greedy a?b, lazy a??b, and possessive a?+b. And consider greedy a*b, lazy a*?b, and possessive a*+b. Then I think these examples (cribbed from my module) will work: -- Building blocks greedyOpt p contFail contSuccess = try (p >> contSuccess) <|> contFail lazyOpt p contFail contSuccess = try contFail <|> (p >> contSuccess) possessiveOpt p contFail contSuccess = ((try p) >> contSuccess) <|> contFail -- Match p*cont p*?cont p*+cont greedyStar p cont = fix (greedyOpt p cont) lazyStar p cont = fix (lazyOpt p cont) possessiveStar p cont = fix (possessiveOpt p cont) -- Match p?cont p??cont p?+cont greedyQuest p cont = greedyOpt p cont cont lazyQuest p cont = lazyOpt p cont cont possessiveQuest p cont = possessiveOpt p cont cont Altering p to return a useful value is left to the user.

Chris Kuklewicz

2:11 p.m.

New subject: [Parsec] A combinator to match between M and N times?

Stephane Bortzmeyer wrote:

...

Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec.

Much to my surprise, it seems quite difficult to write it myself and, until now, I failed (the best result I had was with the "option" combinator, which unfortunately requires a dummy value, returned when the parser fails).

Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-

The problem with the other solutions posted so far is for (countBetween 0 100 p) they will always try to run p 100 times, regardless of when it starts to fail. This can be made more efficient by stopping when p first fails. Also, you probably have to use "try p" for the (n-m) cases, so if p consumes some input it does not cause the whole operation to fail. import Text.ParserCombinators.Parsec -- This stops when p first fails, using option to hide <|> countBetween m n p | m<=n = do xs <- count m p let loop 0 acc = return (acc []) loop i acc = do -- mx <- option Nothing (liftM Just p) mx <- option Nothing (do x <- try p return (Just x)) case mx of Nothing -> return (acc []) Just x -> loop (pred i) (acc . (x:)) ys <- loop (n-m) id return (xs++ys) -- This also works and uses <|> directly instead of via option countBetween' m n p | m<=n = do xs <- count m try p let loop 0 acc = return (acc []) loop i acc = (do x<-p loop (pred i) (acc . (x:))) <|> (return (acc [])) ys <- loop (n-m) id return (xs++ys)

Chris Kuklewicz

2:15 p.m.

New subject: [Parsec] A combinator to match between M and N times?

I put the "try" in the wrong place. Silly me

...

-- This also works and uses <|> directly instead of via option countBetween' m n p | m<=n = do xs <- count m p let loop 0 acc = return (acc []) loop i acc = (do x <- try p loop (pred i) (acc . (x:))) <|> (return (acc [])) ys <- loop (n-m) id return (xs++ys)

Udo Stenzel

4:24 p.m.

New subject: [Parsec] A combinator to match between M and N times?

Stephane Bortzmeyer wrote:

...

Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec.

infixr 2 <:> (<:>) = ap . ap (return (:)) countBetween 0 0 _ = return [] countBetween 0 n p = p <:> countBetween 0 (n-1) p <|> return [] countBetween m n p = p <:> countBetween (m-1) (n-1) p (Shortest solution yet, I think. Is primitive recursion somehow out of fashion? Should I rewrite it as two folds?)

...

Does anyone has a solution? Preferrably one I can understand, which means not yet with liftM :-)

As requested, though I believe a quick 'liftM2' would have been easier than two 'ap's. But if you prefer: a <:> b = do x <- a y <- b return (a:b) or what I'd ordinarily use: (<:>) = liftM2 (:) Udo. -- As Will Rogers would have said, "There is no such thing as a free variable."

Chris Kuklewicz

5:08 p.m.

New subject: [Parsec] A combinator to match between M and N times?

Udo Stenzel wrote:

...

Stephane Bortzmeyer wrote:

...
Parsec provides "count n p" to run the parser p exactly n times. I'm looking for a combinator "countBetween m n p" which will run the parser between m and n times. It does not exist in Parsec.

infixr 2 <:> (<:>) = ap . ap (return (:))

Again, Parsec requires you to put "try" where you need it

...

countBetween 0 0 _ = return [] countBetween 0 n p = try p <:> countBetween 0 (n-1) p <|> return [] countBetween m n p = p <:> countBetween (m-1) (n-1) p

or what I'd ordinarily use:

(<:>) = liftM2 (:)

Udo Stenzel

10:24 p.m.

New subject: [Parsec] A combinator to match between M and N times?

Chris Kuklewicz wrote:

...

Again, Parsec requires you to put "try" where you need it

I'm pretty sure it does, although this

...

Udo Stenzel wrote:

...
countBetween 0 n p = p <:> countBetween 0 (n-1) p <|> return []

is a place where it's not needed in general. You should know what you're doing, though. And I like ReadP better in general, exactly because 'try' is a bit trippy. Udo. -- C'est magnifique, mais ce n'est pas l'Informatique. -- Bosquet [on seeing the IBM 4341]

Chris Kuklewicz

11:07 p.m.

New subject: [Parsec] A combinator to match between M and N times?

Udo Stenzel wrote:

...

Chris Kuklewicz wrote:

...
Again, Parsec requires you to put "try" where you need it

I'm pretty sure it does, although this

...
Udo Stenzel wrote:

...
countBetween 0 n p = p <:> countBetween 0 (n-1) p <|> return []

is a place where it's not needed in general.

No, see my example below. It is not needed in the specific case where p can only consume a single token.

...

You should know what you're doing, though. And I like ReadP better in general, exactly because 'try' is a bit trippy.

And I have not used ReadP enough to be good with it.

...

Udo.

As long as you test with 'p' that consumes a single token, there is no difference using try (<:>) = liftM2 (:) countBetween 0 0 _ = return [] countBetween 0 n p = p <:> countBetween 0 (n-1) p <|> return [] countBetween m n p = p <:> countBetween (m-1) (n-1) p Expand countBetween 0 n p = (do value_p <- p value_rest <- countBetween 0 (n-1) p return (value : value_rest) <|> (return []) If p fails without consuming tokens, then the <|> runs (return []). If p fails after consuming tokens, then the <|> will fail instead of running (return []). By replacing p with (try p) this will work.

...

test = countBetween 3 5 (string "ab") go = runParser test () "" "abababac"

Without try this fails to parse with an error message: Left (line 1, column 7): unexpected "c" expecting "ab" With the try this succeeds: Right ["ab","ab","ab"] I assume you think the former is problem. If you do not then we are trying to define different operations. The problem is that string "ab" consumes the 'a' character/token before it fails. The 'try' wraps this failure, pretending p did not consume any input, so the <|> will then run (return []) as we want. It is not needed in the line countBetween m n p = p <:> countBetween (m-1) (n-1) p because if p fails then the whole operation should fail. There is a benefit to this difficult behavior! The <|> choice point for backtracking is expensive. By committing to the first branch when it consumes a token, the Parsec monad can completely forget the other branch of <|> which allows it to be garbage collected. This default is handy for preventing memory leaks, but it has to be overridden by 'try'. One can package <|> and 'try' in useful combinators such as countBetween so the use of them becomes slightly easier. -- Chris

Chris Kuklewicz

30 Aug 30 Aug

7:45 p.m.

New subject: ReadP question

Udo Stenzel wrote:

...

Chris Kuklewicz wrote:

...
Again, Parsec requires you to put "try" where you need it

I'm pretty sure it does, although this

...
Udo Stenzel wrote:

...
countBetween 0 n p = p <:> countBetween 0 (n-1) p <|> return []

is a place where it's not needed in general. You should know what you're doing, though. And I like ReadP better in general, exactly because 'try' is a bit trippy.

Udo.

I just tried to mimic regular expression matching with ReadP and got what seems like a non-terminating program. Is there another way to use ReadP to do this?

...

import Control.Monad import Text.ParserCombinators.ReadP

type R = ReadP Int

-- Consume a specific character, return length 1 c :: Char -> R c x = char x >> return 1

-- Consume like x? x+ x* and return the length quest,plus,star :: R -> R quest x = option 0 x plus x = liftM sum (many1 x) star x = liftM sum (many x)

-- Concatenate two with sum of lengths infixr 5 +> (+>) :: R -> R -> R (+>) x y = liftM2 (+) x y

-- Concatenate list with running total of length match xs = match' xs 0 where match' [] t = return t match' (x:xs) t = do v <- x match' xs $! t+v

-- Simulate "(a?|b+|c*)*d" regular expression test = star (choice [quest (c 'a') ,plus (c 'b') ,star (c 'c')]) +> c 'd'

go foo = readP_to_S test foo

'go' works if I remove the leading 'star' operation from 'test' But 'go' seems to not terminate with the leading 'star' My regex-dfa package has a failure which seems similar, and I have been adding the ability to internally rewrite the pattern to avoid the problem. -- Chris

Tom Phoenix

11:25 p.m.

New subject: ReadP question

On 8/30/06, Chris Kuklewicz wrote:

...

...
-- Simulate "(a?|b+|c*)*d" regular expression

...

But 'go' seems to not terminate with the leading 'star'

Unless I'm missing something... The part of the pattern inside the parentheses should successfully match at least the empty string at the beginning of the string. Since it's regulated by the second (outer) 'star', it will keep matching as long as it keeps succeeding; since it keeps matching the empty string, it keeps matching forever in the same spot. To solve this problem, your implementation of 'star' could perhaps be changed to answer "no more matches" rather than "infinitely many matches" once the body fails to consume any characters. Hope this helps! --Tom Phoenix

Chris Kuklewicz

31 Aug 31 Aug

3:46 p.m.

New subject: ReadP question

Tom Phoenix wrote:

...

On 8/30/06, Chris Kuklewicz wrote:

...
...
-- Simulate "(a?|b+|c*)*d" regular expression

...
But 'go' seems to not terminate with the leading 'star'

Unless I'm missing something... The part of the pattern inside the parentheses should successfully match at least the empty string at the beginning of the string. Since it's regulated by the second (outer) 'star', it will keep matching as long as it keeps succeeding; since it keeps matching the empty string, it keeps matching forever in the same spot.

To solve this problem, your implementation of 'star' could perhaps be changed to answer "no more matches" rather than "infinitely many matches" once the body fails to consume any characters.

Hope this helps!

--Tom Phoenix

And that is indeed the solution. But then I wanted $ end-of-line anchors (easy) and ^ begin-of-line anchors (annoying). But it works now: -- | Using ReadP to simulate regular expressions, finding the longest match -- by Chris Kuklewicz, public domain import Control.Monad import Data.Set(Set,member) import Data.Maybe(maybe) import Text.ParserCombinators.ReadP type R = Char -> ReadP (Int,Char) dot :: R dot _ = do x <- get return (1,x) anyOf :: Set Char -> R anyOf s _ = do x <- satisfy (`member` s) return (1,x) noneOf :: Set Char -> R noneOf s _ = do x <- satisfy (not.(`member` s)) return (1,x) c :: Char -> R c x _ = char x >> return (1,x) cs :: String -> R cs [] prev = return (0,prev) cs xs _ = string xs >> return (length xs,last xs) atBOL prev = case prev of '\n' -> return (0,prev) _ -> pfail atEOL prev = do rest <- look case rest of [] -> return (0,prev) ('\n':_) -> return (0,prev) _ -> pfail -- Consume like x? x+ x* and return the length quest,plus,star :: R -> R quest x = x <|> (\prev -> return (0,prev)) plus x = x +> star x star x prev = until0 0 prev where until0 t prev' = do (len,prev'') <- quest x prev' if (0==len) then return (t,prev'') else let tot = t + len in seq tot (until0 tot prev'') upToN :: Int -> R -> R upToN n x = helper n where helper 0 prev t = return (t,prev) helper i prev t = do (len,prev') <- x prev if 0==len then return (t,prev') else helper (pred i) prev' $! t+len ranged 0 Nothing x = star x ranged 0 (Just n) x | n>0 = upToN n x ranged m n x | (m>=0) && maybe True (\n'->n'>=m) n = doSeq (replicate m x) +> (ranged 0 (fmap (subtract m) n) x) | otherwise = (\prev -> return (0,prev)) infixr 6 +> infixr 5 <|> (+>),(<|>) :: R -> R -> R (+>) x y = (\prev -> do (lenX,prev') <- x prev (lenY,prev'') <- y prev' let tot = lenX + lenY seq tot (return (tot,prev'')) ) (<|>) x y = (\prev -> (x prev) +++ (y prev)) orSeq,doSeq :: [R] -> R orSeq [] prev = return (0,prev) orSeq xs prev = foldr1 (<|>) xs $ prev doSeq [] prev = return (0,prev) doSeq xs prev = foldr1 (+>) xs $ prev -- Simulate "(^a|b+|c*|^.)*(d|_rest_)$" regular expression test = star (orSeq [quest (c 'a') ,plus (c 'b') ,star (c 'c') ,atBOL +> dot ]) +> doSeq [c 'd' <|> cs "_rest_",atEOL] go foo = case readP_to_S (gather (test '\n')) foo of [] -> Nothing xs -> Just (last xs)

Udo Stenzel

9:39 a.m.

New subject: ReadP question

Chris Kuklewicz wrote:

...

I just tried to mimic regular expression matching with ReadP and got what seems like a non-terminating program. Is there another way to use ReadP to do this?

...
-- Simulate "(a?|b+|c*)*d" regular expression test = star (choice [quest (c 'a') ,plus (c 'b') ,star (c 'c')]) +> c 'd'

Indeed, this cannot work. ReadP delivers parses in order of increasing length, and your expression produces infinitely many parses of the empty string, you never get to the interesting matches. I'd say, the best solution is to use an equivalent regex which does not contain something of the form 'many (return x)':

...

...
-- Simulate "(a?|b+|c*)*d" regular expression test' = star (choice [(c 'a') ,plus (c 'b') ,plus (c 'c')]) +> c 'd'

Udo. -- f u cn rd ths, u cn gt a gd jb n cmptr prgrmmng.

6883

Age (days ago)

6885

Last active (days ago)

List overview

Download

16 comments

7 participants

participants (7)

Chris Kuklewicz
Henning Thielemann
Robert Dockins
Stephane Bortzmeyer
Tom Phoenix
Tomasz Zielonka
Udo Stenzel