Parsec question

Levi Stephen

21 Jun 2007 21 Jun '07

6:04 a.m.

Hi, Fairly new to Haskell and trying some parsec. (Also, new to parsers/interpreters) I had come up with this, which works, but I can't help thinking there's a better way :) |> newtype Identifier = Identifier String

...

newtype Literal = StringLiteral String -- to be extended later data Primary = PrimaryLiteral Literal | PrimaryIdentifier Identifier

||> primary = do {

...

i <- identifier; return $ PrimaryIdentifier i; } <|> do { l <- stringLiteral; return $ PrimaryLiteral l; }

||> identifier = do

...

i <- many1 letter return $ Identifier i

stringLiteral = do (char '\'') s <- manyTill anyChar (char '\'') return $ StringLiteral s

Is there a way through combining types/parsers that the double do block in primary could be avoided? I understand it's necessary right now because the parsers identifier and stringLiteral return different types, so I can't just write:

...

i <- identifier <|> stringLiteral

So, I'm not sure whether my types need work, the parsers, or if this is the simplest way. Thanks, Levi lstephen.wordpress.com PS, have lurked on the list for a while, enjoy the discussions and content. |

Show replies by date

Tomasz Zielonka

21 Jun 21 Jun

6:28 a.m.

On Thu, Jun 21, 2007 at 03:34:54PM +0930, Levi Stephen wrote:

...

Is there a way through combining types/parsers that the double do block in primary could be avoided?

I understand it's necessary right now because the parsers identifier and stringLiteral return different types, so I can't just write:

...
i <- identifier <|> stringLiteral

You can use the fact that (GenParser tok st) is a monad and use liftM: i <- liftM PrimaryIdentifier identifier <|> liftM PrimaryLiteral stringLiteral I often find it convenient to use "choice" instead of <|> for long more complicated alternatives, for example like this: primary = choice [ do i <- identifier return $ PrimaryIdentifier i , do l <- stringLiteral return $ PrimaryLiteral l ]

...

So, I'm not sure whether my types need work,

I have a feeling that Identifier and Literal could just be type synonyms, because newtype's don't seem to be neccesary or beneficial here. I could be wrong though - after all, I don't know your intentions and the rest of the program. Best regards Tomek

Dave Tapley

8:51 a.m.

I find it's good for the soul to remember what the do notation is doing for us. Also I'm with Einstein on "You do not really understand something unless you can explain it to your grandmother" :) Personally I think (in this instance) your three 'Parser a' functions read nicer as:

...

primary = (identifier >>= (return . PrimaryIdentifier)) <|> (stringLiteral >>= (return . PrimaryLiteral)) identifier = (many1 letter) >>= (return . Identifier) stringLiteral = (char '\'') >> (manyTill anyChar (char '\'')) >>= (return . StringLiteral)

Looking at them in this form Tomek's point should seem clear now, especially when we look at the type signature for liftM: liftM :: Monad m => (a1 -> r) -> m a1 -> m r So we can (marginally) shorten down to:

...

primary = (liftM PrimaryIdentifier identifier) <|> (liftM PrimaryLiteral stringLiteral) identifier = liftM Identifier (many1 letter) stringLiteral = liftM StringLiteral ((char '\'') >> (manyTill anyChar (char '\'')))

You might like: http://syntaxfree.wordpress.com/2006/12/12/do-notation-considered-harmful/ Dave, On 21/06/07, Tomasz Zielonka wrote:

...

On Thu, Jun 21, 2007 at 03:34:54PM +0930, Levi Stephen wrote:

...
Is there a way through combining types/parsers that the double do block in primary could be avoided?

I understand it's necessary right now because the parsers identifier and stringLiteral return different types, so I can't just write:

...
i <- identifier <|> stringLiteral

You can use the fact that (GenParser tok st) is a monad and use liftM:

i <- liftM PrimaryIdentifier identifier <|> liftM PrimaryLiteral stringLiteral

I often find it convenient to use "choice" instead of <|> for long more complicated alternatives, for example like this:

primary = choice [ do i <- identifier return $ PrimaryIdentifier i , do l <- stringLiteral return $ PrimaryLiteral l ]

...
So, I'm not sure whether my types need work,

I have a feeling that Identifier and Literal could just be type synonyms, because newtype's don't seem to be neccesary or beneficial here. I could be wrong though - after all, I don't know your intentions and the rest of the program.

Best regards Tomek _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thomas Conway

10:36 a.m.

On 6/21/07, Dave Tapley wrote:

...

...
primary = (identifier >>= (return . PrimaryIdentifier)) <|> (stringLiteral >>= (return . PrimaryLiteral)) identifier = (many1 letter) >>= (return . Identifier) stringLiteral = (char '\'') >> (manyTill anyChar (char '\'')) >>= (return . StringLiteral)

I have found this a sufficiently common pattern that I have a little combinator to tidy it up: p `with` f = p >>= (return . f) so I can write primary = (identifier `with` PrimaryIdentifier) <|> (stringLiteral `with` PrimaryLiteral) Obviously you could write it in terms of liftM, choose a different name, &c, &c. FWIW, the other little combinator I invariably use is p `returning` x = p >>= (\_ -> return x) which I end up calling with () as the second argument so often (especially during development), I usually have another combinator void p = p >> return () YMMV, T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

Jules Bean

12:52 p.m.

Thomas Conway wrote:

...

p `with` f = p >>= (return . f)

so I can write

primary = (identifier `with` PrimaryIdentifier) <|> (stringLiteral `with` PrimaryLiteral)

I would write primary = PrimaryIdentifier `fmap` identifer <|> PrimaryLiteral `fmap` stringLiteral (I prefer fmap to liftM but they are the same for monads). To my mind this fits the general pattern of 'constructor comes before contents'. with is, of course, just fmap with the parameters reversed. It's a question of taste if it's better to define a new name or use an existing one.

...

p `returning` x = p >>= (\_ -> return x)

I see no convincing reason to prefer that to p >> return x (which is fewer characters and, to me, just as clear). In fact I'll also use do { p ; return x } and which of the two I choose will depend on context. If this is part of a large 'choice' construct I prefer to have each branch using the same notation (all do, or all not do). Jules

Thomas Conway

10:13 p.m.

On 6/21/07, Jules Bean wrote:

...

I would write

primary = PrimaryIdentifier `fmap` identifer <|> PrimaryLiteral `fmap` stringLiteral

(I prefer fmap to liftM but they are the same for monads). To my mind this fits the general pattern of 'constructor comes before contents'. with is, of course, just fmap with the parameters reversed.

Nice. I only discovered the joys of Data.Monad, fmap included relatively recently, well after I'd spent quite some time writing parsec parsers. I note that it was Parsec that converted me to Haskell from Mercury[*] - I wrote a library of parser combinators for Mercury, but the lambda notation in Mercury is not nearly so clean as Haskell's, so lacking special syntax for monadic code, the results looked horrible. Much faster - Mercury is a strict language, so it's easier to compile into respectable machine code - but horrible to read. cheers, T. [*] Well, that, and the fact that my thesis was about Mercury, which meant that once I was finished, I never wanted to look at Mercury again. ;-) -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

Levi Stephen

22 Jun 22 Jun

12:04 a.m.

...

I find it's good for the soul to remember what the do notation is doing for us.

Also I'm with Einstein on "You do not really understand something unless you can explain it to your grandmother" :)

Personally I think (in this instance) your three 'Parser a' functions read nicer as:

...
primary = (identifier >>= (return . PrimaryIdentifier)) <|> (stringLiteral >>= (return . PrimaryLiteral)) identifier = (many1 letter) >>= (return . Identifier) stringLiteral = (char '\'') >> (manyTill anyChar (char '\'')) >>= (return . StringLiteral) Thanks, I tried to do this, but got stuck. I was missing the 'return .'

Looking at them in this form Tomek's point should seem clear now, especially when we look at the type signature for liftM:

liftM :: Monad m => (a1 -> r) -> m a1 -> m r

So we can (marginally) shorten down to:

...
primary = (liftM PrimaryIdentifier identifier) <|> (liftM PrimaryLiteral stringLiteral) identifier = liftM Identifier (many1 letter) stringLiteral = liftM StringLiteral ((char '\'') >> (manyTill anyChar (char '\''))) I had initially tried following the types through of liftM as I didn't get Tomek's approach straight away. It did make a bit of sense, but with

Dave Tapley wrote: the extra step of getting rid of the do notation made it clearer.

...

You might like: http://syntaxfree.wordpress.com/2006/12/12/do-notation-considered-harmful/

Thanks for the link. I have been trying to avoid do notation, as I think it helps my understanding of what's really going on with monads. In this case though, I got stuck when I left out the 'return .' and resorted to do notation because it worked :( Levi lstephen.wordpress.com

...

Tillmann Rendel

21 Jun 21 Jun

2:03 p.m.

Levi Stephen wrote:

...

...
newtype Identifier = Identifier String newtype Literal = StringLiteral String -- to be extended later data Primary = PrimaryLiteral Literal | PrimaryIdentifier Identifier

primary = do { i <- identifier; return $ PrimaryIdentifier i; } <|> do { l <- stringLiteral; return $ PrimaryLiteral l; }

||> identifier = do

...
i <- many1 letter return $ Identifier i

stringLiteral = do (char '\'') s <- manyTill anyChar (char '\'') return $ StringLiteral s

Is there a way through combining types/parsers that the double do block in primary could be avoided?

I prefer using Control.Monad.ap: primary = (return PrimaryIdentifier `ap` identifier) <|> (return PrimaryLiteral `ap` stringLiteral) identifier = return Identifier `ap` many1 letter stringLiteral = return StringLiteral `ap` (quote >> manyTill anyChar quote) quote = char '\'' This scales easily to the case of multiple fields per constructor, provided that the order of the subterms in the abstract syntax is the same as in the concrete syntax: data FunctionCall = FunctionCall Identifier [Primary] functionCall = return FunctionCall `ap` identifier `ap` parens (primary `sepBy` comma) parens = between lparen rparen lparen = char '(' rparen = char ')' comma = char ',' My self-defined monadic combinator of choice to use with parsec is a >>~ b = a >>= \x -> b >> return x It works like (>>), but returns the result of the first instead of the result of the second computation. It is kind of an alternative for between: between lparen rparen p == lparen >> p >>~ rparen It can be usefull like this: data Term = TVar Identifier | TTerm Identifier [Term] term = (return TTerm `ap` try (identififer >>~ lparen) `ap` (term `sepBy` comma >>~ rparen)) <|> (return TVar `ap` identifier) After accepting lparen, the second branch is discarded. Tillmann

Levi Stephen

11:51 p.m.

Tillmann Rendel wrote:

...

My self-defined monadic combinator of choice to use with parsec is

a >>~ b = a >>= \x -> b >> return x

It works like (>>), but returns the result of the first instead of the result of the second computation. It is kind of an alternative for between:

between lparen rparen p == lparen >> p >>~ rparen

Cool. I've always liked how Parsec made parsers so readable, and this makes it even more so.

...

It can be usefull like this:

data Term = TVar Identifier | TTerm Identifier [Term]

term = (return TTerm `ap` try (identififer >>~ lparen) `ap` (term `sepBy` comma >>~ rparen))

<|> (return TVar `ap` identifier)

After accepting lparen, the second branch is discarded.

Interesting. I think I'll have to keep this one around. Not sure if I'll need it, but its the kind of thing that would have taken me a while to solve ;) Levi lstephen.wordpress.com

6591

Age (days ago)

6592

Last active (days ago)

List overview

Download

8 comments

6 participants

participants (6)

Dave Tapley
Jules Bean
Levi Stephen
Thomas Conway
Tillmann Rendel
Tomasz Zielonka

Parsec question

Levi Stephen

Tomasz Zielonka

Dave Tapley

Thomas Conway

Jules Bean

Thomas Conway

Levi Stephen

Tillmann Rendel

Levi Stephen

tags

participants (6)