Badly designed Parsec combinators?

So... I see no reason why someone can't just do it themselves, but... I was playing around with Parsec (as included in GHC 6.4.1), and I found two functions that are... not quite what I believe they should be. optional :: GenParser tok st a -> GenParser tok st () optional p = do{ p; return ()} <|> return () Now, this completely loses the result of the optional parser. Better would be: optional :: GenParser tok st a -> GenParser tok st (Maybe a) optional p = do{ x <- p; return (Just x) } <|> return Nothing Same thing with manyTill: manyTill :: GenParser tok st a -> GenParser tok st end -> GenParser tok st [a] manyTill p end = scan where scan = do{ end; return [] } <|> do{ x <- p; xs <- scan; return (x:xs) } Better: manyTill :: GenParser tok st a -> GenParser tok st end -> GenParser tok st ([a], end) manyTill p end = scan where scan = do{ endr <- end; return ([], endr) } <|> do{ x <- p; (xs, endr) <- scan; return (x:xs, endr) } Is there any reason I can't see why they are the way they are? JCAB

Am Sonntag, 12. Februar 2006 04:23 schrieb Juan Carlos Arevalo Baeza:
So... I see no reason why someone can't just do it themselves, but... I was playing around with Parsec (as included in GHC 6.4.1), and I found two functions that are... not quite what I believe they should be.
optional :: GenParser tok st a -> GenParser tok st () optional p = do{ p; return ()} <|> return ()
Now, this completely loses the result of the optional parser. Better would be:
optional :: GenParser tok st a -> GenParser tok st (Maybe a) optional p = do{ x <- p; return (Just x) } <|> return Nothing
Hmm, I haven't used Parsec very much yet, but so far, whenever the result of an optional parser (mostly a '-'-sign) mattered, I use e.g. option ' ' (char '-'). Your above parser would be option Nothing (fmap Just p) -- or you might use liftM. Both are easy enough. If you think the naming is unfortunate, I wouldn't flatly contradict, but it's too late now, I believe.
Same thing with manyTill:
manyTill :: GenParser tok st a -> GenParser tok st end -> GenParser tok st [a] manyTill p end = scan where scan = do{ end; return [] } <|> do{ x <- p; xs <- scan; return (x:xs) }
Better:
manyTill :: GenParser tok st a -> GenParser tok st end -> GenParser tok st ([a], end) manyTill p end = scan where scan = do{ endr <- end; return ([], endr) } <|> do{ x <- p; (xs, endr) <- scan; return (x:xs, endr) }
Is there any reason I can't see why they are the way they are?
JCAB
With manyTill, I tend to agree with you; sometimes (not often) I would need the result of end, too. So far, this meant that I didn't use Parsec then, but wrote my own library, including a 'test' combinator where 'test p' would succeed or fail as p would, but not consume any input in either case, so I would use 'manyTill p (test end)' and afterwards endr <- end. Since changing manyTill now would be unfeasible, I wonder if including such a 'test' would be a good idea (danger: somebody might use 'many (test p)') or if providing your parser (under the name manyTillPair or something) would be better. Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

On Sun, Feb 12, 2006 at 01:57:07PM +0100, Daniel Fischer wrote:
Am Sonntag, 12. Februar 2006 04:23 schrieb Juan Carlos Arevalo Baeza:
optional :: GenParser tok st a -> GenParser tok st () optional p = do{ p; return ()} <|> return ()
Now, this completely loses the result of the optional parser. Better would be:
optional :: GenParser tok st a -> GenParser tok st (Maybe a) optional p = do{ x <- p; return (Just x) } <|> return Nothing
Your above parser would be
option Nothing (fmap Just p) -- or you might use liftM.
Both are easy enough. If you think the naming is unfortunate, I wouldn't flatly contradict, but it's too late now, I believe.
They are easy, but writing "option Nothing (liftM Just p)" for the nth time tends to be boring. I could write my own combinator, but all the good names are already taken. I too wish optional returned (Maybe a) and I wonder how many programs would be broken if it was changed now. Best regards Tomasz -- I am searching for programmers who are good at least in (Haskell || ML) && (Linux || FreeBSD || math) for work in Warsaw, Poland

Tomasz Zielonka wrote:
On Sun, Feb 12, 2006 at 01:57:07PM +0100, Daniel Fischer wrote:
Your above parser would be
option Nothing (fmap Just p) -- or you might use liftM.
Both are easy enough. If you think the naming is unfortunate, I wouldn't flatly contradict, but it's too late now, I believe.
They are easy, but writing "option Nothing (liftM Just p)" for the nth time tends to be boring. I could write my own combinator, but all the good names are already taken. I too wish optional returned (Maybe a) and I wonder how many programs would be broken if it was changed now.
The only programs it would break are those that specify it at the end (they'd require an extra "return ()", right? This brings me to wonder also if it'd be possible for the compilers to add a little bit more smarts to the "do" notation syntax, so that it'll add the return () at the end if it's missing. Maybe too much to ask of the Haskell crowd :). In any case... I called them "optionalKeep" and "manyTillKeep". As in... "keep the result". Thanx! It's good to know it's not just me. JCAB

Hello Juan, Sunday, February 12, 2006, 5:22:46 PM, you wrote: JCAB> The only programs it would break are those that specify it at the end JCAB> (they'd require an extra "return ()", right? JCAB> This brings me to wonder also if it'd be possible for the compilers JCAB> to add a little bit more smarts to the "do" notation syntax, so that JCAB> it'll add the return () at the end if it's missing ... and break code like this: readFile name = do h <- openFile name hGetContents h -- Best regards, Bulat mailto:bulatz@HotPOP.com

On Sun, Feb 12, 2006 at 06:22:46AM -0800, Juan Carlos Arevalo Baeza wrote:
The only programs it would break are those that specify it at the end (they'd require an extra "return ()", right?
I can imagine many other cases, but none of them very likely.
This brings me to wonder also if it'd be possible for the compilers to add a little bit more smarts to the "do" notation syntax, so that it'll add the return () at the end if it's missing. Maybe too much to ask of the Haskell crowd :).
I wouldn't like that, as do-expressions without return at the end can be convenient. They can also make your intent clearer for other programmers and perhaps also the compiler, especially when you want to write tail-recursive monadic code (assuming a suitable monad and/or a sufficiently smart compiler). Best regards Tomasz -- I am searching for programmers who are good at least in (Haskell || ML) && (Linux || FreeBSD || math) for work in Warsaw, Poland

Tomasz Zielonka wrote:
On Sun, Feb 12, 2006 at 06:22:46AM -0800, Juan Carlos Arevalo Baeza wrote:
This brings me to wonder also if it'd be possible for the compilers to add a little bit more smarts to the "do" notation syntax, so that it'll add the return () at the end if it's missing. Maybe too much to ask of the Haskell crowd :).
I wouldn't like that, as do-expressions without return at the end can be convenient. They can also make your intent clearer for other programmers and perhaps also the compiler, especially when you want to write tail-recursive monadic code (assuming a suitable monad and/or a sufficiently smart compiler).
Right, I understand and share that thought. But that's not what I meant. I really didn't explain the way I should, and I didn't think it through. What I was proposing needs to be implemented not as an addition to the do-syntax sugar, but as something the compiler does to monads when matching their type. Take for instance this function: myParser :: Parser () myParser = do string "Hello" optional (string ", world!") It makes no sense for myParser to generate any values, especially not the result from the optional statement, so it is set to return (). But that function as written will not compile (with my proposed modification to "optional"), and so we have to manually add the "return ()" at the end. But... the thing is, if we have any "do" statement, or any monad whatsoever, which does not return (), and the program needs it to return () in order to be able to match its type, that transformation is always trivial. It just has to add ">> return ()" to it. () is special that way, because there's only one possible value, and monads are also special already (do-notation, for instance). Another case where I encounter this is with the "when" function: myParser2 :: Bool -> Parser () myParser2 all = do string "Hello" when all $ do string ", world" string "!" Again that doesn't compile, because "when" requires a ()-returning monad as its second parameter, but the "string" parser returns "String". Same thing with if-then-else, when used to switch IO actions and such: the IO actions must fully match in type, even if the returned value will be discarded, and again that can be trivially resolved by adding the "return ()". All I'm saying is that I wish the language and the compiler would take care of that for me. Hence what I said: maybe too much for the Haskell crowd to start playing games with the type system like this. It resembles a lot the automatic conversions that C++ does. I agree Haskell can't have those in any form, but still... "return ()"... Thanx! JCAB

Juan Carlos Arevalo Baeza wrote:
Another case where I encounter this is with the "when" function:
myParser2 :: Bool -> Parser () myParser2 all = do string "Hello" when all $ do string ", world" string "!"
I made a function (did I miss one in the base package?) ignore :: Monad m => m a -> m () ignore m = m >> return() and write "ignore $ string "..." if necessary. "when b m" is "if b then m else return()". I don't think that the then- or else- branch of any if- expression schould be automatically casted to some matching type "m ()" (and I don't know what implications this would have to typing in general). However "when b m" could be generalized by "if b then ignore m else return ()". (The same applies to "unless") Cheers Christian

Juan Carlos Arevalo Baeza wrote:
myParser :: Parser () myParser = do string "Hello" optional (string ", world!")
It makes no sense for myParser to generate any values, especially not the result from the optional statement, so it is set to return ().
Don't you think this will interfere somehow with type inference? I wouldn't like a function that might decide to throw away its result if (erroneously) used in a context that wouldn't need it. I also think almost every function has a sensible result, and written with the right combinator, can return it without too much hassle. So I'd probably write: yourParser :: Parser String yourParser = liftM2 (++) (string "Hello") (option "" (string ", world!") I also find it very convenient to have a combinator that does a bind and return the unmodified result of the first computation. With that you get: (*>) :: Monad m => m a -> m b -> m a m *> n = do a <- m ; n ; return a ourParser :: Parser String ourParser = string "Hello" *> optional (string ", world!") Therefore, implicit (return ()) is selsdom useful, has the potential to cause annoying silent failures and is generally not worth the hassle.
Another case where I encounter this is with the "when" function:
myParser2 :: Bool -> Parser () myParser2 all = do string "Hello" when all $ do string ", world" string "!"
A better fix would be more flexible when: when :: Monad m => Bool -> m a -> m (Maybe a) when True m = Just `liftM` m when False _ = return Nothing ...which is quite similar to the proposed change to Parsec's 'optional'. I'd support both.
It resembles a lot the automatic conversions that C++ does.
I'm more reminded of Perl... Udo. -- Avoid strange women and temporary variables.

On Thu, Feb 16, 2006 at 04:36:06PM +0100, Christian Maeder wrote:
Udo Stenzel wrote:
(*>) :: Monad m => m a -> m b -> m a m *> n = do a <- m ; n ; return a
Right, that one is really useful. I named it (<<), though, conforming to (>>=) versus (=<<).
But =<< first executes the second argument... Best regards Tomasz -- I am searching for programmers who are good at least in (Haskell || ML) && (Linux || FreeBSD || math) for work in Warsaw, Poland

Tomasz Zielonka wrote:
On Thu, Feb 16, 2006 at 04:36:06PM +0100, Christian Maeder wrote:
Udo Stenzel wrote:
(*>) :: Monad m => m a -> m b -> m a m *> n = do a <- m ; n ; return a Right, that one is really useful. I named it (<<), though, conforming to (>>=) versus (=<<).
But =<< first executes the second argument...
... so much to striking symbolic identifiers. (In fact, I've never used =<<.) Christian

Udo Stenzel wrote:
Juan Carlos Arevalo Baeza wrote:
myParser :: Parser () myParser = do string "Hello" optional (string ", world!")
It makes no sense for myParser to generate any values, especially not the result from the optional statement, so it is set to return ().
Don't you think this will interfere somehow with type inference?
With type inference? No, why? I mean... specifying the type of a function (as is recommended practice in multiple places) places a hard-point in the type system. It is even sometimes critical to make the types of a program totally predictable (or decidable), as when it's needed to resolve the the monomorphism restriction. Therefore, the language/compiler can do things to types at those hard points. If, say, the compiler needs to match some expression with "IO ()" (or else it'll throw an error), and it infers "IO String", it can unambiguously resolve it by adding the ">> return ()". In my opinion, this would make the program better by removing chaff. In the function above, the "string" statement returns a value which is ignored because it is in the middle of a do-notation sequence. The second statement, after my proposed change, also returns a value. But this value currently cannot be ignored like the others in the do-sequence, even though it could without ambiguity of any kind. If I hadn't specified the type of "myParser", it would have gotten the inferred type "Parser (Maybe String)". But I should be able to specify the more general one "Parser ()" because that change is decidable. In some conceptual way, this is no different than this: max :: Int -> Int -> Int max a b = if a > b then a else b In this case, I've forced the type of the function to be more restrictive (and definitely different) than what it would have had if the type signature weren't there.
I wouldn't like a function that might decide to throw away its result if (erroneously) used in a context that wouldn't need it. I also think almost every function has a sensible result, and written with the right combinator, can return it without too much hassle. So I'd probably write:
yourParser :: Parser String yourParser = liftM2 (++) (string "Hello") (option "" (string ", world!")
Personally, that style is way too functional (and a bit lisp-ish) for me. I prefer just using: yourParser :: Parser String yourParser = do helloResult <- string "Hello" worldResult <- option "" $ string ", world!" return $ helloResult ++ worldResult But that's just a matter of style. In this case, that might even be a reasonable thing to do, returning this value from this function. But sometimes isn't. Sometimes, dropping results is the right thing to do.
I also find it very convenient to have a combinator that does a bind and return the unmodified result of the first computation. With that you get:
(*>) :: Monad m => m a -> m b -> m a m *> n = do a <- m ; n ; return a
ourParser :: Parser String ourParser = string "Hello" *> optional (string ", world!")
So you do drop returned values now and then? But with that function you lose out on the do-notation convenience.
Therefore, implicit (return ()) is selsdom useful, has the potential to cause annoying silent failures and is generally not worth the hassle.
Useful? No more than the do-notation. They are both conveniences. No more than the "liftM2" function you used above: that's another convenience. All languages are full of conveniences that are not strictly necessary. Annoying silent failures? No more than the ">>" monad combinator.
Another case where I encounter this is with the "when" function:
myParser2 :: Bool -> Parser () myParser2 all = do string "Hello" when all $ do string ", world" string "!"
A better fix would be more flexible when:
when :: Monad m => Bool -> m a -> m (Maybe a) when True m = Just `liftM` m when False _ = return Nothing
...which is quite similar to the proposed change to Parsec's 'optional'. I'd support both.
I like that.
It resembles a lot the automatic conversions that C++ does.
I'm more reminded of Perl...
I don't know perl :) Thanx! JCAB

Juan Carlos Arevalo Baeza wrote:
Udo Stenzel wrote:
Don't you think this will interfere somehow with type inference?
With type inference? No, why? I mean... specifying the type of a function [...]
Okay, so want an implicit (return ()) only if the type of the do-block has been explicitly specified as (m ()) for a monad m. I've become used to type inference and assumed you wanted to tack on the (return ()) if the corresponding type was only inferred. The former is less destructive, of course. I still dislike it, being a special rule with very limited use.
yourParser :: Parser String yourParser = liftM2 (++) (string "Hello") (option "" (string ", world!")
Personally, that style is way too functional (and a bit lisp-ish) for me.
Uhm, well, of course you're entitled to an opinion, but I know where to find the children of Algol when I need them...
ourParser :: Parser String ourParser = string "Hello" *> optional (string ", world!")
So you do drop returned values now and then? But with that function you lose out on the do-notation convenience.
Why, yes, I do, but I like being explicit about it. (And I'm not sure that (*>) is explicit enough.) And I must confess, I don't find do-notation all that convenient. If it weren't for fail being called if a pattern match fails, I'd probably never use it at all.
I'm more reminded of Perl...
I don't know perl :)
You're a very lucky man. (No, seriously, Perl is quite the opposite of Haskell in nearly every aspect.) Udo. -- fork(2) New processes are created by other processes, just like new humans. New humans are created by other humans, of course, not by processes. -- Unix System Administration Handbook

On Thu, Feb 16, 2006 at 04:22:40AM -0800, Juan Carlos Arevalo Baeza wrote:
But... the thing is, if we have any "do" statement, or any monad whatsoever, which does not return (), and the program needs it to return () in order to be able to match its type, that transformation is always trivial. It just has to add ">> return ()" to it. () is special that way, because there's only one possible value, and monads are also special already (do-notation, for instance).
How do you know what type the do statement returns in general? in haskell, type inference goes both directions, deciding the type at any point depends not just on what routine you are calling but the context it is called in.
Again that doesn't compile, because "when" requires a ()-returning monad as its second parameter, but the "string" parser returns "String". Same thing with if-then-else, when used to switch IO actions and such: the IO actions must fully match in type, even if the returned value will be discarded, and again that can be trivially resolved by adding the "return ()".
This is a straight up bug in the definition of when I hope we fix. it should have type when :: Bool -> IO a -> IO () when = ... John -- John Meacham - ⑆repetae.net⑆john⑈

On Feb 16, 2006, at 7:32 PM, John Meacham wrote:
...
Again that doesn't compile, because "when" requires a ()-returning monad as its second parameter, but the "string" parser returns "String". Same thing with if-then-else, when used to switch IO actions and such: the IO actions must fully match in type, even if the returned value will be discarded, and again that can be trivially resolved by adding the "return ()".
This is a straight up bug in the definition of when I hope we fix. it should have type
when :: Bool -> IO a -> IO () when = ...
Arguably this could be made true of *every* function which presently takes m () as an argument. That is, we could systematically go through the libraries and convert every function of type: f :: (Monad m) => .... -> m () -> ... into f :: (Monad m) => .... -> m otherwiseUnusedTypeVariable -> ... This would basically eliminate the need for "ignore". I can see taste arguments in either direction, but really the language ought to pick an alternative and use it everywhere (including for >>). -Jan-Willem Maessen
John
-- John Meacham - ⑆repetae.net⑆john⑈ _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (8)
-
Bulat Ziganshin
-
Christian Maeder
-
Daniel Fischer
-
Jan-Willem Maessen
-
John Meacham
-
Juan Carlos Arevalo Baeza
-
Tomasz Zielonka
-
Udo Stenzel