Text.Regex.Base throws exceptions with makeRegexOptsM

Hi folks, I'm using Text.Regex.Base with the TDFA and PCRE backends. I want to compile regular expressions first and make sure the patterns were actually valid, so I used makeRegexOptsM, which indicates a bad regular expression by calling fail. That allows you to use makeRegexOptsM with Maybe or with (Either String) (assuming that Either String is an instance of Monad, which of course is defined in Control.Monad.Error.) Doing this with Maybe Regex works like it should--bad pattern gives you a Nothing. But if you want to see the error message by using Either String, an exception gets thrown with the bad pattern, rather than getting a Left String. Why is this? Seems like an odd bug somewhere. I am a Haskell novice, but I looked at the code for Text.Regex.Base and for the TDFA and PCRE backends and there's nothing in there to suggest this kind of behavior--it should work with Either String. The attached code snippet demonstrates the problem. I'm on GHC 7.0.3 (though I also got the problem with 6.12.3) and regex-base-0.93.2 and regex-tdfa-1.1.8 and regex-pcre-0.94.2. Thanks very much for any tips or ideas. --Omari

On Thursday 29 December 2011, 23:52:46, Omari Norman wrote:
Hi folks,
I'm using Text.Regex.Base with the TDFA and PCRE backends. I want to compile regular expressions first and make sure the patterns were actually valid, so I used makeRegexOptsM, which indicates a bad regular expression by calling fail. That allows you to use makeRegexOptsM with Maybe or with (Either String) (assuming that Either String is an instance of Monad, which of course is defined in Control.Monad.Error.)
Doing this with Maybe Regex works like it should--bad pattern gives you a Nothing. But if you want to see the error message by using Either String, an exception gets thrown with the bad pattern, rather than getting a Left String.
Why is this?
The cause is that a pattern-match failure in a do-block or equivalent causes the Monad's 'fail' method to be invoked. For Maybe, we have fail _ = Nothing For Either, there used to be instance Error e => Monad (Either e) where ... fail s = Left (strMsg s) in mtl's Control.Monad.error, and all was fine if one used the regex functions with e.g. (Either String) as the Monad. Recently, however, it was decided to have instance Monad (Either e) where ... fail s = error s -- not explicitly, but by Monad's default method in Control.Monad.Instances. So now, if you have a pattern-match failure using (Either String), you don't get a nice 'Left message' but an error. So why was it decided to have that change? 'fail' doesn't properly belong in the Monad class, it was added for the purpose of dealing with pattern-match failures, but most monads can't do anything better than abort with an error in such cases. 'fail' is widely considered a wart. On the other hand, the restriction to Either's first parameter to belong to the Error class is artificial, mathematically, (Either e) is a Monad for every type e. And (Either e) has use-cases as a Monad for types which aren't Error members. So the general consensus was that it was better to get rid of the arbitrary (Error e) restriction. Now, what can you do to get the equivalent of the old (Either String)? Use 'ErrorT String Identity'. It's a bit more cumbersome to get at the result, foo = runidentity . runErrorT $ bar but it's clean.
Seems like an odd bug somewhere.
A change in behaviour that was accepted as the price of fixing what was widely considered a mistake.
I am a Haskell novice, but I looked at the code for Text.Regex.Base and for the TDFA and PCRE backends and there's nothing in there to suggest this kind of behavior--it should work with Either String.
It used to.
The attached code snippet demonstrates the problem. I'm on GHC 7.0.3 (though I also got the problem with 6.12.3) and regex-base-0.93.2 and regex-tdfa-1.1.8 and regex-pcre-0.94.2. Thanks very much for any tips or ideas. --Omari

On Fri, Dec 30, 2011 at 1:24 PM, Daniel Fischer
On Thursday 29 December 2011, 23:52:46, Omari Norman wrote:
[...]
'fail' doesn't properly belong in the Monad class, it was added for the purpose of dealing with pattern-match failures, but most monads can't do anything better than abort with an error in such cases. 'fail' is widely considered a wart.
I thought I'd add my own reason why I don't like fail. Take these two functions, for example: test :: Maybe Int test = do (Right v) <- Just (Left 1) return v test' :: Maybe Int test' = do let (Right v) = Left 1 return v The first returns Nothing. The second crashes with a pattern match failure. Why should a pattern failure cause a crash everywhere *except* a do binding? It makes no sense. It violates the principle of least surprise by behaving differently to every other occurrence of pattern matching in the whole language. As for custom failures, I'd recommend either Michael Snoyman's Failure class or MonadPlus, which were both designed for this sort of thing. But I'd stay away from using fail, since as Omari Norman said, it's a wart.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Dec 30, 2011 at 01:24:02AM +0100, Daniel Fischer wrote:
For Either, there used to be
instance Error e => Monad (Either e) where ... fail s = Left (strMsg s)
in mtl's Control.Monad.error, and all was fine if one used the regex functions with e.g. (Either String) as the Monad.
Recently, however, it was decided to have
instance Monad (Either e) where ... fail s = error s -- not explicitly, but by Monad's default method
in Control.Monad.Instances. So now, if you have a pattern-match failure using (Either String), you don't get a nice 'Left message' but an error.
Thanks so much, I would never have figured all this out. Spent a lot of time tonight rummaging through mtl and transformers and Control.Monad.Instances.
Now, what can you do to get the equivalent of the old (Either String)?
Use 'ErrorT String Identity'.
This I tried. It turned out that it didn't work though and I had the same problem. I am guessing it is because my module has some imports at the top that are bringing the instances in Control.Monad.Instances into scope. Then it seems the Monad instance in Control.Monad.Instances (which is using the default "fail", which calls "error") is being used, rather than the instance from Control.Monad.Trans.Error. Only now do I really understand why orphan instances are bad: http://www.haskell.org/haskellwiki/Orphan_instance A simple fix for it all was to wrap Either in a newtype and then define a Monad instance for the newtype. --Omari
participants (3)
-
Chris Wong
-
Daniel Fischer
-
Omari Norman