How to catch exception within the Get monad (the Binary package)

Hi, The following function* is supposed to decode a list of some serialized objects following each other in a lazy Bytestring: many :: Get a -> Get [a] many prs = many' [] where many' a = do s <- prs r <- isEmpty case r of True -> return (reverse a) False -> many' (s:a) prs is a "parser" to decode a single object. If however something goes wrong, and prs fails, the whole function fails (error is thrown). Since [a] (result of decoding) is a lazy list, actual exception may be thrown at any moment the list is being processed, and exception handler may not be properly set. Is there any way to catch/detect failures inside the Get monad? It is not an instance of MonadError, so catchError does not work. Ideally, the function would keep decoding as long as it is possible, and upon the first failure of the parser, return whatever has been decoded. Thanks. ----------------------------------------------- * there is one intentional inaccuracy in this function: isEmpty is called _after_ decoding is tried, so an empty ByteString would cause parser failure and exception right away; this is used as a test case. -- Dimitry Golubovsky Anywhere on the Web

If however something goes wrong, and prs fails, the whole function fails (error is thrown). Since [a] (result of decoding) is a lazy list, actual exception may be thrown at any moment the list is being processed, and exception handler may not be properly set.
True -> return (reverse a) False -> many' (s:a)
Lazy lists are built in left to right order, but Yours is right to left.
Is there any way to catch/detect failures inside the Get monad? lookAhead, lookAheadM, etc

On Sun, 5 Sep 2010, Dimitry Golubovsky wrote:
Hi,
The following function* is supposed to decode a list of some serialized objects following each other in a lazy Bytestring:
many :: Get a -> Get [a]
many prs = many' [] where many' a = do s <- prs r <- isEmpty case r of True -> return (reverse a) False -> many' (s:a)
prs is a "parser" to decode a single object.
It is more efficient to call fmap (s:) (many prs) in the recursion in order to avoid the final 'reverse'. The way you have implemented the loop, the complete list must be hold in memory, even if it is consumed lazily. The trick to catch exceptions lazily is to make them explicit, either by using http://hackage.haskell.org/packages/archive/capped-list/1.2/doc/html/Data-Ca... or http://hackage.haskell.org/packages/archive/explicit-exception/0.1.5/doc/htm... with an enclosed list. The latter solution is the more general one, but unfortunately it causes a space leak in GHC.

Also any half decent binary format should tell you how long the list is *before* you parse it, either: 1) How many elements it has - for this you just need a counting version of the many combinator. 2) The length of bytes that the flattened list takes. In this case the repeating combinator has to test length remaining before deciding whether to parse the next element. Neither should need exceptions.

Hi, Thanks to everybody who replied. I see another solution: are there any hidden problems? I found an interesting package, ChasingBottoms which contains a function testing a value to be bottom and returning a Boolean (of course it cannot be done without unsafePerformIO). I borrowed the idea from that package, and wrote two functions: unThrow :: (Exception e) => a -> Either e a unThrow a = unsafePerformIO $ (E.evaluate a >>= return . Right) `E.catch` (\e -> return $ Left e) -- or perhaps the right argument of catch could be just (return . Left)? bm2mb :: a -> Maybe a bm2mb a = case unThrow a of Left (e::SomeException) -> Nothing Right a -> Just a So, if there is a value inside the lazy list which is a bottom (binary parse failure of the last received object in this case, catching any possible exception) then the value can be converted to Nothing within pure code, and then excluded from the result using catMaybes. This solution seems to be working for me. PS Maybe the way I am using the serialized data needs to be changed by implementing some kind of an iterator over the binary stream, and then by taking one object at a time and consuming it right there the problem might be eliminated entirely... Thanks again. -- Dimitry Golubovsky Anywhere on the Web

Dimitry Golubovsky schrieb:
Hi,
Thanks to everybody who replied.
I see another solution: are there any hidden problems?
I found an interesting package, ChasingBottoms which contains a function testing a value to be bottom and returning a Boolean (of course it cannot be done without unsafePerformIO).
I borrowed the idea from that package, and wrote two functions:
unThrow :: (Exception e) => a -> Either e a
unThrow a = unsafePerformIO $ (E.evaluate a >>= return . Right) `E.catch` (\e -> return $ Left e)
-- or perhaps the right argument of catch could be just (return . Left)?
bm2mb :: a -> Maybe a
bm2mb a = case unThrow a of Left (e::SomeException) -> Nothing Right a -> Just a
So, if there is a value inside the lazy list which is a bottom (binary parse failure of the last received object in this case, catching any possible exception) then the value can be converted to Nothing within pure code, and then excluded from the result using catMaybes.
This solution seems to be working for me.
This solution looks very ugly to me. Catching 'error's is debugging, but parser failure is kind of exception handling. I guess, the errors you want to catch are caused by non-supported fail method, right? Can't you use a monad transformer like explicit-exception:Synchronous.Exception or transformers:ErrorT around the Binary parser?

On 07.09.2010 20:51, Henning Thielemann wrote:
This solution looks very ugly to me. Catching 'error's is debugging, but parser failure is kind of exception handling. I guess, the errors you want to catch are caused by non-supported fail method, right? Can't you use a monad transformer like explicit-exception:Synchronous.Exception or transformers:ErrorT around the Binary parser?
ErrorT is useless here since it cannot intercept calls to 'error'. Same should be the case with ExcepcionalT. There is also performance penalty

Henning,
On Tue, Sep 7, 2010 at 12:51 PM, Henning Thielemann
This solution looks very ugly to me. Catching 'error's is debugging, but parser failure is kind of exception handling. I guess, the errors you want to catch are caused by non-supported fail method, right? Can't you use a monad transformer like explicit-exception:Synchronous.Exception or transformers:ErrorT around the Binary parser?
Alexey already replied, but after looking at the way Binary processes parser errors I came to the same conclusion: it is by design that it falls hard when data cannot be deserialized (btw makes sense in many cases, but not in mine). I am beginning to realize that probably (de)serializing lazy structures is not a good idea at all since such structure may be consumed in pure code, and such parser errors are hidden inside, and may "fire" at any moment. Any complications to the binary parser will bring performance penalty. Or weird solutions like mine are needed. Thanks. -- Dimitry Golubovsky Anywhere on the Web

Quoth Dimitry Golubovsky
I see another solution: are there any hidden problems?
I found an interesting package, ChasingBottoms which contains a function testing a value to be bottom and returning a Boolean (of course it cannot be done without unsafePerformIO).
I borrowed the idea from that package, and wrote two functions:
unThrow :: (Exception e) => a -> Either e a
unThrow a = unsafePerformIO $ (E.evaluate a >>= return . Right) `E.catch` (\e -> return $ Left e)
Like Henning, I'm not going to try to predict a specific problem with it, but I don't like it. Have you had a chance to look at cereal (Data.Serialize)? I frankly couldn't make it do what you need, as written, but I could make it work if I modify Serialize.Get to export "Get(..)" instead of "Get".
From there I wrote a variant of the Get (>>=) function with an additional end-value-on-failure parameter, [] in your program:
ebind m g e = Get (\ s0 f k -> unGet m s0 (\ _ _ -> k B.empty e) (\ s a -> unGet (g a) s f k)) ... so to get your error bounded array of Int, getx = ebind get (\ a -> getx >>= return . (:) a) [] I think it's very likely there's a simpler way to do this with Data.Serial as written, I am just a little backwards with state monads and that kind of thing. Donn Cave, donn@avvanta.com

Quoth Donn Cave
I think it's very likely there's a simpler way to do this with Data.Serial as written, I am just a little backwards with state monads and that kind of thing.
OK, another look at it reveals that "mplus" can be used for this application, so much more simply, import Control.Monad (mplus) getx = do a <- get x <- getn `mplus` return [] return (a:x) Donn Cave, donn@avvanta.com

On Tue, Sep 7, 2010 at 2:45 PM, Dimitry Golubovsky
unThrow a = unsafePerformIO $ (E.evaluate a >>= return . Right) `E.catch` (\e -> return $ Left e)
-- or perhaps the right argument of catch could be just (return . Left)?
bm2mb :: a -> Maybe a
bm2mb a = case unThrow a of Left (e::SomeException) -> Nothing Right a -> Just a
Philosophically these functions are Nasty because they violate referential transparency. In particular it's possible for the same expression to throw different exceptions each time it's run depending on how it's optimised, what other threads are doing, if the user presses ctrl-C, etc. etc. See the spoon package: http://hackage.haskell.org/package/spoon which alleviates this a little by only catching some kinds of exception, and not telling you which it caught. It still violates monotonicity (I believe), so purists will be upset, but practically it can be useful for when editing the source code to provide explicit exceptions (which is ideally what you'd do) is not an option.

Quoth Dimitry Golubovsky
Is there any way to catch/detect failures inside the Get monad? It is not an instance of MonadError, so catchError does not work.
Ideally, the function would keep decoding as long as it is possible, and upon the first failure of the parser, return whatever has been decoded.
I believe it can't be done. (I've seen three responses that seemed to be proposing some course of action, but ... correct me if I'm wrong, nothing that would allow you to use Get in Data.Binary this way.) The key point is the use of `throw', via `error', in Get's `fail'. `throw' raises an exception that can be caught only in IO, so you can't catch it inside Get. So ... while `fail' is a Monad function, it isn't implemented here in a way you could use, like it is in Maybe for example. Nor could it be, I think, which is kind of unfortunate. Donn Cave, donn@avvanta.com

donn:
Quoth Dimitry Golubovsky
, Is there any way to catch/detect failures inside the Get monad? It is not an instance of MonadError, so catchError does not work.
Ideally, the function would keep decoding as long as it is possible, and upon the first failure of the parser, return whatever has been decoded.
I believe it can't be done. (I've seen three responses that seemed to be proposing some course of action, but ... correct me if I'm wrong, nothing that would allow you to use Get in Data.Binary this way.)
The key point is the use of `throw', via `error', in Get's `fail'. `throw' raises an exception that can be caught only in IO, so you can't catch it inside Get. So ... while `fail' is a Monad function, it isn't implemented here in a way you could use, like it is in Maybe for example. Nor could it be, I think, which is kind of unfortunate.
For strict, checked binary parsing, use the cereal package. For lazy binary parsing with async errors, use binary. They're the main two points in the design space. The other is to tag the lazy stream, and insert failure tags in the structure. -- Don

On 05.09.2010 22:02, Don Stewart wrote:
For strict, checked binary parsing, use the cereal package. For lazy binary parsing with async errors, use binary.
Unfortunately cereal is too slow. I got ~5x slowdown with cereal and had to patch binary in order to incorporated error handling (essentially same as in cereal but simpler).
They're the main two points in the design space. The other is to tag the lazy stream, and insert failure tags in the structure.
It won't help againist "not enough input" errors. Sometimes fragments of data I process are damaged they are too short, some bits are flipped etc. There is no way to guard againist beforementioned errors but to constantly check that there is enough data in stream. Also error handling is very useful in signalling that data is malformed. It's possible to use cereal but it's too slow and it's inconvenient to have both Binary and Serialize instances. Also beginnig from 0.5.0.2 Get monad is strict and consume all required input at once. And therefore isn't suitable for lazy parsing unless lazyness is introduced manually.

If however something goes wrong, and prs fails, the whole function fails (error is thrown). Since [a] (result of decoding) is a lazy list, actual exception may be thrown at any moment the list is being processed, and exception handler may not be properly set.
True -> return (reverse a) False -> many' (s:a) Lazy lists are built in left to right order, but this one is right to left.
Is there any way to catch/detect failures inside the Get monad? lookAhead, lookAheadM, etc
participants (9)
-
Alexey Khudyakov
-
Ben Millwood
-
Dimitry Golubovsky
-
Don Stewart
-
Donn Cave
-
Henning Thielemann
-
Henning Thielemann
-
Stephen Tetley
-
Victor Gorokhov