Parsing of bytestrings with non-String errors?

I've looked at polyparse and attoparsec and they seem to have in common that the error always is a String. My current ideas for a project would be a lot easier if I could just return some other type, something that I can pattern match on.
Is there a parser combinator library out there that works on bytestrings and allows using a custom error type? Or maybe there's some very basic reason why String is so commonly used?
/M
--
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe

> Is there a parser combinator library out there that works on bytestrings and allows using a custom error type?
The HuttonMeijerWallace combinators (distributed with polyparse) have the custom error type, but not the bytestrings.
> Or maybe there's some very basic reason why String is so commonly used?
I don't think there is any deep reason. Strings are convenient, that is all. My guesstimate would be that if you take (e.g.) the polyparse combinators and manually rewrite String everywhere to a parameter e (only where it represents an error, of course), it would take you maybe an hour in total, including fixing up any site you missed that the typechecker catches for you.
Regards, Malcolm
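The rewrite described above can be pictured with a very small sketch. This is not polyparse's code; it only assumes a parser represented as a function over strict ByteStrings, and the names `Parser`, `MyError` and `satisfy` are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- A parser over strict ByteStrings whose error type 'e' is a parameter
-- instead of being fixed to String. (Illustrative type, not polyparse's.)
newtype Parser e a = Parser
  { runParser :: B.ByteString -> Either e (a, B.ByteString) }

-- Hypothetical error type that callers can pattern match on.
data MyError
  = UnexpectedEof
  | UnexpectedChar Char
  deriving (Eq, Show)

-- Consume one character if it satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser MyError Char
satisfy p = Parser $ \input ->
  case B.uncons input of
    Nothing -> Left UnexpectedEof
    Just (c, rest)
      | p c       -> Right (c, rest)
      | otherwise -> Left (UnexpectedChar c)
```

A caller can then write `case runParser (satisfy (== 'a')) input of Left (UnexpectedChar c) -> ...` rather than inspecting a message.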

On Sun, Feb 21, 2010 at 4:36 AM, Magnus Therning wrote:
> I've looked at polyparse and attoparsec and they seem to have in common that the error always is a String. My current ideas for a project would be a lot easier if I could just return some other type, something that I can pattern match on.
It would be easy enough to add this, but you'd end up with a slightly convoluted API. Because of the presence of fail in all monadic APIs, you'd have to support only-a-string as a failure result in some form, so your failure type would have to be something like Either String a. There's no support for this in attoparsec simply because I haven't needed it. I suspect the same is true of other libraries, nothing deeper.
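A rough sketch of what "something like Either String a" could look like, under the assumption of a simple function-based parser. None of this is attoparsec's API; `Parser` and `failWith` are hypothetical names. Note that in current GHC `fail` lives in the `MonadFail` class rather than in `Monad`, as it did when this thread was written.

```haskell
import qualified Data.ByteString.Char8 as B

-- Failure is either a plain message (all that 'fail' can supply) or a
-- structured value of the user's own type 'e'. (Illustrative type only.)
newtype Parser e a = Parser
  { runParser :: B.ByteString -> Either (Either String e) (a, B.ByteString) }

instance Functor (Parser e) where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Left err        -> Left err
    Right (a, rest) -> Right (f a, rest)

instance Applicative (Parser e) where
  pure a = Parser $ \s -> Right (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Left err        -> Left err
    Right (f, rest) -> case pa rest of
      Left err         -> Left err
      Right (a, rest') -> Right (f a, rest')

instance Monad (Parser e) where
  Parser p >>= f = Parser $ \s -> case p s of
    Left err        -> Left err
    Right (a, rest) -> runParser (f a) rest

-- 'fail' only has a String to work with, so it lands in the inner Left.
instance MonadFail (Parser e) where
  fail msg = Parser $ \_ -> Left (Left msg)

-- Hypothetical helper for code that knows about the structured type.
failWith :: e -> Parser e a
failWith e = Parser $ \_ -> Left (Right e)
```

The caller has to handle both `Left (Left msg)` and `Left (Right e)`, which is the slightly convoluted part.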

On 22/02/10 18:44, Bryan O'Sullivan wrote:
> On Sun, Feb 21, 2010 at 4:36 AM, Magnus Therning wrote:
>> I've looked at polyparse and attoparsec and they seem to have in common that the error always is a String. My current ideas for a project would be a lot easier if I could just return some other type, something that I can pattern match on.
> It would be easy enough to add this, but you'd end up with a slightly convoluted API. Because of the presence of fail in all monadic APIs, you'd have to support only-a-string as a failure result in some form, so your failure type would have to be something like Either String a.
My thoughts went more like a parser type like
    data Parser e a = ...
Possibly with the addition that 'e' implements a class that goes something like
    class ParserError e where
        baseError :: e
        addError :: e -> e -> e
(At first I thought that maybe Monoid would do, but both an identity and associativity feel awkward in this case. :-) With 'String' implemented something like
    instance ParserError String where
        baseError = "Parser error, expected:\n"
        addError = (++)
Then I believe 'Parser String' would be equivalent to the existing attoparsec parser (as found in the 0.7 series). I still haven't convinced myself that this will work though. Also, I had a look at attoparsec on bitbucket, and there are some major changes between 0.7 and 0.8. I realised I'll have to spend a lot more time understanding the code than I initially hoped. Right now that is unlikely to happen any time soon :(
> There's no support for this in attoparsec simply because I haven't needed it. I suspect the same is true of other libraries, nothing deeper.
Yeah, that's what I thought. In a current project I just have a need to differentiate between errors in different parts of the parser. And handling those errors would be much simpler if I could use pattern matching rather than inspecting strings.
/M
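For reference, the ParserError class from this message compiles roughly as written once the String instance is allowed via `FlexibleInstances`. The second instance is only an assumption about what a pattern-matchable error might look like; `Expected`, `Expectations` and `headerProblem` are invented names.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- The class as proposed in the message.
class ParserError e where
  baseError :: e
  addError  :: e -> e -> e

-- String is a synonym for [Char], hence the FlexibleInstances pragma.
instance ParserError String where
  baseError = "Parser error, expected:\n"
  addError  = (++)

-- Hypothetical structured alternative: a list of things that were expected.
data Expected = ExpectedHeader | ExpectedBody
  deriving (Eq, Show)

newtype Expectations = Expectations [Expected]
  deriving (Eq, Show)

instance ParserError Expectations where
  baseError = Expectations []
  addError (Expectations xs) (Expectations ys) = Expectations (xs ++ ys)

-- Errors can now be inspected structurally instead of by string matching.
headerProblem :: Expectations -> Bool
headerProblem (Expectations es) = ExpectedHeader `elem` es
```

This particular structured instance happens to satisfy the Monoid laws as well, so it does not settle whether a separate class is needed.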

On Mon, Feb 22, 2010 at 2:38 PM, Magnus Therning wrote:
> My thoughts went more like a parser type like
> data Parser e a = ...
Yes, I knew that's where you were going :-) The trouble is, you'd still have to deal with
    fail :: Monad m => String -> m a
which would require your failure type to somehow accept a string. Plumbing that in would be a little more awkward than your initial explorations suggest :-\
You have two problems. The first is how to construct a value of your failure type that accepts a String parameter so that you can support users of "fail". The second is that you might need to pass extra information to construct your failure value when naïve code uses fail or mzero, otherwise you will only get useful error values out quite infrequently.
> I still haven't convinced myself that this will work though. Also, I had a look at attoparsec on bitbucket, and there are some major changes between 0.7 and 0.8.
Even though those changes represent a major modification to the internals of attoparsec, they shouldn't really affect what you want to do, or anything interesting about how to do it.
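The two problems can be made concrete with a sketch. It assumes a hypothetical class, here called `ParseError`, that tells the parser how to build an error from the String given to `fail` and what to use when `empty` or `mzero` supplies nothing at all. In current GHC `fail` belongs to `MonadFail`; the type in the quoted signature is the older one. All names below are illustrative, not attoparsec's.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus (..))
import qualified Data.ByteString.Char8 as B

-- Hypothetical class covering both problems.
class ParseError e where
  fromFailMessage :: String -> e  -- problem 1: used by 'fail'
  unknownError    :: e            -- problem 2: used by 'empty' / 'mzero'

newtype Parser e a = Parser
  { runParser :: B.ByteString -> Either e (a, B.ByteString) }

instance Functor (Parser e) where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, rest) -> (f a, rest)) (p s)

instance Applicative (Parser e) where
  pure a = Parser $ \s -> Right (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, rest)  <- pf s
    (a, rest') <- pa rest
    Right (f a, rest')

instance Monad (Parser e) where
  Parser p >>= f = Parser $ \s -> do
    (a, rest) <- p s
    runParser (f a) rest

instance ParseError e => MonadFail (Parser e) where
  fail msg = Parser $ \_ -> Left (fromFailMessage msg)

instance ParseError e => Alternative (Parser e) where
  empty = Parser $ \_ -> Left unknownError
  -- On failure of the left parser, retry the right one on the same input.
  Parser p <|> Parser q = Parser $ \s -> either (const (q s)) Right (p s)

instance ParseError e => MonadPlus (Parser e)

-- Hypothetical user error type: only BadHeader carries real information.
data MyError = FromFail String | Unknown | BadHeader
  deriving (Eq, Show)

instance ParseError MyError where
  fromFailMessage = FromFail
  unknownError    = Unknown
```

Code that only ever calls `fail` or `mzero` produces `FromFail` or `Unknown`, never `BadHeader`, which illustrates why useful error values would come out infrequently.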

On Tue, Feb 23, 2010 at 00:39, Bryan O'Sullivan wrote:
> On Mon, Feb 22, 2010 at 2:38 PM, Magnus Therning wrote:
>> My thoughts went more like a parser type like
>> data Parser e a = ...
> Yes, I knew that's where you were going :-) The trouble is, you'd still have to deal with fail :: Monad m => String -> m a which would require your failure type to somehow accept a string. Plumbing that in would be a little more awkward than your initial explorations suggest :-\ You have two problems. The first is how to construct a value of your failure type that accepts a String parameter so that you can support users of "fail". The second is that you might need to pass extra information to construct your failure value when naïve code uses fail or mzero, otherwise you will only get useful error values out quite infrequently.
Yes, I suspected there'd be something I had missed. I guess it would require 'Parser e a' to have a 'fail' that's similar to the one in 'Maybe'. Users would then be forced to use '<?>' to get useful error info out. Would that be an unworkable situation?
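A sketch of that idea, assuming the annotation operator is something in the style of attoparsec's `<?>` but taking an arbitrary error value instead of a String. This variant is hypothetical, as are `char`, `MyError` and `colon`; primitives fail with `()`, much like `Nothing` in Maybe.

```haskell
import qualified Data.ByteString.Char8 as B

newtype Parser e a = Parser
  { runParser :: B.ByteString -> Either e (a, B.ByteString) }

-- A primitive that reports failure with no detail at all.
char :: Char -> Parser () Char
char c = Parser $ \s -> case B.uncons s of
  Just (x, rest) | x == c -> Right (x, rest)
  _                       -> Left ()

-- Hypothetical variant of '<?>': replace whatever error the parser
-- produced with a caller-supplied value of a possibly different type.
infix 0 <?>
(<?>) :: Parser e a -> e' -> Parser e' a
p <?> err = Parser $ \s -> case runParser p s of
  Left _  -> Left err
  Right r -> Right r

-- Hypothetical error type for pattern matching.
data MyError = MissingColon | MissingComma
  deriving (Eq, Show)

colon :: Parser MyError Char
colon = char ':' <?> MissingColon
```

Every parser whose failure should be distinguishable needs its own annotation, which is the burden on users that the question is about.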
>> I still haven't convinced myself that this will work though. Also, I had a look at attoparsec on bitbucket, and there are some major changes between 0.7 and 0.8.
> Even though those changes represent a major modification to the internals of attoparsec, they shouldn't really affect what you want to do, or anything interesting about how to do it.
Ah, that's good. I think I'll have to postpone any work on this for now though, and instead implement a 'String -> MyErrorType' function for, hopefully, temporary use. I've already been sidetracked twice before ;-)
/M
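Such a stopgap function might look like the sketch below. It assumes the parser's own `fail` calls tag their messages with a known prefix such as "header: " or "body: "; that convention, the constructors of `MyErrorType` and the name `toMyError` are all invented for illustration.

```haskell
import Data.List (stripPrefix)

-- Hypothetical error type; the thread does not say what MyErrorType holds.
data MyErrorType
  = HeaderError String
  | BodyError String
  | OtherError String
  deriving (Eq, Show)

-- Classify an error message by an assumed prefix convention.
toMyError :: String -> MyErrorType
toMyError msg
  | Just rest <- stripPrefix "header: " msg = HeaderError rest
  | Just rest <- stripPrefix "body: " msg   = BodyError rest
  | otherwise                               = OtherError msg
```

The result can be pattern matched by the rest of the program, while the string inspection stays confined to this one function.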
participants (3): Bryan O'Sullivan, Magnus Therning, Malcolm Wallace