Proper round-trip HughesPJ/Parsec for Doubles?

Andy Gimblett

23 Feb 2010 23 Feb '10

8:44 a.m.

Hi all, Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse? Long version: I'm writing and QuickCheck-testing a parser using the approach set out here: http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quickche... That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles). Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved. At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing. As hpaste.org seems to be down, I'll attach a code example here instead. Thanks! -Andy -- Andy Gimblett http://gimbo.org.uk/

Attachments:

TestParse.hs (application/octet-stream — 1.5 KB)

Show replies by date

Daniel Fischer

23 Feb 23 Feb

9:27 a.m.

Am Dienstag 23 Februar 2010 14:44:50 schrieb Andy Gimblett:

...

Hi all,

Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?

Long version: I'm writing and QuickCheck-testing a parser using the approach set out here:

http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quic kcheck/

That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles).

Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved.

Parse it as a Rational, then convert with fromRational. I don't know whether that will always have parse . prettyPrint == id, but it'll be much closer. The naturalOrFloat default implementation uses fraction = do{ char '.' ; digits <- many1 digit <?> "fraction" ; return (foldr op 0.0 digits) } <?> "fraction" where op d f = (f + fromIntegral (digitToInt d))/10.0 and division by 10 isn't exact with a binary representation.

...

At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing.

As hpaste.org seems to be down, I'll attach a code example here instead.

Thanks!

-Andy

-- Andy Gimblett http://gimbo.org.uk/

Christian Maeder

10:13 a.m.

Andy Gimblett schrieb:

...

Hi all,

Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?

Maybe you have more luck with show and read (without Parsec.Token). Your example: x = 9.91165677454629 fails because the computation performed by the parser 9.0 + 0.91165677454629 yields 9.911656774546291 Cheers Christian

...

Long version: I'm writing and QuickCheck-testing a parser using the approach set out here:

http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quickche...

That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles).

Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved.

At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing.

As hpaste.org seems to be down, I'll attach a code example here instead.

Thanks!

-Andy

-- Andy Gimblett http://gimbo.org.uk/

------------------------------------------------------------------------

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Andy Gimblett

11:08 a.m.

...

...
Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?

Maybe you have more luck with show and read (without Parsec.Token).

Your example: x = 9.91165677454629

fails because the computation performed by the parser 9.0 + 0.91165677454629 yields 9.911656774546291

That seems to do the trick! Below, for the record, the code I've come up with (I threw away the Either Integer Double part so it's a bit simpler, also). I'm sure it can be improved, but this is passing all tests reliably, it seems. Many thanks, Christian and Daniel, for your help! Best, -Andy parseDouble :: Parser Double parseDouble = try $ do (symbol toks) "-" n <- floater return $ negate n <|> floater where toks = makeTokenParser emptyDef -- This could definitely be improved, but it's working. :-) floater :: Parser Double floater = do w <- many1 digit char '.' f <- many1 digit e <- optionMaybe $ do char 'e' -- Optional exponent part n <- option "" (char '-' >> return "-") -- Optional negation in exponent m <- many1 digit return $ n ++ m case e of Nothing -> return $ read $ w ++ "." ++ f Just e' -> return $ read $ w ++ "." ++ f ++ "e" ++ e' -- Andy Gimblett http://gimbo.org.uk/

Andy Gimblett

4:55 p.m.

For the record, here's the final improved version: float' :: TokenParser st -> GenParser Char st Double float' t = do n <- liftCtoS '-' w <- many1 digit char '.' f <- many1 digit e <- option "" $ do char 'e' n' <- liftCtoS '-' m <- many1 digit return $ concat ["e", n', m] whiteSpace t return $ read $ concat [n, w, ".", f, e] where liftCtoS a = option "" (char a >> return [a]) Thanks for all the help, again. -Andy -- Andy Gimblett http://gimbo.org.uk/

Christian Maeder

24 Feb 24 Feb

8:24 a.m.

Andy Gimblett schrieb:

...

For the record, here's the final improved version:

Hi Andy, I hope you don't mind if I make some style comments to your "final" version. 1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e") 2. The "t :: TokenParser st" is only used for the white spaces. This should be done separately (use "lexeme" from the TokenParser if you really need to). Just using "spaces" is also an alternative. 3. "liftCtoS" is only applied to '-', so an "optSign" would suffice. optSign = option "" $ fmap (: []) (char '-') (read also allows a capital 'E' and a '+' before the exponent, but no initial '+' sign. The decimal point is optional. Also "NaN" and "Infinity" can be read, both possibly preceded by a '-' sign followed by spaces. But you may restrict yourself to the possible outputs of show, which would include "NaN" and "Infinity", though.) It may make sense to use something like readMaybe (which is missing in the Prelude) instead of "read" to allow the parser to fail more nicely. Btw I observed the following problem with read (that readMaybe would also not solve). http://hackage.haskell.org/trac/ghc/ticket/3897 Cheers Christian

...

float' :: TokenParser st -> GenParser Char st Double float' t = do n <- liftCtoS '-' w <- many1 digit char '.' f <- many1 digit e <- option "" $ do char 'e' n' <- liftCtoS '-' m <- many1 digit return $ concat ["e", n', m] whiteSpace t return $ read $ concat [n, w, ".", f, e] where liftCtoS a = option "" (char a >> return [a])

Thanks for all the help, again.

-Andy

-- Andy Gimblett http://gimbo.org.uk/

Ben Millwood

9:22 a.m.

On Wed, Feb 24, 2010 at 1:24 PM, Christian Maeder wrote:

...

1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")

You can also break it immediately before do, which I think is sometimes more clear.

Christian Maeder

3:36 p.m.

Ben Millwood schrieb:

...

On Wed, Feb 24, 2010 at 1:24 PM, Christian Maeder wrote:

...
1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")

You can also break it immediately before do, which I think is sometimes more clear.

If not an extra space is added following "do" this leads to an "odd" indentation of at least one line. C.

Andy Gimblett

25 Feb 25 Feb

7:15 a.m.

Hi Christian, On 24 Feb 2010, at 13:24, Christian Maeder wrote:

...

I hope you don't mind if I make some style comments to your "final" version.

Not at all - thanks!

...

1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")

I'm not convinced by this; perhaps while editing the code it's useful, but those changes don't happen very often, and when they do, any half- decent editor ought to be able to handle making the change consistently. I do sometimes drop the do to the next line, but usually in order to keep things within 80 columns. I think this is somewhat a matter of personal taste though. More on this at the end...

...

2. The "t :: TokenParser st" is only used for the white spaces. This should be done separately (use "lexeme" from the TokenParser if you really need to). Just using "spaces" is also an alternative.

OK - but what I'm trying to do here is create something I can use as a drop-in replacement for float from Text.ParserCombinators.Parsec.Token - in which case it shouldn't be done separately, I think?

...

3. "liftCtoS" is only applied to '-', so an "optSign" would suffice. optSign = option "" $ fmap (: []) (char '-')

Agreed - although I resurrect it later as maybeChar (see below), matching against a choice of characters (to handle +/-) or returning "" if empty.

...

(read also allows a capital 'E' and a '+' before the exponent, but no initial '+' sign.

OK: didn't catch this because show doesn't (it seems) ever write them like that. Thanks.

...

The decimal point is optional.

Same comment. :-) Fixed below, although I remove this optionality for my application (for now) because (I think) I want to be explicit about int vs float...

...

Also "NaN" and "Infinity" can be read, both possibly preceded by a '-' sign followed by spaces. But you may restrict yourself to the possible outputs of show, which would include "NaN" and "Infinity", though.)

OK. Indeed, it seems an initial '-' can be followed by spaces for other cases, e.g. "- 2e4", so have implemented that more general form. Adding the NaN and Infinity cases gives us another level of indent, and pushes us close enough to 80 columns that I've dropped the outermost do to the next line.

...

It may make sense to use something like readMaybe (which is missing in the Prelude) instead of "read" to allow the parser to fail more nicely.

It seems to be kicking up reasonable errors as it is, e.g.: *Main> parse aFloat "" "2e-h" Left (line 1, column 4): unexpected "h" expecting digit I haven't seen any uncaught exceptions propagating, if that's what you're worried about...?

...

Btw I observed the following problem with read (that readMaybe would also not solve). http://hackage.haskell.org/trac/ghc/ticket/3897

Ah, well that's out of scope for me, I fear. :-) So here's what I have now: float' :: TokenParser st -> GenParser Char st Double float' t = do n <- maybeChar "-" spaces fs <- choice [symbol t "NaN", symbol t "Infinity", do whole <- many1 digit frac <- option "" $ do char '.' ds <- many1 digit return $ '.' : ds ex <- option "" $ do choice [char 'e', char 'E'] s <- maybeChar "+-" ds <- many1 digit return $ concat ["e", s, ds] return $ concat [whole, frac, ex] ] whiteSpace t return $ read $ n ++ fs where maybeChar :: String -> GenParser Char st String maybeChar as = option "" (choice (map char as) >>= \a -> return [a])

...

...
You can also break it immediately before do, which I think is sometimes more clear.

If not an extra space is added following "do" this leads to an "odd" indentation of at least one line.

I'm curious: which line in the above is indented oddly? Oh, wait: you don't mean odd as in "strange", do you? You mean odd as in "not even"? So, e.g. the "spaces" line starts at column 5? What's wrong with that? Cheers! -Andy -- Andy Gimblett http://gimbo.org.uk/

Neil Brown

7:41 a.m.

Andy Gimblett wrote:

...

...
1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")

I'm not convinced by this; perhaps while editing the code it's useful, but those changes don't happen very often, and when they do, any half-decent editor ought to be able to handle making the change consistently. I do sometimes drop the do to the next line, but usually in order to keep things within 80 columns. I think this is somewhat a matter of personal taste though. More on this at the end...

I think the implication is that the layout change you mention will cause a version control commit to look like the whole function changed, whereas if you didn't have to alter the indent, it would be clear that the only lines that changed are the one introducing the extra parameter, and any subsequent lines that need to be changed to use the parameter. BTW, to add another option, I like this style: float' t = do ... Thanks, Neil.

Christian Maeder

26 Feb 26 Feb

7:18 a.m.

Andy Gimblett schrieb:

...

Hi Christian, [...]

...
It may make sense to use something like readMaybe (which is missing in the Prelude) instead of "read" to allow the parser to fail more nicely.

It seems to be kicking up reasonable errors as it is, e.g.:

*Main> parse aFloat "" "2e-h" Left (line 1, column 4): unexpected "h" expecting digit

yes, this is fine, because you reject "-h", but suppose it was passed to read (due to a programming error).

...

I haven't seen any uncaught exceptions propagating, if that's what you're worried about...?

Yes, "read" will always work for you. But you could use a parser *almost* as simple as many1 $ oneOf "NaInfity+-.eE0123456789" and ask "readMaybe" if the parsed String can be read as Double.

...

So here's what I have now:

float' :: TokenParser st -> GenParser Char st Double float' t = do n <- maybeChar "-" spaces fs <- choice [symbol t "NaN", symbol t "Infinity", do whole <- many1 digit frac <- option "" $ do char '.' ds <- many1 digit return $ '.' : ds ex <- option "" $ do choice [char 'e', char 'E'] s <- maybeChar "+-" ds <- many1 digit return $ concat ["e", s, ds] return $ concat [whole, frac, ex] ] whiteSpace t return $ read $ n ++ fs where maybeChar :: String -> GenParser Char st String maybeChar as = option "" (choice (map char as) >>= \a -> return [a])

I would omit handling of spaces (that's a separate lexing step). It's enough to be able to parse those numbers, that are possible results of "show" (for round-trip). "symbol t" could be replaced by "(try . string)" in order to get rid of the TokenParser (that I don't like). Spaces following an initial minus sign are quite unusual and rather indicate that the sign does not belong to number, but that the sign is a separate operation.

...

...
...
You can also break it immediately before do, which I think is sometimes more clear.

If not an extra space is added following "do" this leads to an "odd" indentation of at least one line.

I'm curious: which line in the above is indented oddly? Oh, wait: you don't mean odd as in "strange", do you? You mean odd as in "not even"? So, e.g. the "spaces" line starts at column 5? What's wrong with that?

Right, again a matter of taste. Cheers Christian P.S. below is my parser for tptp numbers (as comparison). It rejects leading zeros, but allows an initial + sign. "fmap read real" would work if the input does not start with + (or is NaN or Infinity). -- | does not allow leading zeros natural :: Parser String natural = string "0" <|> many1 digit decimal :: Parser String decimal = do s <- option "" $ string "+" <|> string "-" ds <- natural return $ s ++ ds real :: Parser String real = do d <- decimal f <- option "" $ do p <- char '.' n <- many1 digit return $ p : n e <- option "" $ do x <- char 'e' <|> char 'E' g <- decimal return $ x : g return $ d ++ f ++ e

Evan Laforge

1:45 p.m.

...

real :: Parser String real = do d <- decimal f <- option "" $ do p <- char '.' n <- many1 digit return $ p : n

Just to throw two bits in here, this is the only style that doesn't require leaning on the space bar and squinting to line things up, doesn't require any fancy editor support, and looks fine with proportional fonts. It also allows you to move lines around with 'dd' and 'p' in vi :)

Lennart Augustsson

23 Feb 23 Feb

3:08 p.m.

If you use read (reads) and show for the actual conversion it will round trip. It appears to be non-trivial since most languages and libraries get it wrong. :) -- Lennart On Tue, Feb 23, 2010 at 1:44 PM, Andy Gimblett wrote:

...

Hi all,

Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?

Long version: I'm writing and QuickCheck-testing a parser using the approach set out here:

http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quickche...

That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles).

Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved.

At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing.

As hpaste.org seems to be down, I'll attach a code example here instead.

Thanks!

-Andy

-- Andy Gimblett http://gimbo.org.uk/

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

5562

Age (days ago)

5565

Last active (days ago)

List overview

Download

12 comments

7 participants

participants (7)

Andy Gimblett
Ben Millwood
Christian Maeder
Daniel Fischer
Evan Laforge
Lennart Augustsson
Neil Brown