Proper round-trip HughesPJ/Parsec for Doubles?

Hi all, Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse? Long version: I'm writing and QuickCheck-testing a parser using the approach set out here: http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quickche... That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles). Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved. At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing. As hpaste.org seems to be down, I'll attach a code example here instead. Thanks! -Andy -- Andy Gimblett http://gimbo.org.uk/

Am Dienstag 23 Februar 2010 14:44:50 schrieb Andy Gimblett:
Hi all,
Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?
Long version: I'm writing and QuickCheck-testing a parser using the approach set out here:
http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quic kcheck/
That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles).
Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved.
Parse it as a Rational, then convert with fromRational. I don't know whether that will always have parse . prettyPrint == id, but it'll be much closer. The naturalOrFloat default implementation uses fraction = do{ char '.' ; digits <- many1 digit <?> "fraction" ; return (foldr op 0.0 digits) } <?> "fraction" where op d f = (f + fromIntegral (digitToInt d))/10.0 and division by 10 isn't exact with a binary representation.
At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing.
As hpaste.org seems to be down, I'll attach a code example here instead.
Thanks!
-Andy
-- Andy Gimblett http://gimbo.org.uk/

Andy Gimblett schrieb:
Hi all,
Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?
Maybe you have more luck with show and read (without Parsec.Token). Your example: x = 9.91165677454629 fails because the computation performed by the parser 9.0 + 0.91165677454629 yields 9.911656774546291 Cheers Christian
Long version: I'm writing and QuickCheck-testing a parser using the approach set out here:
http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quickche...
That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles).
Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved.
At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing.
As hpaste.org seems to be down, I'll attach a code example here instead.
Thanks!
-Andy
-- Andy Gimblett http://gimbo.org.uk/
------------------------------------------------------------------------
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?
Maybe you have more luck with show and read (without Parsec.Token).
Your example: x = 9.91165677454629
fails because the computation performed by the parser 9.0 + 0.91165677454629 yields 9.911656774546291
That seems to do the trick! Below, for the record, the code I've come up with (I threw away the Either Integer Double part so it's a bit simpler, also). I'm sure it can be improved, but this is passing all tests reliably, it seems. Many thanks, Christian and Daniel, for your help! Best, -Andy parseDouble :: Parser Double parseDouble = try $ do (symbol toks) "-" n <- floater return $ negate n <|> floater where toks = makeTokenParser emptyDef -- This could definitely be improved, but it's working. :-) floater :: Parser Double floater = do w <- many1 digit char '.' f <- many1 digit e <- optionMaybe $ do char 'e' -- Optional exponent part n <- option "" (char '-' >> return "-") -- Optional negation in exponent m <- many1 digit return $ n ++ m case e of Nothing -> return $ read $ w ++ "." ++ f Just e' -> return $ read $ w ++ "." ++ f ++ "e" ++ e' -- Andy Gimblett http://gimbo.org.uk/

For the record, here's the final improved version: float' :: TokenParser st -> GenParser Char st Double float' t = do n <- liftCtoS '-' w <- many1 digit char '.' f <- many1 digit e <- option "" $ do char 'e' n' <- liftCtoS '-' m <- many1 digit return $ concat ["e", n', m] whiteSpace t return $ read $ concat [n, w, ".", f, e] where liftCtoS a = option "" (char a >> return [a]) Thanks for all the help, again. -Andy -- Andy Gimblett http://gimbo.org.uk/

Andy Gimblett schrieb:
For the record, here's the final improved version:
Hi Andy, I hope you don't mind if I make some style comments to your "final" version. 1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e") 2. The "t :: TokenParser st" is only used for the white spaces. This should be done separately (use "lexeme" from the TokenParser if you really need to). Just using "spaces" is also an alternative. 3. "liftCtoS" is only applied to '-', so an "optSign" would suffice. optSign = option "" $ fmap (: []) (char '-') (read also allows a capital 'E' and a '+' before the exponent, but no initial '+' sign. The decimal point is optional. Also "NaN" and "Infinity" can be read, both possibly preceded by a '-' sign followed by spaces. But you may restrict yourself to the possible outputs of show, which would include "NaN" and "Infinity", though.) It may make sense to use something like readMaybe (which is missing in the Prelude) instead of "read" to allow the parser to fail more nicely. Btw I observed the following problem with read (that readMaybe would also not solve). http://hackage.haskell.org/trac/ghc/ticket/3897 Cheers Christian
float' :: TokenParser st -> GenParser Char st Double float' t = do n <- liftCtoS '-' w <- many1 digit char '.' f <- many1 digit e <- option "" $ do char 'e' n' <- liftCtoS '-' m <- many1 digit return $ concat ["e", n', m] whiteSpace t return $ read $ concat [n, w, ".", f, e] where liftCtoS a = option "" (char a >> return [a])
Thanks for all the help, again.
-Andy
-- Andy Gimblett http://gimbo.org.uk/

Ben Millwood schrieb:
On Wed, Feb 24, 2010 at 1:24 PM, Christian Maeder
wrote: 1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")
You can also break it immediately before do, which I think is sometimes more clear.
If not an extra space is added following "do" this leads to an "odd" indentation of at least one line. C.

Hi Christian, On 24 Feb 2010, at 13:24, Christian Maeder wrote:
I hope you don't mind if I make some style comments to your "final" version.
Not at all - thanks!
1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")
I'm not convinced by this; perhaps while editing the code it's useful, but those changes don't happen very often, and when they do, any half- decent editor ought to be able to handle making the change consistently. I do sometimes drop the do to the next line, but usually in order to keep things within 80 columns. I think this is somewhat a matter of personal taste though. More on this at the end...
2. The "t :: TokenParser st" is only used for the white spaces. This should be done separately (use "lexeme" from the TokenParser if you really need to). Just using "spaces" is also an alternative.
OK - but what I'm trying to do here is create something I can use as a drop-in replacement for float from Text.ParserCombinators.Parsec.Token - in which case it shouldn't be done separately, I think?
3. "liftCtoS" is only applied to '-', so an "optSign" would suffice. optSign = option "" $ fmap (: []) (char '-')
Agreed - although I resurrect it later as maybeChar (see below), matching against a choice of characters (to handle +/-) or returning "" if empty.
(read also allows a capital 'E' and a '+' before the exponent, but no initial '+' sign.
OK: didn't catch this because show doesn't (it seems) ever write them like that. Thanks.
The decimal point is optional.
Same comment. :-) Fixed below, although I remove this optionality for my application (for now) because (I think) I want to be explicit about int vs float...
Also "NaN" and "Infinity" can be read, both possibly preceded by a '-' sign followed by spaces. But you may restrict yourself to the possible outputs of show, which would include "NaN" and "Infinity", though.)
OK. Indeed, it seems an initial '-' can be followed by spaces for other cases, e.g. "- 2e4", so have implemented that more general form. Adding the NaN and Infinity cases gives us another level of indent, and pushes us close enough to 80 columns that I've dropped the outermost do to the next line.
It may make sense to use something like readMaybe (which is missing in the Prelude) instead of "read" to allow the parser to fail more nicely.
It seems to be kicking up reasonable errors as it is, e.g.: *Main> parse aFloat "" "2e-h" Left (line 1, column 4): unexpected "h" expecting digit I haven't seen any uncaught exceptions propagating, if that's what you're worried about...?
Btw I observed the following problem with read (that readMaybe would also not solve). http://hackage.haskell.org/trac/ghc/ticket/3897
Ah, well that's out of scope for me, I fear. :-) So here's what I have now: float' :: TokenParser st -> GenParser Char st Double float' t = do n <- maybeChar "-" spaces fs <- choice [symbol t "NaN", symbol t "Infinity", do whole <- many1 digit frac <- option "" $ do char '.' ds <- many1 digit return $ '.' : ds ex <- option "" $ do choice [char 'e', char 'E'] s <- maybeChar "+-" ds <- many1 digit return $ concat ["e", s, ds] return $ concat [whole, frac, ex] ] whiteSpace t return $ read $ n ++ fs where maybeChar :: String -> GenParser Char st String maybeChar as = option "" (choice (map char as) >>= \a -> return [a])
You can also break it immediately before do, which I think is sometimes more clear.
If not an extra space is added following "do" this leads to an "odd" indentation of at least one line.
I'm curious: which line in the above is indented oddly? Oh, wait: you don't mean odd as in "strange", do you? You mean odd as in "not even"? So, e.g. the "spaces" line starts at column 5? What's wrong with that? Cheers! -Andy -- Andy Gimblett http://gimbo.org.uk/

Andy Gimblett wrote:
1. break the line after "do" (to avoid a layout change when change name or arguments of float' or rename the variable "e")
I'm not convinced by this; perhaps while editing the code it's useful, but those changes don't happen very often, and when they do, any half-decent editor ought to be able to handle making the change consistently. I do sometimes drop the do to the next line, but usually in order to keep things within 80 columns. I think this is somewhat a matter of personal taste though. More on this at the end...
I think the implication is that the layout change you mention will cause a version control commit to look like the whole function changed, whereas if you didn't have to alter the indent, it would be clear that the only lines that changed are the one introducing the extra parameter, and any subsequent lines that need to be changed to use the parameter. BTW, to add another option, I like this style: float' t = do ... Thanks, Neil.

Andy Gimblett schrieb:
Hi Christian, [...]
It may make sense to use something like readMaybe (which is missing in the Prelude) instead of "read" to allow the parser to fail more nicely.
It seems to be kicking up reasonable errors as it is, e.g.:
*Main> parse aFloat "" "2e-h" Left (line 1, column 4): unexpected "h" expecting digit
yes, this is fine, because you reject "-h", but suppose it was passed to read (due to a programming error).
I haven't seen any uncaught exceptions propagating, if that's what you're worried about...?
Yes, "read" will always work for you. But you could use a parser *almost* as simple as many1 $ oneOf "NaInfity+-.eE0123456789" and ask "readMaybe" if the parsed String can be read as Double.
So here's what I have now:
float' :: TokenParser st -> GenParser Char st Double float' t = do n <- maybeChar "-" spaces fs <- choice [symbol t "NaN", symbol t "Infinity", do whole <- many1 digit frac <- option "" $ do char '.' ds <- many1 digit return $ '.' : ds ex <- option "" $ do choice [char 'e', char 'E'] s <- maybeChar "+-" ds <- many1 digit return $ concat ["e", s, ds] return $ concat [whole, frac, ex] ] whiteSpace t return $ read $ n ++ fs where maybeChar :: String -> GenParser Char st String maybeChar as = option "" (choice (map char as) >>= \a -> return [a])
I would omit handling of spaces (that's a separate lexing step). It's enough to be able to parse those numbers, that are possible results of "show" (for round-trip). "symbol t" could be replaced by "(try . string)" in order to get rid of the TokenParser (that I don't like). Spaces following an initial minus sign are quite unusual and rather indicate that the sign does not belong to number, but that the sign is a separate operation.
You can also break it immediately before do, which I think is sometimes more clear.
If not an extra space is added following "do" this leads to an "odd" indentation of at least one line.
I'm curious: which line in the above is indented oddly? Oh, wait: you don't mean odd as in "strange", do you? You mean odd as in "not even"? So, e.g. the "spaces" line starts at column 5? What's wrong with that?
Right, again a matter of taste. Cheers Christian P.S. below is my parser for tptp numbers (as comparison). It rejects leading zeros, but allows an initial + sign. "fmap read real" would work if the input does not start with + (or is NaN or Infinity). -- | does not allow leading zeros natural :: Parser String natural = string "0" <|> many1 digit decimal :: Parser String decimal = do s <- option "" $ string "+" <|> string "-" ds <- natural return $ s ++ ds real :: Parser String real = do d <- decimal f <- option "" $ do p <- char '.' n <- many1 digit return $ p : n e <- option "" $ do x <- char 'e' <|> char 'E' g <- decimal return $ x : g return $ d ++ f ++ e

real :: Parser String real = do d <- decimal f <- option "" $ do p <- char '.' n <- many1 digit return $ p : n
Just to throw two bits in here, this is the only style that doesn't require leaning on the space bar and squinting to line things up, doesn't require any fancy editor support, and looks fine with proportional fonts. It also allows you to move lines around with 'dd' and 'p' in vi :)

If you use read (reads) and show for the actual conversion it will round trip.
It appears to be non-trivial since most languages and libraries get it wrong. :)
-- Lennart
On Tue, Feb 23, 2010 at 1:44 PM, Andy Gimblett
Hi all,
Short version: How can I pretty print and parse values of type Double such that those operations are each other's inverse?
Long version: I'm writing and QuickCheck-testing a parser using the approach set out here:
http://lstephen.wordpress.com/2007/07/29/parsec-parser-testing-with-quickche...
That is, each syntactic category gets a pretty-printer and a parser and an Arbitrary instance, and QuickCheck checks that (parse . prettyPrint) == id, basically. Somewhat unsurprisingly, this sometimes fails for floating point values (I'm using Doubles).
Now, I know that floats are in some sense imprecise, and comparing for equality is fraught with peril, but it seems that if x==x then it ought to be at least _possible_ to arrange matters such that (parse . prettyPrint x) == x as well. At worst, pretty-printing the underlying binary representation!? So my feeling is that my parser could be improved.
At the moment I'm working around it by defining a type class which checks for equality within some margin of error, and using that instead of Eq - but it's messier than I'd like, so I wondered if there was something obvious I'm missing.
As hpaste.org seems to be down, I'll attach a code example here instead.
Thanks!
-Andy
-- Andy Gimblett http://gimbo.org.uk/
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (7)
-
Andy Gimblett
-
Ben Millwood
-
Christian Maeder
-
Daniel Fischer
-
Evan Laforge
-
Lennart Augustsson
-
Neil Brown