Proposal (base): Add new Exp constructor to Text.Read.Lex.Lexeme

Core of proposal: Add

    data Lexeme ... | Exp Integer Integer   -- ^ Floating point literal in exponential form a*10^b

so that numbers in exponential notation can be represented directly.

Rationale:

Currently, when parsing 12e1000000000000, the value is at some point represented as a Rational, which of course does not fit in memory. Later on this Rational is converted to Integer or Int8 or Float or Double, but we never get to that point.

This uncovers three bugs:

1) 12e12 :: Integer should not parse, per the Report.
2) 12eXXX :: Double should parse; for large values of XXX it should give +Infinity, and for large negative values of XXX it should give +0.0.
3) integer-gmp (and/or integer-simple) is not immune to allocation errors and dies with 'Segmentation violation' or 'Bus error' (though it sometimes correctly throws 'Out of memory').

My proposal addresses 1) and 2), as I needed to change the public interface of Text.Read.Lex to get them fixed.

More info and a patch in trac:
http://hackage.haskell.org/trac/ghc/ticket/5688

As reading numbers is core for the web (for example HTTP headers, URLs, JSON etc.), this is a security issue. As this is a security issue, I'm tempted to insist this goes into GHC 7.4. What do you think?

Discussion period: 2 weeks, 25 Dec 2011.

-- Gracjan
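To illustrate point 2), here is a minimal sketch of how a mantissa/exponent pair could be turned into a Double without ever building the huge Rational. It is not the patch from the ticket: NumLit, digits and expToDouble are hypothetical names, and the cut-off of 400 decimal orders of magnitude is an assumed safety margin chosen to lie well outside the Double range.

```haskell
-- Hypothetical stand-in for the proposed constructor; this is not the
-- real Text.Read.Lex.Lexeme type.
data NumLit = Exp Integer Integer   -- Exp a b means a * 10^b
  deriving (Show, Eq)

-- Number of decimal digits of |n| (1 for zero).
digits :: Integer -> Integer
digits = fromIntegral . length . show . abs

-- Convert to Double, short-circuiting when the value is certainly out
-- of range. With d = digits a, the value satisfies
-- 10^(d-1+b) <= |a * 10^b| < 10^(d+b), so d + b bounds its magnitude.
-- The limit 400 is an assumption, not a value taken from the ticket.
expToDouble :: NumLit -> Double
expToDouble (Exp a b)
  | a == 0             = 0
  | magnitude > 400    = fromInteger (signum a) / 0  -- overflow: +/-Infinity
  | magnitude < (-400) = 0                           -- underflow (sign of zero ignored)
  | otherwise          = fromRational (toRational a * 10 ^^ b)
  where
    magnitude = digits a + b
```

With this shape, Exp 12 1000000000000 is answered from the guard alone, and only values that can plausibly fit in a Double reach the exact Rational conversion.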

Gracjan Polak wrote:
> Core of proposal: Add
> data Lexeme ... | Exp Integer Integer -- ^ Floating point literal in exponential form a*10^b
> Currently when parsing 12e1000000000000 it is at some point represented as Rational...
> I needed to change public interface of Text.Read.Lex to get it fixed.

+1 for doing whatever is needed to fix this, and promptly. I defer to the opinions of those more intimately familiar with GHC internals about whether this is the right way to do it.

Thanks for pushing this forward!

-Yitz

On 09/12/2011 08:01, Gracjan Polak wrote:
> Core of proposal: Add
> data Lexeme ... | Exp Integer Integer -- ^ Floating point literal in exponential form a*10^b
> So that numbers in exponential notation can be represented directly.
>
> Rationale:
> Currently when parsing 12e1000000000000 it is at some point represented as Rational, and of course does not fit in memory. Later on this Rational is converted to Integer or Int8 or Float or Double, but we never get to that point.
>
> This uncovers three bugs:
> 1) 12e12 :: Integer should not parse per Report
> 2) 12eXXX :: Double should parse, for large values of XXX should give +Infinity, for large negative values of XXX should give +0.0
> 3) integer-gmp (and/or integer-simple) is not immune to allocation errors and does 'Segmentation violation' or 'Bus error' (sometimes correctly throws 'Out of memory' though)
>
> My proposal addresses 1) and 2) as I needed to change public interface of Text.Read.Lex to get it fixed.
>
> More info and a patch in trac:
> http://hackage.haskell.org/trac/ghc/ticket/5688
>
> As reading numbers is core for web (for example HTTP headers, URLs, JSON etc) this is a security issue.
> As this is security issue I'm tempted to insist this goes into GHC 7.4. What do you think?

Ok by me, though I'm not familiar with the details of Text.Read.Lex.Lexeme. I think it's an "internal" API of sorts, since it was added to implement Text.Read.lex.

I wonder why integer-gmp crashes when trying to allocate too much... sounds like we need to look into that - could you make a ticket if there isn't already one?

Cheers,
Simon

Simon Marlow writes:
> Ok by me, though I'm not familiar with the details of Text.Read.Lex.Lexeme. I think it's an "internal" API of sorts, since it was added to implement Text.Read.lex.

If this API is internal then I'd propose removing the Rat constructor, as it is only ever used with powers of 10 as the denominator, and that case would be better covered by the Exp constructor.

> I wonder why integer-gmp crashes when trying to allocate too much... sounds like we need to look into that - could you make a ticket if there isn't already one?

I've checked 6.12.3, where this problem appears. I've heard this issue is not present in 7.2; I'm going to check HEAD and see if I can reproduce the crash.

-- Gracjan

On 12.12.2011 12:17, Gracjan Polak wrote:
> Simon Marlow writes:
>> Ok by me, though I'm not familiar with the details of Text.Read.Lex.Lexeme. I think it's an "internal" API of sorts, since it was added to implement Text.Read.lex.
>
> If this API is internal then I'd propose to remove Rat constructor, as this one is ever used only with powers of 10 as denominator. And we would have that case covered better by Exp constructor.

How do you intend to store fractional numbers (like "10.01") without the Rat constructor? How do you represent "10.01e10"? Exp Integer Integer does not seem to be enough.

In any case, "^" from integer-gmp should not be used to compute a Rat value for the powers of 10 (see also http://hackage.haskell.org/trac/ghc/ticket/3897).

Christian

>> I wonder why integer-gmp crashes when trying to allocate too much... sounds like we need to look into that - could you make a ticket if there isn't already one?
>
> I've checked 6.12.3, where this problem appears. I've heard this issue is not present in 7.2, I'm going to check HEAD and see if I can reproduce the crash.

Christian Maeder writes:
> How do you intend to store fractional numbers (like "10.01") without the Rat constructor?

10.01 = Exp 1001 (-2)

> How do you represent "10.01e10"? Exp Integer Integer seems to be not enough.

10.01e10 = Exp 1001 8

> In any case, "^" from the integer-gmp should not be used to compute a Rat value for the powers of 10. (see also http://hackage.haskell.org/trac/ghc/ticket/3897)

True.

I think it is time to take the discussion to trac and just report the final conclusion to this list.

-- Gracjan
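
The two encodings given in this reply (10.01 = Exp 1001 (-2) and 10.01e10 = Exp 1001 8) follow one rule: concatenate the integer and fraction digits into the mantissa, then subtract the number of fraction digits from the written exponent. A minimal sketch of that rule, assuming the lexer has already split the literal into digit strings; NumLit, digitsToInteger and mkExp are hypothetical names, not part of base:

```haskell
import Data.Char (digitToInt)

-- Hypothetical stand-in for the proposed constructor.
data NumLit = Exp Integer Integer   -- Exp a b means a * 10^b
  deriving (Show, Eq)

-- Interpret a string of decimal digits as an Integer ("" gives 0).
digitsToInteger :: String -> Integer
digitsToInteger = foldl (\acc c -> acc * 10 + toInteger (digitToInt c)) 0

-- Build the mantissa/exponent pair from the integer digits, the
-- fraction digits and the written exponent (0 when there is none).
mkExp :: String -> String -> Integer -> NumLit
mkExp intDigits fracDigits e =
  Exp (digitsToInteger (intDigits ++ fracDigits))
      (e - fromIntegral (length fracDigits))
```

Here mkExp "10" "01" 0 yields Exp 1001 (-2) and mkExp "10" "01" 10 yields Exp 1001 8, matching the examples in the message.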

On 12.12.2011 15:09, Gracjan Polak wrote:
> Christian Maeder writes:
>> How do you intend to store fractional numbers (like "10.01") without the Rat constructor?
>
> 10.01 = Exp 1001 (-2)

Can this be converted to a Rational when e-notation is no longer supported for rationals?

>> How do you represent "10.01e10"? Exp Integer Integer seems to be not enough.
>
> 10.01e10 = Exp 1001 8
>
>> In any case, "^" from the integer-gmp should not be used to compute a Rat value for the powers of 10. (see also http://hackage.haskell.org/trac/ghc/ticket/3897)
>
> True.
>
> I think it is time to take discussion to trac and just report to this list on final conclusion.

Ok,
Christian
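
The thread leaves open the question of whether Exp 1001 (-2) can still be converted to a Rational. One possible shape for such a conversion is sketched below, assuming the caller supplies a bound on the exponent so that 12e1000000000000 is rejected instead of being expanded. NumLit, expToRational and the maxExp parameter are hypothetical, and plain (^) is used only for brevity despite the concern raised about it above.

```haskell
import Data.Ratio ((%))

-- Hypothetical stand-in for the proposed constructor.
data NumLit = Exp Integer Integer   -- Exp a b means a * 10^b
  deriving (Show, Eq)

-- Convert to Rational only when |b| is within an assumed, caller-chosen
-- limit; otherwise refuse rather than allocate a gigantic Integer.
expToRational :: Integer -> NumLit -> Maybe Rational
expToRational maxExp (Exp a b)
  | abs b > maxExp = Nothing
  | b >= 0         = Just (fromInteger (a * 10 ^ b))
  | otherwise      = Just (a % (10 ^ negate b))
```

For example, expToRational 1000 (Exp 1001 (-2)) gives Just (1001 % 100), while expToRational 1000 (Exp 12 1000000000000) gives Nothing.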

participants (4)
- Christian Maeder
- Gracjan Polak
- Simon Marlow
- Yitzchak Gale