Proposal: modify `Read` instances for `Float` and `Double`

Hello,
I am working on a small GHC extension to support floating point literals in hexadecimal notation (https://github.com/ghc-proposals/ghc-proposals/pull/37), which is similar to what's available in other languages. To support this change, I would like to propose that we modify the `Read` instances for `Float` and `Double` to parse literals in the new notation.
This may affect existing programs---although it doesn't seem very likely. Here is an example:
current behavior:
reads "0x10p10" = [(16.0,"p10")]
new behavior:
reads "0x10p10" = [(16384,"")]
What do people think?
-Iavor
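For readers who want to try the example, here is a minimal sketch using only the Prelude. The names `exampleParse` and `consumedAll` are made up for illustration. Which result you see depends on whether the `Read` instance in your version of base has adopted the proposed change, so the code does not assume either outcome.

```haskell
-- Parse the example from the message at type Double with the standard 'reads'.
-- Per the message above, an unmodified Read instance is expected to stop after
-- the hexadecimal integer and leave "p10" unconsumed.
exampleParse :: [(Double, String)]
exampleParse = reads "0x10p10"

-- Illustrative helper (hypothetical name): True when there is exactly one
-- parse and it consumed the whole input.
consumedAll :: [(a, String)] -> Bool
consumedAll [(_, "")] = True
consumedAll _         = False
```

Loading this in GHCi and evaluating `exampleParse` and `consumedAll exampleParse` shows which of the two behaviors an installation has.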

On Tue, 28 Feb 2017, Iavor Diatchki wrote:
This may affect existing programs---although it doesn't seem very likely. Here is an example:
current behavior:
reads "0x10p10" = [(16.0,"p10")]
new behavior:
reads "0x10p10" = [(16384,"")]
"p" refers to a power of two, and the exponent is written in decimal even though the mantissa is hexadecimal. This looks pretty confusing to me, but it seems that the standard was established some time before this proposal.

Henning:
Indeed, the proposal follows the description in p57-58 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, which dates back to 2007. (Some Haskell related deviations do exist, like dropping the final suffix, since Haskell doesn't need it; and requiring digits both before and after the dot.)
I think of the format as precisely representing the value "mantissa x 2^exp"; where the mantissa is written in hexadecimal, and the exponent is left as a regular decimal integer. The discrepancy is rather weird, but I guess it made more sense when the standard was drafted. More importantly, all the other languages (C, Java, Python: http://www.exploringbinary.com/hexadecimal-floating-point-constants/) follow this convention as well; so it would be unfortunate if Haskell diverged.
For the change in semantics for "reads": That is indeed unfortunate since we lose backwards compatibility. But it's a very minor one and I would be curious if anyone depended on the existing semantics for any legitimate reason. I personally do not see any issues with it.
-Levent.
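The "mantissa x 2^exp" reading can be sketched as a small `reads`-style parser. This is only an illustration under stated assumptions, not the code from the proposal: `readsHexFloat` and its helpers are hypothetical names, the sketch requires the `p` exponent to be present, and it requires at least one digit on each side of the dot when a dot is used.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit)

-- Hypothetical helper (not part of base): parse "0x" hex digits, an optional
-- ".hexdigits" fraction, and a mandatory "p" exponent written in decimal.
-- Returns the value and the unconsumed input, in the style of 'reads'.
readsHexFloat :: String -> [(Double, String)]
readsHexFloat ('0' : x : rest)
  | x `elem` "xX"
  , (intDigits@(_ : _), afterInt) <- span isHexDigit rest
  , (fracDigits, afterFrac) <- fraction afterInt
  , Just (e, leftover) <- exponentPart afterFrac
  = [(mantissa intDigits fracDigits * 2 ^^ e, leftover)]
readsHexFloat _ = []

-- Optional fraction: a dot followed by at least one hex digit.
fraction :: String -> (String, String)
fraction ('.' : s)
  | (ds@(_ : _), rest) <- span isHexDigit s = (ds, rest)
fraction s = ("", s)

-- Binary exponent: 'p' or 'P', an optional sign, then decimal digits.
exponentPart :: String -> Maybe (Int, String)
exponentPart (p : s)
  | p `elem` "pP" =
      case s of
        '-' : t -> signed negate t
        '+' : t -> signed id t
        _       -> signed id s
  where
    signed :: (Int -> Int) -> String -> Maybe (Int, String)
    signed f t = case span isDigit t of
      (ds@(_ : _), rest) -> Just (f (read ds), rest)
      _                  -> Nothing
exponentPart _ = Nothing

-- The hexadecimal mantissa: all digits as one integer, scaled down by
-- 16 for every digit that appeared after the dot.
mantissa :: String -> String -> Double
mantissa intDigits fracDigits =
  fromIntegral (hexValue (intDigits ++ fracDigits)) / 16 ^ length fracDigits

hexValue :: String -> Integer
hexValue = foldl (\acc d -> acc * 16 + fromIntegral (digitToInt d)) 0
```

With this sketch, `readsHexFloat "0x10p10"` gives `[(16384.0,"")]`, matching the "new behavior" in the original message: the mantissa 0x10 is 16, and 16 x 2^10 is 16384.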
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

To me, the notation makes sense if you think of the binary representation
of the number: each hex digit is 4 bits, and the base 2 exponent allows you
to move the binary point one bit at a time. I would guess that the exponent is
written in base 10 because that's easier for most people to understand,
and its bit-pattern representation is not all that important.
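The bit-level view can be made concrete with `encodeFloat` from the Prelude, which computes m * b^e for the type's radix b. The sketch below assumes that radix is 2 for `Double`, as it is in GHC; `hexPartsToDouble` is a hypothetical name used only for illustration.

```haskell
-- Hypothetical helper: the value of a hex float literal, given all of its hex
-- digits read as one integer, the number of digits after the dot, and the
-- binary exponent. Each hex digit after the dot is worth 4 bits, so it moves
-- the point 4 places; the exponent moves it one bit at a time.
-- Assumes floatRadix is 2 for Double (true in GHC).
hexPartsToDouble :: Integer -> Int -> Int -> Double
hexPartsToDouble digits fracDigits binExp =
  encodeFloat digits (binExp - 4 * fracDigits)
```

For example, `hexPartsToDouble 0x10 0 10` corresponds to `0x10p10` and `hexPartsToDouble 0x18 1 1` corresponds to `0x1.8p1`. Increasing the exponent by one doubles the result.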

Plus one from me.
Also, this is actually more IEEE-compliant than the C standard's spec, because
we don't need suffixes on literals :)

This does bring up portability concerns and would cause further divergence
of Read from the language standard. If not handled carefully, this drags us
in an ever more implementation-defined rather than specification-defined
direction.
As a data point for this discussion, a similar proposal to extend the Read
syntax to add support for BinaryLiterals was rejected over portability and
silent behavioral change concerns.
https://ghc.haskell.org/trac/ghc/ticket/10092
Whatever we do here, we may well want to be consistent with how we treat
both of these proposals.
If we do choose to accept this, we may well need to go back and re-tackle
#10092.
Currently, we do have at least one chink in the armor, in that Read is
currently more liberal in what it will accept Unicode-wise than what the
language specification states as a result of
https://ghc.haskell.org/trac/ghc/ticket/10444
I do think that whatever we do here, it should involve a conscious decision
to either stick to the current report, or diverge from the current report
and then to revise this part of the report.
If we can get the Haskell Prime folks to fix the language report to include
them in the next language standard (if by default, even better!) then I'm
fully +1. I'm also fully on board with both these and binary literals going
into the language standard.
If we're doing this entirely on our own in the spirit of "being liberal in
what you accept and conservative in what you output" then I'm personally
far more dubious of the merits of that approach in practice, and will wait
to weigh in from a CLC perspective until more feedback is in place.
-Edward

As a member of the Haskell Prime committee, I would support adding hex floats to the
next standard. I'm not current on the related Unicode topics, mind you :)

I’m looking forward to having support for the hex floating point syntax. I’d like to see lex updated to reflect the new syntax as well.
I think that the most likely code to be broken by changing the Read instance is code independently implementing a parser for hex floating point literals.
-- Eric
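To illustrate the kind of code that could break, here is a sketch of a hand-written parser layered on top of `reads`. Everything in it is hypothetical: `layeredHexFloat` is an invented downstream function, and `currentReads` and `proposedReads` are stand-ins that model only the single input quoted at the start of the thread, so the two behaviors can be compared without depending on a particular version of base.

```haskell
-- Hypothetical downstream parser: read the hexadecimal mantissa with the
-- supplied ReadS, then handle the "p<exponent>" suffix by hand. It relies on
-- the mantissa parse leaving the exponent unconsumed.
layeredHexFloat :: ReadS Double -> String -> Maybe Double
layeredHexFloat readsDouble s =
  case readsDouble s of
    [(m, p : expo)]
      | p `elem` "pP"
      , [(e, "")] <- (reads expo :: [(Int, String)])
      -> Just (m * 2 ^^ e)
    _ -> Nothing

-- Stand-ins for the two behaviors quoted in the thread. Only the one example
-- input is modelled; these are not the real Read instances.
currentReads, proposedReads :: ReadS Double
currentReads "0x10p10" = [(16.0, "p10")]
currentReads _         = []
proposedReads "0x10p10" = [(16384.0, "")]
proposedReads _         = []
```

Under the stand-in for the current behavior, `layeredHexFloat currentReads "0x10p10"` yields `Just 16384.0`. Under the stand-in for the proposed behavior it yields `Nothing`, because the exponent has already been consumed and the hand-written suffix handling no longer finds a `p`.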
participants (6): Carter Schonwald, Edward Kmett, Eric Mertens, Henning Thielemann, Iavor Diatchki, Levent Erkok