11 Sep 2015, 3:12 p.m.
#8524: GHC is inconsistent with the Haskell Report on which Unicode characters are
allowed in string and character literals
-------------------------------------+-------------------------------------
Reporter: oerjan                     | Owner: RyanGlScott
Type: bug                            | Status: new
Priority: low                        | Milestone:
Component: Compiler (Parser)         | Version: 7.6.3
Resolution:                          | Keywords: newcomer
Operating System: Unknown/Multiple   | Architecture: Unknown/Multiple
Type of failure: GHC rejects valid   | Test Case:
  program                            |
Blocked By:                          | Blocking:
Related Tickets:                     | Differential Revisions: Phab:D1235
-------------------------------------+-------------------------------------
Changes (by thomie):
* cc: hvr (added)
Comment:
@RyanGlScott: sorry about that; I should not have put the newcomer keyword
on this ticket prematurely.
Some GHCi output (with `Data.Char` and `Data.List` imported):
* Whitespace characters that the report excludes from strings:
{{{
> delete '\SP' $ filter isSpace ['\0'..]
"\t\n\v\f\r\160\5760\8192\8193\8194\8195\8196\8197\8198\8199\8200\8201\8202\8239\8287\12288"
}}}
* Whitespace characters that GHC excludes from strings:
{{{
> filter (\c -> generalCategory c == Control && isSpace c) ['\0'..]
"\t\n\v\f\r"
}}}
* General categories (as returned by `generalCategory`) of the non-printable characters, which both the report and GHC exclude from strings:
{{{
> nub $ map generalCategory $ filter (not . isPrint) ['\0'..]
[Control,Format,NotAssigned,LineSeparator,ParagraphSeparator,Surrogate,PrivateUse]
}}}
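The three queries above can be restated as standalone predicates, which makes the gap between the report and GHC explicit. This is a minimal sketch, not GHC's lexer: the names `reportExcludedSpace`, `ghcExcludedSpace`, `deviation` and `nonPrintableCategories` are hypothetical, and `ghcExcludedSpace` only encodes the observation from the second query (GHC rejects just the `Control` whitespace) rather than GHC's actual lexer rules. The results depend on the Unicode tables of the `base` version in use.

```haskell
-- Sketch only: restates the GHCi queries above as character predicates.
-- These hypothetical names are not part of GHC or base.
import Data.Char (GeneralCategory (..), generalCategory, isPrint, isSpace)
import Data.List (nub)

-- Whitespace that the report excludes from string literals:
-- everything isSpace accepts except the plain space U+0020.
reportExcludedSpace :: Char -> Bool
reportExcludedSpace c = isSpace c && c /= ' '

-- Whitespace that GHC excludes, per the second query above:
-- only whitespace whose general category is Control.
ghcExcludedSpace :: Char -> Bool
ghcExcludedSpace c = isSpace c && generalCategory c == Control

-- Whitespace that the report excludes but GHC accepts,
-- i.e. the deviation discussed in this ticket.
deviation :: [Char]
deviation =
  filter (\c -> reportExcludedSpace c && not (ghcExcludedSpace c)) ['\0' ..]

-- General categories of the non-printable characters (third query).
nonPrintableCategories :: [GeneralCategory]
nonPrintableCategories =
  nub (map generalCategory (filter (not . isPrint) ['\0' ..]))
```

Loading this file into GHCi and evaluating `deviation` should list the non-ASCII whitespace from the first query that GHC still accepts, such as `'\160'` (NO-BREAK SPACE).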
If we're going to be "as inclusive as possible", why not allow all of
these? Are there any downsides to this? Perhaps this could go under a new
flag, `FullUnicodeStrings`, enabled by default and disabled in Haskell98 and
Haskell2010 mode.
I'm also ok with just mentioning the current deviation from the report in
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/bugs-and-infelicities.html.
--
GHC
The Glasgow Haskell Compiler