[GHC] #9668: Unicode info is out of date

#9668: Unicode info is out of date
-------------------------------------+-------------------------------------
Reporter: dfeuer                     | Owner: ekmett
Type: task                           | Status: new
Priority: normal                     | Milestone: 7.10.1
Component: Core Libraries            | Version: 7.9
Keywords: Unicode                    | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple       | Type of failure: Incorrect result at runtime
Difficulty: Unknown                  | Test Case:
Blocked By:                          | Blocking:
Related Tickets:                     | Differential Revisions:
-------------------------------------+-------------------------------------

The automatically generated `WCsubst.c` was last generated in 2011. The Unicode standard has been updated several times since then. I would like to try to replace that whole mechanism with something a little more cache-friendly, but at the very list the tables need to be right. Unfortunately, the script that generates this file gives no information about its input format, or just where a file with the right format is supposed to come from.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Description changed by dfeuer:

Old description:

The automatically generated `WCsubst.c` was last generated in 2011. The Unicode standard has been updated several times since then. I would like to try to replace that whole mechanism with something a little more cache-friendly, but at the very list the tables need to be right. Unfortunately, the script that generates this file gives no information about its input format, or just where a file with the right format is supposed to come from.

New description:

The automatically generated `WCsubst.c` was last generated in 2011. The Unicode standard has been updated several times since then. I would like to try to replace that whole mechanism with something a little more cache-friendly, but at the very least the tables need to be right. Unfortunately, the script that generates this file gives no information about its input format, or just where a file with the right format is supposed to come from.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:1

Comment (by rwbarton):

Yes, we really ought to have a `Note: [Unicode standard updates]` that documents everything that needs to happen when a new Unicode standard is released. (E.g., if there are magic constants in the Haskell `isSpace` that need to change if a formerly reserved character becomes whitespace, that should be listed too.)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:2

Comment (by dfeuer):

Replying to [comment:2 rwbarton]:
Yes, we really ought to have a `Note: [Unicode standard updates]` that documents everything that needs to happen when a new Unicode standard is released. (e.g. if there are magic constants in the Haskell `isSpace` that need to change if a formerly reserved character becomes whitespace, that should be listed too.)
(Also, the User's Guide should mention the version of the Unicode standard ghc implements, if it doesn't do so already.)
It appears that all that's required is running `libraries/base/cbits/ubconfc` and feeding it the appropriate specification table as standard input. This replaces `WCsubst.c` and everything (including `isSpace`) works. If I manage to write a Haskell version of this system, the requirements will be similarly simple. The only trouble is figuring out just what table it's supposed to be fed.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:3
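One way to spot-check whether regenerated tables actually took effect is to ask `base` for the general category of a few code points that were assigned after the 2011 tables were generated. The following is only a sketch: the probe list and the helper names `probes`, `looksStale` and `describeProbe` are made up for illustration and are not part of `base` or of the patch discussed here.

```haskell
module Main (main) where

import Data.Char (GeneralCategory (..), generalCategory, toUpper)
import Numeric (showHex)

-- | Hypothetical probe list: code points assigned in Unicode versions
-- released after 2011 (chosen for illustration only).
probes :: [(Char, String)]
probes =
  [ ('\x20BA', "TURKISH LIRA SIGN (Unicode 6.2)")
  , ('\x20BD', "RUBLE SIGN (Unicode 7.0)")
  ]

-- | A probe looks stale when the tables still report it as unassigned.
looksStale :: Char -> Bool
looksStale c = generalCategory c == NotAssigned

-- | Render one probe as a line of text with its reported category.
describeProbe :: (Char, String) -> String
describeProbe (c, name) =
  "U+" ++ map toUpper (showHex (fromEnum c) "") ++ " " ++ name ++ ": "
    ++ show (generalCategory c)
    ++ (if looksStale c then " (tables look out of date)" else "")

main :: IO ()
main = mapM_ (putStrLn . describeProbe) probes
```

A compiler whose tables predate these additions should report `NotAssigned` for the probes; one with refreshed tables should report a real category.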

Changes (by dfeuer):

 * cc: core-libraries-committee@… (added)
 * status: new => patch

Comment:

Phab:D316

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:4

Comment (by nomeata):

Phab:D317 was abandoned; Phab:D316 seems to be the current one.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:5

Changes (by dfeuer):

 * status: patch => merge
 * milestone: 7.10.1 => 7.8.4

Comment:

This is completed by https://phabricator.haskell.org/rGHCd4fd16801bc59034abdc6214e60fcce2b21af9c8 but we probably want to merge to 7.8.4.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:6

Changes (by rwbarton):

 * status: merge => closed
 * resolution: => fixed
 * milestone: 7.8.4 => 7.10.1

Comment:

Er, we should probably have some guidelines written down about what is suitable for merging to a maintenance branch, but I ''really'' don't think this qualifies.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:7

Comment (by dfeuer):

Replying to [comment:7 rwbarton]:
Er, we should probably have some guidelines written down about what is suitable for merging to a maintenance branch, but I ''really'' don't think this qualifies.

I'm reminded of the prayer books my synagogue uses for Rosh Hashanah and Yom Kippur: the holiday date charts in the back were reprinted verbatim from an earlier edition, so they include the dates for the holidays from many years ago, but run out just a few years after the second printing. I personally consider such things as Unicode table updates to be part of regular maintenance, rather than new features.

Note: if we ''do'' decide to merge to 7.8.4, the tables will have to be generated for that, not copied over from 7.9.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:8

Comment (by ekmett):

As much as I'd like to fix it in 7.8.4, this probably belongs in 7.10. It is hard for users to detect patch levels and switch behaviors. This is a fairly weak argument, but it's really enough that there **exists** an argument to give pause about changing something in a patch-level release.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9668#comment:9
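The point about detecting patch levels can be illustrated with the usual CPP guard. This is a sketch that assumes, from the ticket's final milestone, that the refreshed tables first ship in 7.10.1; the name `hasRefreshedUnicodeTables` is made up for illustration.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Assumption (taken from the ticket's final milestone): the refreshed
-- Unicode tables first ship with GHC 7.10.1, so a major-version check
-- is enough to select behaviour.
-- __GLASGOW_HASKELL__ is 708 for every 7.8.x release, so this macro
-- alone could not tell 7.8.3 apart from a 7.8.4 carrying new tables.
hasRefreshedUnicodeTables :: Bool
#if __GLASGOW_HASKELL__ >= 710
hasRefreshedUnicodeTables = True
#else
hasRefreshedUnicodeTables = False
#endif

main :: IO ()
main =
  putStrLn ("refreshed Unicode tables expected: "
            ++ show hasRefreshedUnicodeTables)
```

Had the change landed in a patch-level release instead, code that depends on the new classifications would need some other way to distinguish the two 7.8 compilers.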