Clarification on uniWhite lexical definition

The Haskell report says:

uniWhite → any Unicode character defined as whitespace

It's not clear to me whether this means that the Unicode character should have "Zs" as its general category

;; Zs Space_Separator a space character (of various non-zero widths)

or whether it should be defined as whitespace as in https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt

Any clarification appreciated,
Immanuel
--
-- Researching the dual problem of finding the function that has a given point as fixpoint.

On 2020-10-20 6:43 a.m., Immanuel Litzroth wrote:
> The Haskell report says: uniWhite → any Unicode character defined as whitespace
> It's not clear to me whether this means that the Unicode character should have "Zs" as its general category ;; Zs Space_Separator a space character (of various non-zero widths) or whether it should be defined as whitespace as in https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
Recall that this production dates from 1998, which was the early days of Unicode. You should be looking approximately at the Unicode 2.1.8 standard, not the latest one. And once you look there, you'll find it was much simpler:
Property dump for: 0x10000004 (White space)
0009..000D (5 chars)
0020
00A0
2000..200B (12 chars)
2028..2029 (2 chars)
3000
So there was no ambiguity at the time. Now if you're trying to extrapolate the intent to the present standard... well I have no more authority than you in the matter, but I'd go with the more inclusive definition.
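To make the 1998-era set concrete, here is a minimal sketch that encodes the ranges from the property dump above as a Haskell predicate. The names `whiteSpace1998`, `isWhite1998` and `count1998` are made up for illustration; nothing here comes from the report or from any compiler.

```haskell
import Data.Char (ord)

-- | Inclusive code point ranges copied from the Unicode 2.1.8
-- "White space" property dump quoted in this thread.
-- (Hypothetical name, for illustration only.)
whiteSpace1998 :: [(Int, Int)]
whiteSpace1998 =
  [ (0x0009, 0x000D)
  , (0x0020, 0x0020)
  , (0x00A0, 0x00A0)
  , (0x2000, 0x200B)
  , (0x2028, 0x2029)
  , (0x3000, 0x3000)
  ]

-- | True when the character falls in one of the ranges above.
isWhite1998 :: Char -> Bool
isWhite1998 c = any inRange whiteSpace1998
  where
    n = ord c
    inRange (lo, hi) = lo <= n && n <= hi

-- | Total number of code points covered by the ranges.
count1998 :: Int
count1998 = sum [hi - lo + 1 | (lo, hi) <- whiteSpace1998]
```

Characters assigned after that version of the standard, such as U+1680, are by construction outside this set, which is where the question of extrapolating to the present standard comes in.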

On Tue, Oct 20, 2020 at 12:43:06PM +0200, Immanuel Litzroth wrote:
> The Haskell report says: uniWhite → any Unicode character defined as whitespace
> It's not clear to me whether this means that the Unicode character should have "Zs" as its general category ;; Zs Space_Separator a space character (of various non-zero widths) or whether it should be defined as whitespace as in https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
> Any clarification appreciated,
FWIW, GHC uses "Zs":

https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x...
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x...
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x...
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x...

with the definition of generalCategory "Space" at:

https://gitlab.haskell.org/ghc/ghc/-/blob/master/libraries/base/GHC/Unicode....

--
Viktor.
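For anyone who wants to probe the "Zs" reading from user code, the following is a rough sketch using `Data.Char` from base, where the `Space` constructor of `GeneralCategory` corresponds to Zs. It is not the lexer's actual code; `isZs`, `samples` and `report` are hypothetical names, and the sample characters are picked only for illustration.

```haskell
import Data.Char (GeneralCategory (Space), generalCategory, isSpace)

-- | True when the character's general category is Zs (Space_Separator),
-- which base calls 'Space'. (Hypothetical helper, not taken from GHC.)
isZs :: Char -> Bool
isZs c = generalCategory c == Space

-- | A few characters to probe; chosen for illustration only.
samples :: [Char]
samples = ['\t', ' ', '\x0085', '\x00A0', '\x2028', '\x3000']

-- | For each sample: the character, whether it is Zs, and what
-- Data.Char.isSpace says about it.
report :: [(Char, Bool, Bool)]
report = [(c, isZs c, isSpace c) | c <- samples]
```

Evaluating `report` in GHCi shows where a pure general-category test and `isSpace` part ways. Tab, for instance, is a control character rather than Zs, and U+2028 has category Zl, so a Zs-only test rejects both.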
participants (3): Immanuel Litzroth, Mario, Viktor Dukhovni