Unicode alternative for '..' (ticket #3894)

I'm a big fan of the UnicodeSyntax [1] language extension. But I don't particularly like the alternative for the ellipsis '..'. I'm not sure if it was a conscious choice or a mistake to use the '⋯' character. I haven't really encountered that symbol before except for matrices and the like [2].

Therefore I propose to change the character from '⋯' to '…'. I submitted a bug report for this a few weeks ago [3]. Today I attached a patch which implements this change.

Here are 4 code snippets for comparison:

-- Simple ASCII
import Data.Bool ( Bool(..) )
f :: [Int]
f = [1, 3 .. 10] ++ [10..100]

-- Current situation (MIDLINE HORIZONTAL ELLIPSIS, U+22EF)
import Data.Bool ( Bool(⋯) )
f ∷ [Int]
f = [1, 3 ⋯ 10] ++ [10⋯100]

-- Proposed change (HORIZONTAL ELLIPSIS, U+2026)
import Data.Bool ( Bool(…) )
f ∷ [Int]
f = [1, 3 … 10] ++ [10…100]

-- Another alternative (TWO DOT LEADER, U+2025)
import Data.Bool ( Bool(‥) )
f ∷ [Int]
f = [1, 3 ‥ 10] ++ [10‥100]

The TWO DOT LEADER also looks nice, but I think we must consider more than looks alone (which is a matter of fonts) and take the semantics into account.

I would really like some feedback on this since it is a change that breaks backwards compatibility (even though it is a really small change).

Regards,
Roel van Dijk

1 - http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#un...
2 - http://en.wikipedia.org/wiki/Ellipsis#In_mathematical_notation
3 - http://hackage.haskell.org/trac/ghc/ticket/3894

I think the baseline ellipsis makes much more sense; it's hard to see how the midline ellipsis was chosen.

--
Jason Dusek

My opinion is that we should either use TWO DOT LEADER, or just leave it as it is now, two FULL STOP characters.

Two dots indicating a range is not the same symbol as a three dot ellipsis.

Traditional non-Unicode Haskell will continue to be around for a long time to come. It would be very confusing to have two different visual glyphs for this symbol.

I don't think there is any semantic problem with using TWO DOT LEADER here. All three of the characters ONE DOT LEADER, TWO DOT LEADER, and HORIZONTAL ELLIPSIS are legacy characters from Xerox's XCCS. There, the characters they come from were used for forming dot leaders, e.g., in a table of contents. Using them that way in Unicode is considered incorrect unless they represent text that was originally encoded in XCCS; in Unicode, one does not form dot leaders using those characters.

However, other new uses are considered legitimate. For example, HORIZONTAL ELLIPSIS can be used for fonts that have a special ellipsis glyph, and ONE DOT LEADER represents mijaket in Armenian encodings. So I don't see any reason why we can't use TWO DOT LEADER to represent the two-dot range symbol.

The above analysis is based in part upon a discussion of these characters on the Unicode list in 2003:

http://www.mail-archive.com/unicode@unicode.org/msg16285.html

The author of that particular message, Kenneth Whistler, is of the opinion that two dots expressing a range as in [0..1] should be represented in Unicode as two FULL STOP characters, as we do now in Haskell. Others in that thread - whom Mr. Whistler seems to feel are less expert than himself regarding Unicode - think that TWO DOT LEADER is appropriate. No one considers replacing two-dot ranges with HORIZONTAL ELLIPSIS.

If we can't find a Unicode character that everyone agrees upon, I also don't see any problem with leaving it as two FULL STOP characters.

Thanks,
Yitz
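For reference, the characters named above can be told apart by code point and Unicode general category. The following is only a small sketch using `Data.Char` from base; the `candidates` table and the `codePoint` and `describe` helpers are names made up for this illustration, not part of GHC or of the patch under discussion.

```haskell
module Main (main) where

import Data.Char (generalCategory, ord, toUpper)
import Numeric (showHex)

-- | Characters mentioned in the thread, paired with their Unicode names.
-- (Hypothetical table, written for this illustration only.)
candidates :: [(String, Char)]
candidates =
  [ ("FULL STOP",                   '\x002E')
  , ("ONE DOT LEADER",              '\x2024')
  , ("TWO DOT LEADER",              '\x2025')
  , ("HORIZONTAL ELLIPSIS",         '\x2026')
  , ("MIDLINE HORIZONTAL ELLIPSIS", '\x22EF')
  ]

-- | Render a character as a "U+XXXX" code point, zero-padded to 4 digits.
codePoint :: Char -> String
codePoint c = "U+" ++ pad (map toUpper (showHex (ord c) ""))
  where
    pad s = replicate (4 - length s) '0' ++ s

-- | One report line: code point, general category, Unicode name.
describe :: (String, Char) -> String
describe (name, c) =
  codePoint c ++ "  " ++ show (generalCategory c) ++ "  " ++ name

main :: IO ()
main = mapM_ (putStrLn . describe) candidates
```

The general category column is the part relevant to the semantics argument: the category is whatever the Unicode tables shipped with the compiler's base library report for each character.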

That is very interesting. I didn't know the history of those characters.

> If we can't find a Unicode character that everyone agrees upon, I also don't see any problem with leaving it as two FULL STOP characters.

I agree. I don't like the current Unicode variant for "..", therefore I suggested an alternative. But I didn't consider removing it altogether. It is an interesting idea.

On 15/04/2010 18:12, Yitzchak Gale wrote:
> My opinion is that we should either use TWO DOT LEADER, or just leave it as it is now, two FULL STOP characters.

Just to be clear, you're suggesting *removing* the Unicode alternative for '..' from GHC's UnicodeSyntax extension?

I have no strong opinions about this and I'm happy to defer to those who know more about such things than me. The current choice of MIDLINE is probably accidental.

Cheers,
Simon

I wrote:
> My opinion is that we should either use TWO DOT LEADER, or just leave it as it is now, two FULL STOP characters.

Simon Marlow wrote:
> Just to be clear, you're suggesting *removing* the Unicode alternative for '..' from GHC's UnicodeSyntax extension?

Yes, sorry. Either use TWO DOT LEADER, or remove this Unicode alternative altogether (i.e. leave it the way it is *without* the UnicodeSyntax extension).

I'm happy with either of those. I just don't like moving the dots up to the middle, or changing the number of dots.

Thanks,
Yitz

On Wed, Apr 21, 2010 at 12:51 AM, Yitzchak Gale wrote:
> Yes, sorry. Either use TWO DOT LEADER, or remove this Unicode alternative altogether (i.e. leave it the way it is *without* the UnicodeSyntax extension).
> I'm happy with either of those. I just don't like moving the dots up to the middle, or changing the number of dots.

I would be happy with either changing the character to the baseline ellipsis or removing it altogether.

It would be nice if we could grep (or emacs grep-find) all sources on Hackage to check which packages use the ⋯ character. I suspect the number is very close to 0.
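A check like the one suggested above could be sketched as follows. This is a minimal, hypothetical scanner, not an existing tool: it assumes the package sources have already been unpacked locally, that the files are UTF-8, and that the file paths are passed on the command line. The names `hits` and `readUtf8` are invented for this example. It matches the character anywhere in a line, so occurrences in comments or string literals are reported too.

```haskell
module Main (main) where

import System.Environment (getArgs)
import System.IO

-- | MIDLINE HORIZONTAL ELLIPSIS, the current UnicodeSyntax spelling of "..".
midlineEllipsis :: Char
midlineEllipsis = '\x22EF'

-- | 1-based line numbers and contents of all lines containing the character.
hits :: String -> [(Int, String)]
hits src =
  [ (n, l) | (n, l) <- zip [1 ..] (lines src), midlineEllipsis `elem` l ]

-- | Read a whole file as UTF-8 (assumption: sources are UTF-8 encoded).
-- The contents are forced before the handle is closed.
readUtf8 :: FilePath -> IO String
readUtf8 path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  s <- hGetContents h
  length s `seq` hClose h
  return s

main :: IO ()
main = do
  paths <- getArgs
  hSetEncoding stdout utf8
  mapM_ report paths
  where
    -- Print matches in a grep-like "file:line: text" format.
    report path = do
      src <- readUtf8 path
      mapM_
        (\(n, l) -> putStrLn (path ++ ":" ++ show n ++ ": " ++ l))
        (hits src)
```

Counting how many distinct packages produce any output would then give a rough estimate of how much code the proposed change could break.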
participants (4)
- Jason Dusek
- Roel van Dijk
- Simon Marlow
- Yitzchak Gale