Unicode Haskell source -- Yippie!

I'm mighty pleased to note that the following is valid Haskell code! Do others find this useful/appealing? Any possibilities on making the commented-out parts work? [Pragmatics about typing this at the same speed and facility as we do with ASCII is a separate and (IMHO) solvable problem, though it's not the case at the moment.]

--------------------
import qualified Data.Set as Set
-- Experimenting with Unicode in Haskell source

-- Numbers
x ≠ y = x /= y
x ≤ y = x <= y
x ≥ y = x >= y
x ÷ y = divMod x y
x ⇑ y = x ^ y
x × y = x * y
-- readability hmmm !!!
π = pi
-- ⌊ x = floor x
-- ⌈ x = ceiling x

-- Lists
xs ⤚ ys = xs ++ ys

-- Bools
x ∧ y = x && y
x ∨ y = x || y
-- ¬x = not x

-- Sets
x ∈ s = x `Set.member` s   -- or keep ∈ for list elem?
s ∪ t = s `Set.union` t
s ∩ t = s `Set.intersection` t
s ⊆ t = s `Set.isSubsetOf` t
s ⊂ t = s `Set.isProperSubsetOf` t
s ⊈ t = not (s `Set.isSubsetOf` t)
-- ∅ = Set.null

On Thu, Apr 24, 2014 at 1:15 PM, Rustom Mody
Any possibilities on making the commented out parts work?
Unary operators are not really doable. Take a look at the ugliness around unary minus for why. (Note in particular how it breaks section syntax.) Nullary operators are even less doable; Set.null must be done with an identifier character, not a symbol character. (There are a number of such that would fit, since it is actually used in some languages.)

--
brandon s allbery kf8nh
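A minimal sketch of the constraint being described, assuming a GHC that accepts these Unicode operator names (as in the original post): symbolic names are always infix, so a prefix or stand-alone use only works through the parenthesised form.

  -- Sketch only: a prefix definition such as  ¬ x = not x  does not parse,
  -- because a symbolic name must be infix; the parenthesised form is fine,
  -- but then it has to be applied in that form as well.
  (¬) :: Bool -> Bool
  (¬) = not

  demo :: Bool
  demo = (¬) True        -- OK;  ¬ True  by itself is a syntax error

  -- A value-level ∅ runs into the same wall: at best it can be bound and
  -- used as (∅), never as a bare ∅.
  -- (∅) = Set.empty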

It's disgusting.

Sent from iPhone

On 24 Apr 2014, at 21:15, Rustom Mody
wrote: I'm mighty pleased to note that the following is valid Haskell code!
Do others find this useful/appealing? Any possibilities on making the commented out parts work?
[Pragmatics about typing this at the same speed and facility as we do with ASCII is a separate and (IMHO) solvable problem, though it's not the case at the moment]
[... code snipped ...]

Many and varied answers -- this is sure a vibrant place -- thanks for all suggestions and pointers, including negative ones; I really wish to know the lay of the land! It will sure take me a while to follow up, check, and get back -- hope that's ok :-)

As of this point Haskell seems to be ahead of the competition in embracing Unicode. E.g. Python moved to Python 3 with a number of backward-incompatible changes, the most notable of which is the move to Unicode as the default. However, for Unicode in Python source it still seems people are finding it hard to accept.

On Fri, 25 Apr 2014 12:01:41 +0530, Rustom Mody
As of this point Haskell seems to be ahead of the competition in embracing Unicode. E.g. Python moved to Python 3 with a number of backward-incompatible changes, the most notable of which is the move to Unicode as the default. However, for Unicode in Python source it still seems people are finding it hard to accept.
FWIW, Python's support for Unicode in its standard library is significantly better than Haskell's. Haskell fails on basic functions such as ‘toUpper’, ‘length’ or ‘==’.
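For concreteness, the kind of case presumably meant looks like this in GHCi (the examples are illustrative, not from the thread): the char-by-char functions work on code points, with no case folding beyond single characters and no normalisation.

  Prelude> import Data.Char (toUpper)
  Prelude Data.Char> toUpper 'ß'            -- full uppercasing would be "SS"
  '\223'
  Prelude Data.Char> length "e\x0301"       -- 'e' plus a combining acute accent
  2
  Prelude Data.Char> "\xE9" == "e\x0301"    -- precomposed vs decomposed 'é'
  False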

On 2014-04-25 15:34, Niklas Haas wrote:
FWIW, python's support for Unicode in its standard library is significantly better than Haskell's. Haskell fails on basic functions such as ‘toUpper’, ‘length’ or ‘==’.
I have to humbly disagree. Python does indeed have great Unicode support, but using Unicode for everything is not efficient in cases where it is not needed. With Haskell, one can use bytestring [1] and text [2] as necessary to have more control over how content is processed. Both packages are in Haskell Platform, the equivalent of Python's standard library.

Cheers,

Travis

----
[1] http://hackage.haskell.org/package/bytestring
[2] http://hackage.haskell.org/package/text
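A rough sketch of that division of labour (file name and details are illustrative only): bytes stay in ByteString until one explicit decoding step produces Text, and only then do the character-level functions apply.

  import qualified Data.ByteString as B
  import qualified Data.Text as T
  import qualified Data.Text.Encoding as TE
  import qualified Data.Text.IO as TIO

  main :: IO ()
  main = do
    bytes <- B.readFile "input.txt"     -- raw octets, no interpretation yet
    let txt = TE.decodeUtf8 bytes       -- the one place an encoding is chosen
    TIO.putStrLn (T.toUpper txt)        -- character-level work happens on Text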

I'm going to disagree for a different reason. The transition to Python 3 improved unicode support in some respects, but utterly gutted the previously excellent codec support. Now you can't really handle arbitrary source/destination encodings of text without treating everything as if they were bytes. Really bad.

On Fri, Apr 25, 2014 at 1:54 AM, Travis Cardwell <travis.cardwell@extellisys.com> wrote:
On 2014-04-25 15:34, Niklas Haas wrote:
FWIW, python's support for Unicode in its standard library is significantly better than Haskell's. Haskell fails on basic functions such as ‘toUpper’, ‘length’ or ‘==’.
I have to humbly disagree. Python does indeed have great Unicode support, but using Unicode for everything is not efficient in cases where it is not needed. With Haskell, one can use bytestring [1] and text [2] as necessary to have more control over how content is processed. Both packages are in Haskell Platform, the equivalent of Python's standard library.
Cheers,
Travis
----
[1] http://hackage.haskell.org/package/bytestring
[2] http://hackage.haskell.org/package/text

On 2014-04-25 16:25, Christopher Allen wrote:
I'm going to disagree for a different reason. The transition to Python 3 improved unicode support in some respects, but utterly gutted the previously excellent codec support. Now you can't really handle arbitrary source/destination encodings of text without treating everything as if they were bytes. Really bad.
Perhaps I am misunderstanding, but, from my experience, Python 3 still has excellent codec support:

https://docs.python.org/3.4/library/codecs.html

When reading from a file, the source encoding can be passed to the `open` function so that it handles transcoding for you. When writing to a file, the destination encoding can similarly be specified to `open`. When dealing with other sources/destinations, data must be read/written as bytes, but content can be encoded/decoded as necessary using the functions in the codecs module.

Haskell has excellent codec support thanks to ICU:

http://hackage.haskell.org/package/text-icu

The contents of the `Data.Text.ICU.Convert` module can be used to convert between codecs. For reference, here is a list of supported codecs:

http://demo.icu-project.org/icu-bin/convexp

Cheers,
Travis
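A sketch of the ICU route, assuming the Data.Text.ICU.Convert interface mentioned above (open, toUnicode, fromUnicode); the converter name and file names are only examples, so check the text-icu documentation before relying on the exact signatures.

  import qualified Data.ByteString as B
  import qualified Data.Text.ICU.Convert as ICU   -- from the text-icu package

  main :: IO ()
  main = do
    conv  <- ICU.open "shift_jis" Nothing          -- any ICU-supported codec name
    bytes <- B.readFile "input.sjis"
    let txt = ICU.toUnicode conv bytes             -- legacy encoding to Text
    B.writeFile "out.sjis" (ICU.fromUnicode conv txt)  -- ... and back again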

http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/

On Fri, Apr 25, 2014 at 3:24 AM, Travis Cardwell <travis.cardwell@extellisys.com> wrote:
On 2014-04-25 16:25, Christopher Allen wrote:
I'm going to disagree for a different reason. The transition to Python 3 improved unicode support in some respects, but utterly gutted the previously excellent codec support. Now you can't really handle arbitrary source/destination encodings of text without treating everything as if they were bytes. Really bad.
Perhaps I am misunderstanding, but, from my experience, Python 3 still has excellent codec support:
https://docs.python.org/3.4/library/codecs.html
When reading from a file, the source encoding can be passed to the `open` function so that it handles transcoding for you. When writing to a file, the destination encoding can similarly be specified to `open`. When dealing with other sources/destinations, data must be read/written as bytes, but content can be encoded/decoded as necessary using the functions in the codecs module.
Haskell has excellent codec support thanks to ICU:
http://hackage.haskell.org/package/text-icu
The contents of the `Data.Text.ICU.Convert` module can be used to convert between codecs. For reference, here is a list of supported codecs:
http://demo.icu-project.org/icu-bin/convexp
Cheers,
Travis

On 2014-04-25 17:37, Christopher Allen wrote:
Much of this article relates to what I wrote in my first reply:

On 2014-04-25 15:54, Travis Cardwell wrote:
Python does indeed have great Unicode support, but using Unicode for everything is not efficient in cases where it is not needed.
Armin says that Python 3 is not appropriate for real-world applications due to this issue. He wants functionality in the standard library that processes bytes directly (as in Python 2). The problem is that processing bytes directly is not safe. The `urlparse` example is a good one: naively parsing URLs as bytes can lead to major security vulnerabilities. While Armin would not parse things naively, people with less experience with encodings are less likely to make mistakes in Python 3, at the expense of performance.

I think that Haskell's support for byte-strings and Unicode strings (as well as many other encodings via ICU transcoding) is quite nice because it supports doing whatever needs to be done while giving the programmer the control necessary to implement real-world applications. Though one can still manage to shoot themselves in the foot, Haskell's types make the confusing subject of encoding more approachable and significantly reduce the chance of error, IMHO.

The article also talks about how Python's codec system is used for non-character encodings (such as zlib!) in addition to character encodings. I do not think that it is particularly good design. Attempts to clean up the design have resulted in compatibility issues with old code: type errors! As a Haskell programmer, I am clearly biased, but I think that the design of such modules could be significantly improved by using static types and type classes! ;)

Cheers,
Travis
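A small sketch of that point (the parser and its names are invented for illustration, not taken from any library): because the query parser demands Text, the decode from bytes has to be written down somewhere and cannot be skipped the way a bytes-based urlparse allows.

  import qualified Data.ByteString as B
  import qualified Data.Text as T
  import qualified Data.Text.Encoding as TE

  -- The parser only ever sees decoded text ...
  parseQuery :: T.Text -> [(T.Text, T.Text)]
  parseQuery q = [ (k, T.drop 1 v)
                 | kv <- T.splitOn (T.pack "&") q
                 , let (k, v) = T.breakOn (T.pack "=") kv ]

  -- ... so callers holding raw bytes must make the decoding step explicit.
  parseRawQuery :: B.ByteString -> [(T.Text, T.Text)]
  parseRawQuery = parseQuery . TE.decodeUtf8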

It's an interesting feature, and nice if you want that sort of thing, but
not something I'd personally want to see as the default. Deviating from the
standard ASCII set of characters is just too much of a hurdle to usability
of the language. If you really like that sort of thing though you might
want to look into APL which is either famous or infamous depending on your
outlook for needing its own custom keyboard in order to write it.
-R. Kyle Murphy
--
Curiosity was framed, Ignorance killed the cat.
On Thu, Apr 24, 2014 at 1:15 PM, Rustom Mody
I'm mighty pleased to note that the following is valid Haskell code!
Do others find this useful/appealing? Any possibilities on making the commented out parts work?
[Pragmatics about typing this at the same speed and facility as we do with ASCII is a separate and (IMHO) solvable problem, though it's not the case at the moment]
[... code snipped ...]

On Thu, Apr 24, 2014 at 10:57 PM, Kyle Murphy
It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language. If you really like that sort of thing though you might want to look into APL which is either famous or infamous depending on your outlook for needing its own custom keyboard in order to write it.
-R. Kyle Murphy
I don't think anyone can reasonably talk of making it a default! Just seeing how much and to where the envelope can be pushed. E.g. I would like to see \ spelled as λ.

As for APL, it failed for various reasons, e.g.:
- mixing up assembly language (straight-line code with gotos) with functional idioms
- the character set was a major hurdle in the 60s; that's not an issue today when most OSes/editors are Unicode-compliant

As a first step, it could be useful to provide this kind of syntax for
Haddock-generated HTML sources (read-only).
Fortress language provides something similar (see Section 2.3 in the spec
[1]).
-Sylvain
[1] http://www.ccs.neu.edu/home/samth/fortress-spec.pdf
2014-04-24 19:36 GMT+02:00 Rustom Mody
On Thu, Apr 24, 2014 at 10:57 PM, Kyle Murphy
wrote: It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language. If you really like that sort of thing though you might want to look into APL which is either famous or infamous depending on your outlook for needing its own custom keyboard in order to write it.
-R. Kyle Murphy
I dont think anyone can reasonably talk of making it a default! Just seeing how much and to where the envelope can be pushed.
eg I would like to see \ spelled as λ
As for APL, it failed for various reasons eg - mixing up assembly language (straight line code with gotos) with functional idioms - the character set was a major hurdle in the 60s. Thats not an issue today when most OSes/editors are unicode compliant

eg I would like to see \ spelled as λ
I have symbol substitution enabled in Vim. E.g. when I write \ (and it is
syntactically lambda) I get λ. The same way composition (.) is replaced
with ∘. The same trick can be enabled for other operators as well. So I
have normal text and nice presentation in *my* text editor: it does not
bother anyone but me.
Nick
2014-04-24 21:36 GMT+04:00 Rustom Mody
On Thu, Apr 24, 2014 at 10:57 PM, Kyle Murphy
wrote: It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language. If you really like that sort of thing though you might want to look into APL which is either famous or infamous depending on your outlook for needing its own custom keyboard in order to write it.
-R. Kyle Murphy
I dont think anyone can reasonably talk of making it a default! Just seeing how much and to where the envelope can be pushed.
eg I would like to see \ spelled as λ
As for APL, it failed for various reasons eg - mixing up assembly language (straight line code with gotos) with functional idioms - the character set was a major hurdle in the 60s. Thats not an issue today when most OSes/editors are unicode compliant

Nickolay Kudasov wrote:
eg I would like to see \ spelled as λ
I have symbol substitution enabled in Vim. E.g. when I write \ (and it is syntactically lambda) I get λ. The same way composition (.) is replaced with ∘. The same trick can be enabled for other operators as well. So I have normal text and nice presentation in *my* text editor: it does not bother anyone but me.
I think this is the right approach. See also https://github.com/i-tu/Hasklig/

The main problem with special Unicode characters, as I see it, is that it is no longer possible to distinguish characters unambiguously just by looking at them. Apart from questions of maintainability, this is also a potential security problem: it enables an attacker to slip in malicious code simply by importing a module whose name looks like a well-known safe module. In a big and complex piece of software, such an attack might not be spotted for some time.

Cheers
Ben
--
"Make it so they have to reboot after every typo." -- Scott Adams

On 27/04/2014, at 9:30 PM, Ben Franksen wrote:
The main problem with special Unicode characters, as I see it, is that it is no longer possible to distinguish characters unambiguously just by looking at them.
"No longer"? Hands up all the people old enough to have used "coding forms". Yes, children, there was a time when programmers wrote their programs on printed paper forms (sort of like A4 tipped sideways) so that the keypunch girls (not my sexism, historical accuracy) knew exactly which column each character went in. And at the top of each sheet was a row of boxes for you to show how you wrote 2 Z 7 1 I ! 0 O and the like. For that matter, I recall a PhD thesis from the 80s in which the author spent a page grumbling about the difficulty of telling commas and semicolons apart...
Apart from questions of maintainability, this is also a potential security problem: it enables an attacker to slip in malicious code simply by importing a module whose name looks like a well known safe module. In a big and complex piece of software, such an attack might not be spotted for some time.
Again, considering the possibilities of "1" "i" "l", I don't think we actually have a new problem here. Presumably this can be addressed by tools: "here are some modules, tell me what exactly they depend on", not entirely unlike ldd(1). Of course, the gotofail bug shows that it's not enough to _have_ tools like that, you have to use them and review the results periodically.

Rustom Mody
As for APL, it failed for various reasons eg - mixing up assembly language (straight line code with gotos) with functional idioms - the character set was a major hurdle in the 60s. Thats not an issue today when most OSes/editors are unicode compliant
I know it's bikeshedding, but I think Agda and Idris are more relevant to Haskell than APL, since their semantics are closer (and they're both implemented in Haskell).

Agda makes extensive (ab)use of Unicode identifiers, e.g. https://github.com/agda/agda-stdlib/blob/master/src/Algebra.agda

Idris specifically avoids Unicode identifiers, for reasons outlined at https://github.com/idris-lang/Idris-dev/wiki/Unofficial-FAQ

Personally I prefer working in Idris to Agda, since the Unicode puts me off. I usually resort to copy/pasting symbols, which is tedious compared to typing names.

Cheers,
Chris

On Fri, Apr 25, 2014 at 6:32 PM, Chris Warburton
Rustom Mody
writes: As for APL, it failed for various reasons eg - mixing up assembly language (straight line code with gotos) with functional idioms - the character set was a major hurdle in the 60s. Thats not an issue today when most OSes/editors are unicode compliant
I know it's bikeshedding, but I think Agda and Idris are more relevant to Haskell than APL, since their semantics are closer (and they're both implemented in Haskell).
Agda makes extensive (ab)use of Unicode identifiers, eg. https://github.com/agda/agda-stdlib/blob/master/src/Algebra.agda
Idris specifically avoids Unicode identifiers, for reasons outlined at https://github.com/idris-lang/Idris-dev/wiki/Unofficial-FAQ
Personally I prefer working in Idris to Agda, since the Unicode puts me off. I usually resort to copy/pasting symbols, which is tedious compared to typing names.
Cheers, Chris
Thanks Chris for that evaluation. Not bike-shedding as far as I can see. Yes, inputting things by some GUI picker or copy-pasting etc. would quickly become a major pain.

I believe that there are roughly these 5 levels to this, with per-char cost decreasing and fixed cost increasing as we go down:

1. GUI picker (IBUS etc.), copy-pasting from the web etc. -- ok for arm-chair discussions; ridiculous for serious development
2. Editor-based input methods, e.g. the TeX input method in Emacs
3. Window-system (X/MS etc.) input methods
4. OS-based input methods
5. Special-purpose hardware keyboards

I believe 3 makes for a particularly good fixed/variable cost balance point. E.g. in X-windows, if you run this command

  $ setxkbmap -layout "us,gr" -option "grp:switch"

then typing abcdefg with right-alt depressed gives: αβψδεφγ

For those who prefer a more moded approach (vi users?) here is

  $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"

This makes the Shift-Alt chord switch in and out (i.e. toggle) the Greek keyboard, with the scroll-lock light as indicator.

All this is clearly just an analogy; what we need is not a Greek keyboard but a keyboard mapping analogous to gr(eek). Try s/gr/apl in the commands above for APL, which, while distant from Haskell, gives a taste for what a *programmer* can use/make.

regards
Rusi

On 26/04/2014, at 1:30 AM, Rustom Mody wrote:
On Fri, Apr 25, 2014 at 6:32 PM, Chris Warburton
wrote: Rustom Mody writes: As for APL, it failed for various reasons eg - mixing up assembly language (straight line code with gotos) with functional idioms - the character set was a major hurdle in the 60s. Thats not an issue today when most OSes/editors are unicode compliant
I strongly suspect that the failure of APL had very little to do with the character set. When APL was introduced, the character set was just a matter of dropping in a different golf-ball. Later, it was just bits on a screen. Heck, in 1984 I was using C and LaTeX on an IBM mainframe where the terminals displayed curly braces as spaces, and oddly enough that didn't kill C... In any case, it was possible to enter any arbitrary APL text using straight ASCII, so that was no great problem. There were a number of much more serious issues with APL.

(1) In "classic" APL everything is an n-dimensional array, either an array of characters or an array of (complex) numbers. An absolutely regular array. Want to process a collection of records where some of the fields are strings? No can do. Want to process a collection of strings of different length? No can do: you must use a 2-dimensional array, padding all the strings to the same length. Want type checking? Hysterical laughter. APL2 "fixed" this by introducing nested arrays. This is powerful, but occasionally clumsy. And it is positional, not named. You *can* represent trees, you can represent records with mixed fields, you can do all sorts of stuff. But it's positional, not named.

(2) There aren't _that_ many APL symbols, and it didn't take too long to learn them, and once you did, they weren't that hard to remember. (Although the use of the horseshoe symbols in APL2 strikes me as *ab*use.) Problem is, a whole lot of other things were done with numbers. Here's the trig functions:

   0 ◦ x   sqrt(1-x**2)
   1 ◦ x   sin x          ¯1 ◦ x   arcsin x
   2 ◦ x   cos x          ¯2 ◦ x   arccos x
   3 ◦ x   tan x          ¯3 ◦ x   arctan x
   4 ◦ x   sqrt(x**2+1)   ¯4 ◦ x   sqrt(x**2-1)
   5 ◦ x   sinh x         ¯5 ◦ x   arcsinh x
   6 ◦ x   cosh x         ¯6 ◦ x   arccosh x
   7 ◦ x   tanh x         ¯7 ◦ x   arctanh x

Who thought _that_ was a good idea? Well, presumably it was the same person who introduced the "I-beam functions". A range of system functions (time of day, cpu time used, space available, ...) were distinguished by *numbers*.

(3) Which brings me to the dialect problem. No two systems had the *same* set of I-beam functions. You couldn't even rely on two systems having the same *kind* of approach to files. There were several commercial APL systems, and they weren't priced for the hobbyist or student.

On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy
It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.
On the other hand, maybe if it's good enough for the entire field of Mathematics since forever there might be some benefit in it for us.

On 2014-04-26 17:58, David Fox wrote:
On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy
wrote: It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.
On the other hand, maybe if its good enough for the entire field of Mathematics since forever there might be some benefit in it for us.
Typing into a computer != Handwriting (in various significant ways). Most of mathematics notation predates computers/typewriters. Just compare writing a formula by hand and typing the same formula in (La)TeX. Regards,

the vast majority of math is written using LaTeX, which, while supporting Unicode, is mostly ASCII :)
On Sat, Apr 26, 2014 at 11:58 AM, David Fox
On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy
wrote: It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.
On the other hand, maybe if its good enough for the entire field of Mathematics since forever there might be some benefit in it for us.

On Sat, Apr 26, 2014 at 9:28 PM, David Fox
On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy
wrote: It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.
On the other hand, maybe if its good enough for the entire field of Mathematics since forever there might be some benefit in it for us.
Chris spoke of his choice of Idris over Agda related to not going overboard
with unicode. The FAQ he linked to has this to say:
| And I'm sure that in a few years time things will be different and
software will
| cope better and it will make sense to revisit this. For now, however, I
would
| prefer not to allow arbitrary unicode symbols in operators.
1. I'd like to underscore the 'arbitrary'. Why is ASCII any less arbitrary
-- apart from an increasingly irrelevant historical accident -- than
Arabic, Bengali, Cyrillic, Deseret? [Hint: What's the A in ASCII?] By
contrast math may at least have some pretensions to universality?
2. Maybe it's a good time now to 'revisit'? Otherwise like klunky-qwerty,
it may happen that when the technological justifications for an inefficient
choice are long gone, social inertia will prevent any useful change.
On Sun, Apr 27, 2014 at 3:00 PM, Ben Franksen
The main problem with special Unicode characters, as I see it, is that it is no longer possible to distinguish characters unambiguously just by looking at them. Apart from questions of maintainability, this is also a potential security problem: it enables an attacker to slip in malicious code simply by importing a module whose name looks like a well known safe module. In a big and complex piece of software, such an attack might not be spotted for some time.
Bang on! However the Pandora's box is already open and the creepy-crawlies are all over us. Witness:

  GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
  Loading package ghc-prim ... linking ... done.
  Loading package integer-gmp ... linking ... done.
  Loading package base ... linking ... done.
  Prelude> let а = 1
  Prelude> a

  <interactive>:11:1: Not in scope: `a'

  Prelude>

In case you can't see it, the two a's are different Unicode characters: CYRILLIC SMALL LETTER A vs LATIN SMALL LETTER A.

Regards
Rusi

On 2014-04-27 13:45, Rustom Mody wrote:
1. I'd like to underscore the 'arbitrary'. Why is ASCII any less arbitrary -- apart from an increasingly irrelevant historical accident -- than Arabic, Bengali, Cyrillic, Deseret? [Hint: Whats the A in ASCII?] By contrast math may at least have some pretensions to universality?
The symbols in math are also mostly arbitrary. In effect they should be considered as "parallel" to the Cyrillic, Latin or Greek alphabets. (Of course math borrows quite a few symbols from the latter, but I digress.)
2. Maybe its a good time now to 'revisit'? Otherwise like klunky-qwerty, it may happen that when the technological justifications for an inefficient choice are long gone, social inertia will prevent any useful change.
Billions of people have QWERTY keyboards. Unless you come up with something *radically* better then they're not going to change. Inertia has made anything but incremental change impossible. (I note that Microsoft actually managed to change the QWERTY keyboard incrementally a decade or two ago by adding the Windows and Context Menu keys. Of course that didn't remove/change any of the existing functionality of the basic QWERTY, so it was a relatively small change.)

Using "macros" like "\" (for lambda) or "\sum_{i=0}^{n} i" and having the editor/IDE display that differently is at least semi-practical for typing stuff into your computer using QWERTY.

Regards,

On Sun, Apr 27, 2014 at 7:45 AM, Rustom Mody
1. I'd like to underscore the 'arbitrary'. Why is ASCII any less arbitrary -- apart from an increasingly irrelevant historical accident -- than Arabic, Bengali, Cyrillic, Deseret? [Hint: Whats the A in ASCII?] By contrast math may at least have some pretensions to universality?
Math notations are not as universal as many would like to think, sadly.

And I am not sure the historical accident is really irrelevant; as the same "accident" was involved in most of the computer languages and protocols we use daily, I would not be at all surprised to find that there are subtle dependencies buried in the whole mess --- similar to how (most... sigh) humans pick up language and culture signals as children too young to apply any kind of critical analysis to it, and can have real problems trying to eradicate or modify them later.

(Yes, languages can be fixed. But how many tools do you use when working with them? It's almost certainly more than the ones that immediately come to mind or are listed on e.g. Hackage. In particular, that ligature may be great in your editor and unfortunate when you pop a terminal and grep for it --- especially if you start extending this to other languages so you need a different set of ligatures [a different font!] for each language....)

--
brandon s allbery kf8nh

On Sun, Apr 27, 2014 at 9:02 PM, Brandon Allbery
On Sun, Apr 27, 2014 at 7:45 AM, Rustom Mody
wrote: 1. I'd like to underscore the 'arbitrary'. Why is ASCII any less arbitrary -- apart from an increasingly irrelevant historical accident -- than Arabic, Bengali, Cyrillic, Deseret? [Hint: Whats the A in ASCII?] By contrast math may at least have some pretensions to universality?
Math notations are not as universal as many would like to think, sadly.
And I am not sure the historical accident is really irrelevant; as the same "accident" was involved in most of the computer languages and protocols we use daily, I would not be at all surprised to find that there are subtle dependencies buried in the whole mess --- similar to how (most... sigh) humans pick up language and culture signals as children too young to apply any kind of critical analysis to it, and can have real problems trying to eradicate or modify them later. (Yes, languages can be fixed. But how many tools do you use when working with them? It's almost certainly more than the ones that immediately come to mind or are listed on e.g. Hackage. In particular, that ligature may be great in your editor and unfortunate when you pop a terminal and grep for it --- especially if you start extending this to other languages so you need a different set of ligatures [a different font!] for each language....)
Nice point! And as I said above, that Pandora's box is already wide open for current Haskell [and Python and probably most modern languages]. Can we reverse it?? Witness:

  ----------------------
  $ ghci
  GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
  Loading package ghc-prim ... linking ... done.
  Loading package integer-gmp ... linking ... done.
  Loading package base ... linking ... done.
  Prelude> let (ﬁne,fine) = (1,2)
  Prelude> (ﬁne,fine)
  (1,2)
  Prelude>
  ---------------------

If you had the choice, would you allow that f-i ligature to be thus confusable with the more normal fi? I probably wouldn't, but nobody is asking us, and the water that's flowed under the bridge cannot be 'flowed' backwards (to the best of my knowledge!).

In case that seems far-fetched, consider the scenario:
1. Somebody loads (maybe innocently) the code involving variables like 'fine' into a ligature-happy IDE/editor.
2. The editor quietly changes all the fine to ﬁne.
3. Since all those variables are in local scope, nothing untoward is noticed.
4. Until someone loads it into an 'old-fashioned' editor... and then...

Would you like to be on the receiving end of such 'fun'?

IOW the choice "ASCII is the universal bedrock of computers -- best to stick with it" vs "ASCII is arbitrary and parochial and we SHOULD move on" is not a choice at all. We (i.e. OSes, editors, languages) have all already moved on. And moved on in a particularly ill-considered way. For example, there used to be the minor nuisance that Linux filesystems were typically case-sensitive, Windows case-insensitive. Now with zillions of new confusables like the Latin vs Cyrillic а vs a -- well, we have quite a mess!

Embracing math in a well-considered and systematic way does not increase the mess; it can even reduce it.

My 2 (truly American) ¢
Rusi

PS Someone spoke of APL and someone else said Agda/Idris may be more relevant. I wonder how many of the younger generation have heard of Squiggol?

On 27 Apr 2014, at 19:58, Rustom Mody wrote:
If you had the choice would you allow that f-i ligature to be thus confusable with the more normal fi? I probably wouldn't but nobody is asking us and the water that's flowed under the bridge cannot be 'flowed' backwards (to the best of my knowledge!)
In case that seems far-fetched consider the scenario: 1. Somebody loads (maybe innocently) the code involving variables like 'fine' into a 'ligature-happy 'IDE/editor' 2. The editor quietly changes all the fine to fine. 3. Since all those variables are in local scope nothing untoward is noticed 4. Until someone loads it into an 'old-fashioned' editor... and then...
I develop Hasklig, and have enjoyed the discussion about the pros and cons of ligatures in coding fonts. However, I really must protest this line of reasoning since it is based on false premises.

As an OpenType feature, ligatures have nothing to do with the 'fi' and 'fl' Unicode points (which are legacy only, and heavily discouraged by the Unicode consortium), or with Unicode at all. The encoding of the file could be pure ASCII for all the ligatures care. The font used changes how the text looks, and nothing else.

When speaking of special Unicode symbols in code, I agree with most objections raised against them :)

br,
Ian

P.S. Sorry for potential repost - I'm getting automatic rejects

On 28 April 2014 06:57, Ian Tuomi
On 27 Apr 2014, at 19:58, Rustom Mody wrote:
If you had the choice would you allow that f-i ligature to be thus confusable with the more normal fi? I probably wouldn't but nobody is asking us and the water that's flowed under the bridge cannot be 'flowed' backwards (to the best of my knowledge!)
In case that seems far-fetched consider the scenario: 1. Somebody loads (maybe innocently) the code involving variables like 'fine' into a 'ligature-happy 'IDE/editor' 2. The editor quietly changes all the fine to fine. 3. Since all those variables are in local scope nothing untoward is noticed 4. Until someone loads it into an 'old-fashioned' editor... and then...
I develop Hasklig, and have enjoyed the discussion about the pros and cons of ligatures in coding fonts. However, I really must protest this line of reasoning since it is based on false premises.
As an opentype feature, ligatures have nothing to do with the 'fi' and 'fl' unicode points, (which are legacy only, and heavily discouraged by the unicode consortium), or with unicode at all. The encoding of the file could be pure ASCII for all the ligatures care. The font used changes how the text looks, and nothing else.
When speaking of special unicode symbols in code, I agree with most objections raised against them :)
Ian, thanks for Hasklig. My first thought when I saw it was that hopefully it would assuage the annoying promoters of Unicode overreach.

Conrad.

I think it is a nice feature if used sparingly.

Note that while Unicode symbols are a normal part of the Haskell language, you can also turn on some Unicode syntax using the UnicodeSyntax [1] language extension. This means the following will be accepted by GHC:

  (∈) ∷ ∀ α. Eq α ⇒ α → [α] → Bool
  (∈) = elem

You might want to take a look at some packages I created that define some Unicode symbols for common operators and values [2, 3, 4].

Opinions on whether this is a good idea vary. My anecdotal observation is that it seems to be used more by people who speak a native language that is already poorly served by ASCII. Perhaps because they are already used to not being able to simply type every character they need.

1 - http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#un...
2 - http://www.haskell.org/haskellwiki/Unicode-symbols
3 - http://hackage.haskell.org/package/base-unicode-symbols
4 - http://hackage.haskell.org/package/containers-unicode-symbols
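For completeness, a tiny self-contained sketch of using that definition; the module name and example values are mine, and depending on the GHC version the ∀ may additionally require the ExplicitForAll extension.

  {-# LANGUAGE UnicodeSyntax, ExplicitForAll #-}
  module MemberDemo where

  (∈) ∷ ∀ α. Eq α ⇒ α → [α] → Bool
  (∈) = elem

  inRange ∷ Bool
  inRange = 3 ∈ [1, 2, 3 ∷ Int]   -- True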

I'm actually a fan of using Unicode in my code. As people like to say, code
is read more often than it's written, so I'm willing to make typing a bit
harder in return for making the code prettier.
Happily, typing Unicode characters is quite easy with a good editor
(Emacs). I use the TeX input mode which just lets me use TeX names for
symbols, but somebody has actually written a Haskell-specific mode which
might be even better[1]. I might try it some day.
One peculiar habit I have is using x₁ x₂ x₃ instead of x1, x2, x3 or x_1,
x_2, x_3. I definitely find the Unicode version easier to read and work
with, although it probably helps that Emacs highlights the number in a
different color.
Unfortunately, this is a minority opinion at the moment. Even in *this* day
and age, people still find Unicode too difficult to type!
For my internal code, this is not a problem, but it's kept me from putting
any Unicode in public APIs. Shame.
I also don't use UnicodeSyntax because Emacs can do most of the
transformations transparently for me without changing the underlying file.
You can turn this on by setting `haskell-font-lock-symbols' to t. I find it
makes for much nicer code that's easier to read and, even more importantly,
easier to skim.
[1]: https://github.com/roelvandijk/emacs-haskell-unicode-input-method
On Thu, Apr 24, 2014 at 12:51 PM, Roel van Dijk
I think it is a nice feature if used sparingly.
Note that while Unicode symbols are a normal part of the Haskell language you can also turn on some Unicode syntax using the UnicodeSyntax [1] language extension. This means the following will be accepted by GHC:
(∈) ∷ ∀ α. Eq α ⇒ α → [α] → Bool (∈) = elem
You might want to take a look at some packages I created that define some Unicode symbols for common operators and values [2, 3, 4].
Opinions on whether this is a good idea vary. My anecdotal observation is that it seems to be used more by people who speak a native language that is already poorly served by ASCII. Perhaps because they are already used to not being able to simply type every character they need.
1 - http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#un... 2 - http://www.haskell.org/haskellwiki/Unicode-symbols 3 - http://hackage.haskell.org/package/base-unicode-symbols 4 - http://hackage.haskell.org/package/containers-unicode-symbols

A smart, recently proposed alternative is using a font that does this automatically for you: https://github.com/i-tu/Hasklig

In my opinion, this gives you all benefits of unicode syntax without imposing the drawbacks on others, and you don't even have to set up a custom input method to conveniently type them in.

Quoting:

  Some Haskell programmers have resorted to unicode symbols in code as a solution (⇒, ← etc.). This opens a whole new can of worms. In addition to encoding/compatibility problems and all the reasons it never worked out for APL, these symbols are one-character-wide and therefore eye-strainingly small.

  Hasklig solves this problem the way typographers have always solved ill-fitting characters which co-occur often: ligatures. The underlying code stays the same — only the representation changes.

On Thu 24 Apr 2014 21:02:30 BST, Tikhon Jelvis wrote:
I'm actually a fan of using Unicode in my code. As people like to say, code is read more often than it's written, so I'm willing to make typing a bit harder in return for making the code prettier.
Happily, typing Unicode characters is quite easy with a good editor (Emacs). I use the TeX input mode which just lets me use TeX names for symbols, but somebody has actually written a Haskell-specific mode which might be even better[1]. I might try it some day.
One peculiar habit I have is using x₁ x₂ x₃ instead of x1, x2, x3 or x_1, x_2, x_3. I definitely find the Unicode version easier to read and work with, although it probably helps that Emacs highlights the number in a different color.
Unfortunately, this is a minority opinion at the moment. Even in *this* day and age, people still find Unicode too difficult to type!
For my internal code, this is not a problem, but it's kept me from putting any Unicode in public APIs. Shame.
I also don't use UnicodeSyntax because Emacs can do most of the transformations transparently for me without changing the underlying file. You can turn this on by setting `haskell-font-lock-symbols' to t. I find it makes for much nicer code that's easier to read and, even more importantly, easier to skim.
[1]: https://github.com/roelvandijk/emacs-haskell-unicode-input-method
On Thu, Apr 24, 2014 at 12:51 PM, Roel van Dijk
wrote: I think it is a nice feature if used sparingly.
Note that while Unicode symbols are a normal part of the Haskell language you can also turn on some Unicode syntax using the UnicodeSyntax [1] language extension. This means the following will be accepted by GHC:
(∈) ∷ ∀ α. Eq α ⇒ α → [α] → Bool (∈) = elem
You might want to take a look at some packages I created that define some Unicode symbols for common operators and values [2, 3, 4].
Opinions on whether this is a good idea vary. My anecdotal observation is that it seems to be used more by people who speak a native language that is already poorly served by ASCII. Perhaps because they are already used to not being able to simply type every character they need.
1 - http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#un... 2 - http://www.haskell.org/haskellwiki/Unicode-symbols 3 - http://hackage.haskell.org/package/base-unicode-symbols 4 - http://hackage.haskell.org/package/containers-unicode-symbols

Since I saw there wasn't one yet, I have taken the liberty of writing
a PKGBUILD for Hasklig and uploading it to the Arch AUR. Anyone
running Arch can now get Hasklig from the AUR package otf-hasklig.
On Thu, Apr 24, 2014 at 3:07 PM, Niklas Hambüchen
A smart, recently proposed alternative is using a font that does this automatically for you: https://github.com/i-tu/Hasklig
In my opinion, this gives you all benefits of unicode syntax without imposing the drawbacks on others, and you don't even have to set up a custom input method to conveniently type them in.
Quoting:
Some Haskell programmers have resorted to unicode symbols in code as a solution (⇒, ← etc.). This opens a whole new can of worms. In addition to encoding/compatibility problems and all the reasons it never worked out for APL, these symbols are one-character-wide and therefore eye-strainingly small.
Hasklig solves this problem the way typographers have always solved ill-fitting characters which co-occur often: ligatures. The underlying code stays the same — only the representation changes.

On 9 May 2014, at 7:57, Lucas Paul wrote:
Since I saw there wasn't one yet, I have taken the liberty of writing a PKGBUILD for Hasklig and uploading it to the Arch AUR. Anyone running Arch can now get Hasklig from the AUR package otf-hasklig.
That's great! Thank you very much for doing this. -Ian

Thanks for the hint =)
I have always wondered about some special characters on the standard keyboard,
° -- i.e. [⇑][^]
§
¢
€
¶ --e.g. [Alt Gr][R]
· -- e.g. [Alt Gr][,]
… -- e.g. [Alt Gr][,]
– -- e.g. [Alt Gr][-], though optically ambiguous with -, a candidate for *unary minus*, which I think has always been considered to be realized a little unsatisfactorily
¬ -- e.g. [Alt Gr][6], also a great candidate for unary use
now I see they are usable, too, which to me is great news.
For some time now I have been using chars like þ, ø, æ, ŋ, ð, µ, ħ, etc., which, easily accessible on a Linux keyboard, have more than cosmetic utility to me; with find/replace it saves you from the usual trouble when working – but that with the special chars above is a pleasing discovery. :-)

I have always wondered why in Haskell function composition is not declared as (°) instead of (.), which (a) interferes with other uses of the period and (b) looks less similar to the actual symbol.

It also reminds me of a very beautiful feature (Emacs) haskell-mode had some years ago; it was able to display special characters in wider characters (maybe using a special Emacs font??) – does anybody know what I am speaking about, too?

Unfortunately, this feature wasn't continued, while frankly "←" as in recent use slightly hurts my eye a little everywhere it's seen in code (sorry, "←"... ;-), while back then, in Emacs, there were BEAUTIFUL arrows in double width and gorgeous lambdas.
Cheers, Nick
2014-04-24 19:15 GMT+02:00 Rustom Mody
I'm mighty pleased to note that the following is valid Haskell code!
Do others find this useful/appealing? Any possibilities on making the commented out parts work?
[Pragmatics about typing this at the same speed and facility as we do with ASCII is a separate and (IMHO) solvable problem, though it's not the case at the moment]
[... code snipped ...]

On 25/04/2014, at 5:15 AM, Rustom Mody wrote:
x ÷ y = divMod x y
This one looks wrong to me. In common usage, ÷ indicates plain old division, e.g., 3÷2 = 1½. See for example http://en.wikipedia.org/wiki/Table_of_mathematical_symbols

One possibility would be
x ÷ y = x / y :: Rational
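For integral operands, one hedged way to keep that 'plain division' reading exact is Data.Ratio's (%); a sketch:

  import Data.Ratio ((%))

  (÷) :: Integer -> Integer -> Rational
  x ÷ y = x % y

  -- ghci> 3 ÷ 2
  -- 3 % 2     (the Rational spelling of 1½)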

On Mon, Apr 28, 2014 at 6:46 AM, Richard A. O'Keefe
On 25/04/2014, at 5:15 AM, Rustom Mody wrote:
x ÷ y = divMod x y
This one looks wrong to me. In common usage, ÷ indicates plain old division, e.g., 3÷2 = 1½. See for example http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
One possibility would be
x ÷ y = x / y :: Rational
Thanks Richard for (as usual!) looking at that list with a fine-toothed comb.

I started with writing a corresponding list for Python: http://blog.languager.org/2014/04/unicoded-python.html

As you will see, I mention there that ÷ mapped to divMod is one but hardly the only possibility. That list is mostly about math, not imperative features, and so carries over from Python to Haskell mostly unchanged. Please (if you have 5 minutes) glance at it and give me your comments. I may then finish a similar one for Haskell.

Thanks
Rusi
--
http://www.the-magus.in http://blog.languager.org

Before speaking of "APL's mistakes", one should be clear about what exactly those mistakes *were*. I should point out that the symbols of APL, as such, were not a problem. But the *number* of such symbols was. In order to avoid questions about operator precedence, APL *hasn't* any. In the same way, Smalltalk has an extensible set of 'binary selectors'. If you see an expression like

  a ÷> b ~@ c

which operator dominates which? Smalltalk adopted the same solution as APL: no operator precedence.

Before Pascal, there was something approaching a consensus in programming languages that

  **                       tightest
  *, /, div, mod
  unary and binary +, -
  relational operators
  not
  and
  or

In order to make life easier with user-defined operators, Algol 68 broke this by making unary operators (including not and others you haven't heard of like 'down' and 'upb') bind tightest. As it turned out, this may have made life easier for the compiler, but not for people. In order, allegedly, to make life easier for students, Pascal broke this by making 'or' and 'and' at the same level as '+' and '*'. To this day, many years after Pascal vanished (Think Pascal is dead, MrP is dead, MPW Pascal is dead, IBM mainframe Pascal died so long ago it doesn't smell any more, Sun Pascal is dead, ...) a couple of generations of programmers believe that you have to write (x > 0) && (x < n) in C, because of what their Pascal-trained predecessors taught them.

If we turn to Unicode, how should we read

  a ⊞ b ⟐ c

Maybe someone has a principled way to tell. I don't. And then we have to ask about a ⊞⟐ b ⟐⊞ c.

This is NOT a new problem. Haskell already has way too many operators floating around for me to remember their relative precedence, and I have to follow a rule "when an expression contains two operators from different 'semantic fields', use parentheses." Don't ask me to explain that!

Unicode does make the problem rather more pressing. Instead of agonising over the difference between < << <<< <<<< and the like, now we can agonise over the difference between a couple of dozen variously decorated and accompanied versions of the subset sign as single characters. Did you know that there is a single ⩵ character? Distinct from ==?

I firmly believe that *careful* introduction of mathematical symbols can be good, but that it needs rather more care for consistency and readability than Haskell operators have had so far. I think wide consideration is necessary lest we end up with things like x ÷ y, where x and y are numbers, not giving a number.
I started with writing a corresponding list for python: http://blog.languager.org/2014/04/unicoded-python.html
The "Math Space Advantage" there can be summarised as: "if you use Unicode symbols for operators you can omit even more spaces than you already do, wow!" Never mind APL. What about SETL? For years I yearned to get my hands on SETL so that I could write (∀x∈s)(∃y∈s)f(x, y) The idea of using *different* symbols for testing and binding (2.2, "Dis") strikes me as "Dis" indeed. I want to use the same character in both places because they mean the same thing. It's the ∀ and ∃ that mean "bind". The name space burden reduction argument won't fly either. Go back and look at http://en.wikipedia.org/wiki/Table_of_mathematical_symbols ≤ less than or equal to in a partial order is a subgroup of can be reduced to × multiplication Cartesian product cross product (as superscript) group of units In mathematics, the same meaning may be represented by several different symbols. And the same symbol may be used for several different meanings. (If Haskell allowed prefix and superscript operators, think of the fun we could have keeping track of * the Hodge dual *v and the ordinary dual: v .) Representing π as π seems like a clear win. But do we want to use c, e, G, α, γ and other constants with familiar 1-character names by those characters? What if someone is writing Haskell in Greek? (Are you reading this, Kostis?) I STRONGLY disagree that x÷y should violate the norms of school by returning something other than a number. When it comes to returning a quotient and remainder, Haskell has two ways to do this and Common Lisp has four. I don't know how many Python has, but in situation of such ambiguity, it would be disastrous NOT to use words to make it clear which is meant. I find the use of double up arrow for exponentiation odd. Back in the days of BASIC on a model 33 Teletype, one used the single up arrow for that purpose. As for floor and ceiling, it would be truer to mathematical notation to use ⌊x⌋ for floor. (I note that in Arial Unicode as this appears on my screen these characters look horrible. They should have the same vertical extent as the square brackets they are derived from. Cambria Math and Lucida Sans are OK.) The claim that "dicts are more fundamental to programming than sets" appears to be falsified by SETL, in which dicts were just a special case of sets. (For that matter, so they were in Smalltalk-80.) For existing computational notations with rich sets of mathematical symbols look at Z and B. (B as in the B method, not as in the ancestor of C.) The claim that mathematical expressions cannot be written in Lisp or COBOL is clearly false. See Interlisp, which allowed infix operators. COBOL uses "-" for subtraction, it just needs spaces around it, which is a Good Thing. Using the centre dot as a word separator would have more merit if it weren't so useful as an operator. The reference to APL has switch the operands of take and drop. It should be number_to_keep ↑ vector number_to_lose ↓ vector

Hi Richard
Thanks for a vigorous and rigorous appraisal of my blog post:
http://blog.languager.org/2014/04/unicoded-python.html
However, this is a Haskell list, and my post being not just a discussion about Python but some brainstorming for how Python could change, a detailed discussion about that is probably too off-topic here, don't you think? So for now let me address just one of your points, which is appropriate for this forum. I'd be pleased to discuss the other points you raise off list. Also, while I've learnt a lot from this thread, I also see some confusions and fallacies.
So before drilling down into details and losing the forest for the trees,
I'd prefer to start with a broad perspective rather than a narrow
technological focus -- more at end.
On Tue, Apr 29, 2014 at 11:04 AM, Richard A. O'Keefe
Before speaking of "Apl's mistakes", one should be clear about what exactly those mistakes *were*. I should point out that the symbols of APL, as such, were not a problem. But the *number* of such symbols was. In order to avoid questions about operator precedence, APL *hasn't* any. In the same way, Smalltalk has an extensible set of 'binary selectors'. If you see an expression like
a ÷> b ~@ c
which operator dominates which? Smalltalk adopted the same solution as APL: no operator precedence.
Before Pascal, there was something approaching a consensus in programming languages that ** tightest *,/,div,mod unary and binary +,- relational operators not and or In order to make life easier with user-defined operators, Algol 68 broke this by making unary operators (including not and others you haven't heard of like 'down' and 'upb') bind tightest. As it turned out, this make have made life easier for the compiler, but not for people. In order, allegedly, to make life easier for students, Pascal broke this by making 'or' and 'and' at the same level as '+' and '*'. To this day, many years after Pascal vanished (Think Pascal is dead, MrP is dead, MPW Pascal is dead, IBM mainframe Pascal died so long ago it doesn't smell any more, Sun Pascal is dead, ...) a couple of generations of programmers believe that you have to write (x > 0) && (x < n) in C, because of what their Pascal-trained predecessor taught them.
If we turn to Unicode, how should we read
a ⊞ b ⟐ c
Maybe someone has a principled way to tell. I don't.
Without claiming to cover all cases, this is a 'principle'. If we have:

  (⊞) :: a -> a -> b
  (⟐) :: b -> b -> c

then ⊞'s precedence should be higher than ⟐'s. (A small GHC sketch of this is in the PS below.) This is what makes it natural to have the precedences of (+), (<), (&&) in decreasing order.

This is also why the bitwise operators in C have the wrong precedence: x & 0xF == 0xF has only 1 meaningful interpretation; C chooses the other! The error comes (probably) from treating & as close to the logical operators like && whereas in fact it is more kin to arithmetic operators like +.

There are of course other principles: Dijkstra argued vigorously that, boolean algebra being completely symmetric in (∨, True) and (∧, False), ∧ and ∨ should have the same precedence. Evidently not too many people agree with him!

----------------------

To come back to the broader questions. I started looking at Niklas' link (thanks Niklas!)
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#un...
and I find that the new unicode chars for -<< and >>- are missing. Ok, a minor doc-bug perhaps?

Poking further into that web page, I find that it has charset=ISO-8859-1. Running the W3C validator http://validator.w3.org/ on it, one gets: No DOCTYPE found!

What has this got to do with unicode in python source? That depends on how one sees it. When I studied C (nearly 30 years now!) we used gets as a matter of course. Today we don't. Are Kernighan and Ritchie wrong in teaching it? Are today's teachers wrong in proscribing it? I believe the only reasonable outlook is that truth changes with time: it was OK then; it's not today. Likewise DOCTYPE-missing and charset-other-than-UTF-8. Random example showing how right yesterday becomes wrong today:
http://www.sitepoint.com/forums/showthread.php?660779-Content-type-iso-8859-...

Unicode vs ASCII in program source is similar (I believe). My thoughts on this (of a philosophical nature) are:
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

If we can get the broader agreements (disagreements!) out of the way to start with, we may then look at the details.

Thanks and regards,
Rusi
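PS. A hypothetical GHC sketch of the principle above -- the operators and their types are invented purely for illustration, mirroring (<) and (&&):

  -- The operator whose result feeds the other gets the tighter fixity,
  -- so a chain like a ⊞ b ⟐ c ⊞ d groups as (a ⊞ b) ⟐ (c ⊞ d) with no parentheses.
  (⊞) :: Int -> Int -> Bool     -- plays the role of (<)
  (⊞) = (<)
  infix 4 ⊞

  (⟐) :: Bool -> Bool -> Bool   -- plays the role of (&&)
  (⟐) = (&&)
  infixr 3 ⟐

  test :: Bool
  test = 1 ⊞ 2 ⟐ 3 ⊞ 4          -- True, read as (1 < 2) && (3 < 4)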

On Wednesday 30 April 2014, 13:51:38, Rustom Mody wrote:
Without claiming to cover all cases, this is a 'principle'. If we have:
  (⊞) :: a -> a -> b
  (⟐) :: b -> b -> c
then ⊞'s precedence should be higher than ⟐'s.

But what if (⟐) :: b -> b -> a?
This is what makes it natural to have the precedences of (+) (<) (&&) in decreasing order.
This is also why the bitwise operators in C have the wrong precedence: x & 0xF == 0xF has only 1 meaningful interpretation; C chooses the other! The error comes (probably) from treating & as close to the logical operators like && whereas in fact it is more kin to arithmetic operators like +.
That comes from `&` and `|` being logical operators in B. Quoth Dennis Ritchie (http://cm.bell-labs.com/who/dmr/chist.html in the section "Neonatal C"):
to make the conversion less painful, we decided to keep the precedence of the & operator the same relative to ==, and merely split the precedence of && slightly from &. Today, it seems that it would have been preferable to move the relative precedences of & and ==, and thereby simplify a common C idiom
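Haskell, for comparison, files its bitwise operators with the arithmetic ones: Data.Bits declares (.&.) as infixl 7 while (==) is infix 4, so the idiom parses the way C programmers wish it did. A minimal sketch:

  import Data.Bits ((.&.))

  -- Parses as (x .&. 0xF) == 0xF, i.e. "is the low nibble all ones?"
  lowNibbleSet :: Int -> Bool
  lowNibbleSet x = x .&. 0xF == 0xF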

On Wed, Apr 30, 2014 at 2:33 PM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
x & 0xF == 0xF has only 1 meaningful interpretation; C chooses the other! The error comes (probably) from treating & as close to the logical operators like && whereas in fact it is more kin to arithmetic operators like +.
That comes from `&` and `|` being logical operators in B. Quoth Dennis Ritchie (http://cm.bell-labs.com/who/dmr/chist.html in the section "Neonatal C"):
to make the conversion less painful, we decided to keep the precedence of the & operator the same relative to ==, and merely split the precedence of && slightly from &. Today, it seems that it would have been preferable to move the relative precedences of & and ==, and thereby simplify a common C idiom
Nice! I learn a bit of history. Hope we learn from it! viz. Some things which are easy in a state of transition become painful in a (more) steady state.

On Wed, Apr 30, 2014 at 2:33 PM, Daniel Fischer < daniel.is.fischer@googlemail.com> wrote:
On Wednesday 30 April 2014, 13:51:38, Rustom Mody wrote:
Without claiming to cover all cases, this is a 'principle'. If we have:
  (⊞) :: a -> a -> b
  (⟐) :: b -> b -> c
then ⊞'s precedence should be higher than ⟐'s.

But what if (⟐) :: b -> b -> a?
Sorry, missed that question tucked away :-) I did say a (not the) principle, not claiming to cover all cases! I guess they should then be non-associative (i.e. infix without l/r) at the same precedence?
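In Haskell that choice is easy to state; a sketch with made-up operators, both declared infix (non-associative) at the same level, so a mixed chain is rejected and parentheses become mandatory:

  -- Two hypothetical operators at the same, non-associative precedence.
  (⊞), (⟐) :: Int -> Int -> Int
  (⊞) = (+)
  (⟐) = (*)
  infix 6 ⊞, ⟐

  ok :: Int
  ok = (1 ⊞ 2) ⟐ 3    -- 9; writing 1 ⊞ 2 ⟐ 3 is rejected as a precedence parse error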

I wrote
If we turn to Unicode, how should we read
a ⊞ b ⟐ c
Maybe someone has a principled way to tell. I don't.
Rustom Mody wrote:
Without claiming to cover all cases, this is a 'principle'. If we have:
  (⊞) :: a -> a -> b
  (⟐) :: b -> b -> c
then ⊞'s precedence should be higher than ⟐'s.
I always have trouble with "higher" and "lower" precedence, because I've used languages where the operator with the bigger number binds tighter and languages where the operator with the bigger number gets to dominate the other. Both are natural enough, but with opposite meanings for "higher".

This principle does not explain why * binds tighter than +, which means we need more than one principle. It also means that if OP1 :: a -> a -> b and OP2 :: b -> b -> a then OP1 should be higher than OP2 and OP2 should be higher than OP1, which is a bit of a puzzler, unless perhaps you are advocating a vaguely CGOL-ish asymmetric precedence scheme where the precedence on the left and the precedence on the right can be different.

For the record, let me stipulate that I had in mind a situation where OP1, OP2 :: a -> a -> a. For example, APL uses the floor and ceiling operators infix to stand for max and min. This principle offers us no help in ordering max and min.

Or consider APL again, whence I'll borrow (using ASCII because this is webmail tonight)

  take, rotate :: Int -> Vector t -> Vector t

Haskell applies operator precedence before it does type checking, so how would it know to parse n `take` m `rotate` v as (n `take` (m `rotate` v))? I don't believe there was anything in my original example to suggest that either operator had two operands of the same type, so I must conclude that this principle fails to provide any guidance in cases like this one.
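To put that in Haskell terms, a small sketch with stand-ins for the APL operations (takeV and rotateV are mine, not library functions): with no fixity declarations both default to infixl 9, so the grouping is settled before the type checker ever runs, and the unparenthesised chain is simply rejected.

  -- Stand-ins for APL's take and rotate, both Int -> [a] -> [a].
  takeV, rotateV :: Int -> [a] -> [a]
  takeV   n xs = take n xs
  rotateV n xs = drop n xs ++ take n xs

  -- n `takeV` m `rotateV` v groups as (n `takeV` m) `rotateV` v (default infixl 9),
  -- which is a type error; the intended grouping must be written explicitly:
  ok :: String
  ok = 2 `takeV` (1 `rotateV` "abcde")    -- "bc"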
This is what makes it natural to have the precedences of (+) (<) (&&) in decreasing order.
This is also why the bitwise operators in C have the wrong precedence:
Oh, I agree with that!
The error comes (probably) from treating & as close to the logical operators like && whereas in fact it is more kin to arithmetic operators like +.
The error comes from BCPL where & and && were the same operator (similarly | and ||). At some point in the evolution of C from BCPL the operators were split apart but the bitwise ones left in the wrong place.
There are of course other principles: Dijkstra argued vigorously that, boolean algebra being completely symmetric in (∨, True) and (∧, False), ∧ and ∨ should have the same precedence.
Evidently not too many people agree with him!
Sadly, I am reading this in a web browser where the Unicode symbols are completely garbled. (More precisely, I think it's WebMail doing it.) Maybe Unicode isn't ready for prime time yet?

You might be interested to hear that in the Ada programming language, you are not allowed to mix 'and' with 'or' (or 'and then' with 'or else') without using parentheses. The rationale is that the designers did not believe that enough programmers understood the precedence of and/or. The GNU C compiler kvetches when you have p && q || r without otiose parentheses.

Seems that there are plenty of designers out there who agree with Dijkstra, not out of a taste for well-engineered notation, but out of contempt for the Average Programmer.
When I studied C (nearly 30 years now!) we used gets as a matter of course. Today we don't.
Hmm. I started with C in late 1979. Ouch. That's 34 and a half years ago. This was under Unix version 6+, with a slightly "pre-classic" C. A little later we got EUC Unix version 7, and a 'classic' C compiler that, oh joy, supported /\ (min) and \/ (max) operators. [With a bug in the code generator that I patched.]
Are Kernighan and Ritchie wrong in teaching it? Are today's teachers wrong in proscribing it?
I believe the only reasonable outlook is that truth changes with time: it was OK then; it's not today.
In this case, bull-dust! gets() is rejected today because a botch in its design makes it bug-prone. Nothing has changed. It was bug-prone 34 years ago. It has ALWAYS been a bad idea to use gets().

Amongst other things, the Unix manuals have always presented the difference between gets() -- discards the terminator -- and fgets() -- annoyingly retains the terminator -- as a bug which they thought it was too late to fix; after all, C had hundreds of users! No, it was obvious way back then: you want to read a line? Fine, WRITE YOUR OWN FUNCTION, because there is NO C library function that does quite what you want. The great thing about C was that you *could* write your own line-reading function without suffering. Not only would your function do the right thing (whatever you conceived that to be), it would be as fast, or nearly as fast, as the built-in one. Try doing *that* in PL/I!

No, in this case, *opinions* may have changed, people's *estimation* of and *tolerance for* the risks may have changed, but the truth has not changed.
Likewise DOCTYPE-missing and charset-other-than-UTF-8. Random example showing how right yesterday becomes wrong today: http://www.sitepoint.com/forums/showthread.php?660779-Content-type-iso-8859-...
Well, "missing" DOCTYPE is where it starts to get a bit technical. An SGML document is basically made up of three parts: - an SGML declaration (meta-meta-data) that tells the parser, amongst other things, what characters to use for delimiters, whether various things are case sensitive, what the numeric limits are, and whether various features are enabled. - a Document Type Declaration (meta-data) that conforms to the lexical rules set up by the SGML declaration and defines (a) the grammar rules and (b) a bunch of macros. - a document (data). The SGML declaration can be supplied to a parser as data (and yes, I've done that), or it can be stipulated by convention (as the HTML standards do). In the same way, the DTD can be - completely declared in-line - defined by reference with local amendments - defined solely by reference - known by convention. If there is a convention that a document without a DTD uses a particular DTD, SGML is fine with that. (It's all part of "entity management", one of the minor arcana of SGML.) As for the link in question, it doesn't show right turning into wrong. A quick summary of the sensible part of that thread: - If you use a <meta> tag to specify the encoding of your file, it had better be *right*. This has been true ever since <meta> tags first existed. - If you have a document in Latin 1 and any characters outside that range are written as character entity references or numeric character references, there is no need to change. No change of right to wrong here! - If you want to use English punctuation marks like dashes and curly quotes, using UTF-8 will let you write these characters without character entities or NCRs. This is only half true. It will let you do this conveniently IF your local environment has fonts that include the characters. (Annoyingly, in Mac OS 10.6, which I'm typing on, Edit|Special characters is not only geographically confused, listing Coptic as a *European* script -- last type I checked Egypt was still in Africa -- but it doesn't display any Coptic characters. In the Mac OS 10.7 system I normally use, Edit|Special characters got dramatically worse as an interface, but no more competent with Coptic characters. Just because a character is in Unicode doesn't mean it can be *used*, practically speaking.) Instead of saying that what is wrong has become or is becoming right, I'd prefer to say that what was impossible is becoming possible and what was broken (Unicode font support) is gradually getting fixed. - Some Unicode characters, indeed, some Latin 1 characters, are so easy to confuse with other characters that it is advisable to use character entities. Again, nothing about wrong turning into right. This was good advice as soon as Latin 1 came out.
Unicode vs ASCII in program source is similar (I believe).
Well, not really. People using specification languages like Z routinely used characters way outside the ASCII range; one way was to use LaTeX. Another way was to have GUI systems that let you key in using LaTeX character names or menus but see the intended characters. Back in about 1984 I was able to use a 16-bit character set on the Xerox Lisp Machines. I've still got a manual for the XNS character set somewhere. In one of the founding documents for the ISO Prolog standard, I recommended, in 1984, that the Prolog standard allow characters well beyond ASCII. That's THREE YEARS before Unicode was a gleam in its founders' eyes. This is NOT new.

As soon as there were bit-mapped displays and laser printers, there was pressure to allow a wider range of characters in programs. Let me repeat that: 30 years ago I was able to use non-ASCII characters in computer programs. *Easily*, via virtual keyboards. In 1987, the company I was working at in California revamped their system to handle 16-bit characters and we bought a terminal that could handle Japanese characters. Of course this was because we wanted to sell our system in Japan. But this was shortly before X11 came out; the MIT window system of the day was X10 and the operating system we were using the 16-bit characters on was VMS. That's 27 years ago. This is not new.

So what _is_ new?

* A single standard. Wait, we DON'T have a single standard. We have a single standard *provider* issuing a rapid series of revisions of an increasingly complex standard, where entire features are first rejected outright, then introduced, and then deprecated again. Unicode 6.3 came out last year with five new characters (bringing the total to 110,122), over a thousand new character *variants*, two new normative properties, and a new BIDI algorithm which I don't yet understand. And Unicode 7.0 is due out in 3 months. Because of this
  - different people WILL have tools that understand different versions of Unicode. In fact, different tools in the same environment may do this.
  - your beautiful character WILL show up as garbage or even blank on someone's screen UNLESS it is an old or extremely popular (can you say Emoji? I knew you could. Can you teach me how to say it?) one.
  - when proposing to exploit Unicode characters, it is VITAL to understand what the Unicode "stability" rules are and which characters have what stable properties.
* With large cheap discs, large fonts are looking like a lot less of a problem. (I failed to learn to read the Armenian letters, but do have those. I succeeded in learning to read the Coptic letters -- but not the language(s)! -- but don't have those. Life is not fair.)
* We now have (a series of versions of) a standard character set containing a vast number of characters. I very much doubt whether there is any one person who knows all the Unicode characters.
* Many of these characters are very similar. I counted 64 "right arrow" characters before I gave up; this didn't include harpoons. Some of these are _very_ similar. Some characters are visibly distinct, but normally regarded as mere stylistic differences. For example, <= has at least three variations (one bar, slanted; one bar, flat; two bars, flat) which people familiar with less than or equal have learned *not* to tell apart. But they are three different Unicode characters, from which we could make three different operators with different precedence or associativity, and of course type.
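To make that last point concrete, a deliberately perverse sketch (the fixities here are chosen arbitrarily, just to make the point): the three "less than or equal" code points are all legal Haskell operator symbols, and nothing stops them becoming three different operators.

  -- U+2264, U+2266 and U+2A7D: visually near-identical, yet distinct characters.
  (≤), (≦), (⩽) :: Ord a => a -> a -> Bool
  (≤) = (<=)
  (≦) = (<=)
  (⩽) = (<=)
  infix  4 ≤    -- like the Prelude's (<=)
  infixr 3 ≦    -- nothing prevents the look-alikes from being given
  infixl 2 ⩽    -- completely different fixities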
My thoughts on this (of a philosophical nature) are: http://blog.languager.org/2014/04/unicode-and-unix-assumption.html
If we can get the broader agreements (disagreements!) out of the way to start with, we may then look at the details.
I think Haskell can tolerate an experimental phase where people try out a lot of things, as long as everyone understands that it *IS* an experimental phase, and as long as experimental operators are kept out of Hackage, certainly out of the Platform, or at least segregated into areas with big flashing "danger" signs.

I think a *small* number of "pretty" operators can be added to Haskell without the sky falling, and I'll probably quite like the result. (Does anyone know how to get a copy of the collected The Squiggolist?) Let's face it, if a program is full of Armenian identifiers or Ogham ones I'm not going to have a clue what it's about anyway. But keeping the "standard" -- as in used in core modules -- letter and operator sets smallish is probably a good idea.
participants (24)
- Bardur Arantsson
- Ben Franksen
- Brandon Allbery
- Carter Schonwald
- Chris Warburton
- Christopher Allen
- Conrad Parker
- Daniel Fischer
- David Fox
- Ian Tuomi
- Kyle Murphy
- Lucas Paul
- MigMit
- Nick Rudnick
- Nickolay Kudasov
- Niklas Haas
- Niklas Hambüchen
- ok@cs.otago.ac.nz
- Richard A. O'Keefe
- Roel van Dijk
- Rustom Mody
- Sylvain Henry
- Tikhon Jelvis
- Travis Cardwell