Composition operator [was: Re: Records in Haskell]

On 01/12/2012 07:06 AM, Matthew Farkas-Dyck wrote:
On 09/01/2012, Greg Weber
wrote: Note that a move to a different operator for function composition (discussed in dot operator section) would make things easier to parse:
b<~ .a
where the unicode dot might be even nicer.
I told you so (^_^)
Unicode dot (∘) would be optimal, since that's what it's for. If to type '∘' is awkward, then one can use (Control.Category.<<<). We need not (and, in my opinion, should not) define another operator.
Is ∘ (U+2218 RING OPERATOR)* in Prelude yet? We should propose that.**

I checked my compose-key (Linux) and it can produce middle-dot · (U+00B7 MIDDLE DOT) with "Compose . -", but not ∘ in any way***. If we use the proper Unicode operator ∘, then let's make a wiki page for all the common OSes/input methods, saying how to input it (aside from copy/paste). Is there anything on the Web somewhere already? Did Perl do this? (I think they introduced some Unicode-based syntax.) There's http://www.haskell.org/haskellwiki/Unicode-symbols , which has some information (none of which let me write ∘ in an e-mail without using copy/paste).

* found out using http://www.decodeunicode.org/
** "(negate ∘ (+ 1)) 3" doesn't work in my ghci 7.0.3 with no command-line options (and a UTF-8 system locale and UTF-8-compatible terminal, as is typical these days).
*** I checked the list in /usr/share/X11/locale/en_US.UTF-8/Compose
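For concreteness, here is the kind of alias one can already define today; a minimal sketch, assuming only a UTF-8-encoded source file (I believe the base-unicode-symbols package on Hackage already provides something along these lines):

    module Compose where

    -- GHC reads source as UTF-8, and U+2218 is a valid operator symbol
    -- character, so no extension is needed; this is just an ordinary
    -- operator synonym for (.).
    infixr 9 ∘
    (∘) :: (b -> c) -> (a -> b) -> a -> c
    (∘) = (.)

    -- With the alias in scope, the footnoted example type-checks:
    example :: Integer
    example = (negate ∘ (+ 1)) 3   -- evaluates to -4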

On Thu, Jan 12, 2012 at 13:20, Isaac Dupree
way***. If we use the proper Unicode operator ∘, then let's make a wiki page for all the common OSes/input methods, saying how to input it (aside from copy/paste). Is there anything on the Web somewhere already? Did Perl do this? (I think they introduced some Unicode-based syntax.) There's http://www.haskell.org/haskellwiki/Unicode-symbols , which has some information (none of which let me write ∘ in an e-mail without using copy/paste).
Most platforms have some way to define new keys: on Unix with X11 you can use Xkb or xmodmap (the keysym for a Unicode character is the codepoint expressed as a hex constant, so 0x2218 for ∘), and there are a handful of Xkb editors out there; on OS X you can use keyboard substitutions (Language & Text > Text) or use a program such as Ukelele to modify the keyboard layout; I don't know specifics for Windows, but at its lowest level there are registry tweaks and there should also be programs to do those tweaks in people-comprehensible ways.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

I told you so (^_^)
Unicode dot (∘) would be optimal, since that's what it's for. If to type '∘' is awkward, then one can use (Control.Category.<<<). We need not (and, in my opinion, should not) define another operator.
Is ∘ (U+2218 RING OPERATOR)* in Prelude yet? We should propose that.**
I checked my compose-key (Linux) and it can produce middle-dot · (U+00B7 MIDDLE DOT) with "Compose . -", but not ∘ in any way***. If we use the proper Unicode operator ∘, then let's
OS X makes a MIDDLE DOT with option-shift-9: ·

However, changing the composition operator from (.) will involve huge amounts of changes to source code. It can mean changing a large percentage of all the lines in each file; I for one use (.) quite heavily. With a Haskell parser the rename could happen automatically, but we're still talking about a wall in source control where every single line was changed by one person. Groups that are reluctant to make formatting changes for fear of confusing revision history are really going to hate that one.

Quoth Evan Laforge
... Groups that are reluctant to make formatting changes for fear of confusing revision history are really going to hate that one.
I think a lively discussion would also be possible over whether exotic characters are suitable at all. But this is a more or less academic discussion, taking place on ghc-users, nominally out of view of the general Haskell community, right? So I don't need to intrude with mundane objections of that nature. Donn

But this is a more or less academic discussion, taking place on ghc-users, nominally out of view of the general Haskell community, right? So I don't need to intrude with mundane objections of that nature.
True, true, there is that. However, I think there's at least a little bit in the idea that we could put this back into Haskell to help with the record problem. Otherwise I was going to say that further academic discussion can inform future language design, but of course it occurs to me that languages like Agda already provide a living example of lots of Unicode. So the discussion need not be academic, just write some Agda :) After all, they've failed to take hold in Haskell, but history and culture play a large role, so it's not a level playing field.

And to add further to the academic discussion *ahem* :) I tried a vim script a while back that turned -> and :: and (.) and whatnot into Unicode versions. I eventually decided that many of these require variable-width fonts to look nice, or maybe it's just that the common fixed-width fonts haven't paid much attention to those little-used characters, but the result is that they turn into a lot of little misshapen blobs. => looks like a blobbier ->, etc. For example, on the Mac in 11 pt Menlo, (.) is a nice solid 4-pixel square, while · is a single pixel with some anti-aliasing fluff. I'd have to crank up the font size on everything else. So I turned it off.

I mention it because I haven't seen anyone else mention variable width as a prerequisite for using Unicode operators. I enjoyed writing with variable-width fonts on acme back in the day, but in the end I'm just too comfortable with vim keys and vim doesn't like variable width. Given infinite time I'd fix up yi for variable width, improve its vi keys, get a really high-DPI monitor, and give it a shot. Of course a significant part is that unfamiliar symbols haven't engraved themselves into the instant pattern-recognition part of the brain so much; maybe after a year of using them exclusively they'd look like perfectly clear little blobs.

On 12 Jan 2012, at 18:41, Evan Laforge wrote:
Unicode dot (∘) would be optimal, since that's what it's for.
Is ∘ (U+2218 RING OPERATOR)* in Prelude yet? We should propose that.**
However, changing the composition operator from (.) will involve huge amounts of changes to source code.
Indeed. It strikes me that it should be the _new_ feature that takes the new syntax, rather than stealing the old syntax and disrupting all the existing code that happily uses . for function composition. So, who is up for proposing centred dot as the new record-field syntax? Regards, Malcolm

On Thu, Jan 12, 2012 at 6:23 PM, Malcolm Wallace
On 12 Jan 2012, at 18:41, Evan Laforge wrote:
Unicode dot (∘) would be optimal, since that's what it's for.
Is ∘ (U+2218 RING OPERATOR)* in Prelude yet? We should propose that.**
However, changing the composition operator from (.) will involve huge amounts of changes to source code.
Indeed. It strikes me that it should be the _new_ feature that takes the new syntax, rather than stealing the old syntax and disrupting all the existing code that happily uses . for function composition.
So, who is up for proposing centred dot as the new record-field syntax?
We don't need to make this change overnight. The new records system will be turned on by an extension. If you use the new records system, then you will be forced to place spaces around the dot composition operator, or use the unicode dot or an alternative operator. It is good that we are starting this process of discussing what the future should look like though.
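To make the spacing distinction concrete, a small sketch; the record-access lines in the comments are hypothetical, since the extension's final syntax is not settled:

    -- Hypothetical, for illustration only: under the proposed extension an
    -- unspaced dot would be read as record-field access, e.g.
    --
    --   person.name        -- field selection (new meaning)
    --
    -- while composition keeps working when written with spaces (or with ∘,
    -- or with some alternative operator):

    h :: Int -> Int
    h = negate . abs       -- spaced dot: still ordinary composition

    -- h' = negate.abs     -- unspaced dot: would no longer parse as
    --                     -- composition once the extension is enabled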

Quoth Greg Weber
On Thu, Jan 12, 2012 at 6:23 PM, Malcolm Wallace
wrote: So, who is up for proposing centred dot as the new record-field syntax?
We don't need to make this change overnight. The new records system will be turned on by an extension. If you use the new records system, then you will be forced to place spaces around the dot composition operator, or use the unicode dot or an alternative operator.
The point seems pretty well taken. If many programmers will actually want the records extension, then they'll want to use it without breaking their code, and the above proposal would help with that. Changing the compose notation to some other character would break practically all Haskell code, so it's hard to take that seriously. "Spaces or unicode" would be the worst idea yet, but hopefully that isn't what you meant. Donn

On Thu, Jan 12, 2012 at 17:14, Donn Cave
"Spaces or unicode" would be the worst idea yet, but hopefully that isn't what you meant.
Thing is, I think the spaces idea is considered acceptable because it's *already there*. Take a look at how GHC decides whether (.) is the composition operator or a module qualification.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms
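For illustration, the disambiguation that is already in the language today (plain Haskell, nothing hypothetical):

    module QualExample where

    import qualified Data.Char

    -- An unspaced dot after a capitalised (module) name is parsed as
    -- qualification, not composition:
    up :: String -> String
    up = map Data.Char.toUpper

    -- The same character with spaces around it is the composition
    -- operator from the Prelude:
    shout :: String -> String
    shout = (++ "!") . map Data.Char.toUpper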

Quoth Brandon Allbery
On Thu, Jan 12, 2012 at 17:14, Donn Cave
wrote: "Spaces or unicode" would be the worst idea yet, but hopefully that isn't what you meant.
Thing is, I think the spaces idea is considered acceptable because it's *already there*. Take a look at how GHC decides whether (.) is the composition operator or a module qualification.
Sure, but I mean: given that "f . g" continues to be composition, but a record notation takes over the unspaced dot, breaking an existing "f.g" ... ... what is the rationale for an additional unicode dot? That's why I more or less assume that wasn't what he meant, that both " . " and "<unicode dot>" would be supported at the same time for composition, but rather just that one or the other would be chosen. Donn

On Thu, Jan 12, 2012 at 17:33, Donn Cave
Quoth Brandon Allbery
, On Thu, Jan 12, 2012 at 17:14, Donn Cave
wrote: "Spaces or unicode" would be the worst idea yet, but hopefully that isn't what you meant.
Thing is, I think the spaces idea is considered acceptable because it's *already there*. Take a look at how GHC decides whether (.) is the composition operator or a module qualification.
... what is the rationale for an additional unicode dot?
That's why I more or less assume that wasn't what he meant, that both " . " and "<unicode dot>" would be supported at the same time for composition, but rather just that one or the other would be chosen.
Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

Quoth Brandon Allbery
, ... Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
OK. To me, the first hand is all you need - if there should be a plain-ASCII version of any Unicode symbol anyway, then you can avoid some trouble by just recognizing that you don't need Unicode symbols (let alone with different parsing rules.) Donn

On Thu, Jan 12, 2012 at 18:15, Donn Cave
Quoth Brandon Allbery
, ... Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
OK. To me, the first hand is all you need - if there should be a plain-ASCII version of any Unicode symbol anyway, then you can avoid some trouble by just recognizing that you don't need Unicode symbols (let alone with different parsing rules.)
What? The weird parsing rules are part of the ASCII one; it's what the Unicode is trying to *avoid*. We're just about out of ASCII; weird parsing is going to be required at some point.
I also wish to note that I have never been a member of the "anything beyond plain ASCII is fundamental evil" set; if we're going to think that way, just go back to BAUDOT and punched cards.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

I also wish to note that I have never been a member of the "anything beyond plain ASCII is fundamental evil" set; if we're going to think that way, just go back to BAUDOT and punched cards.
Well, it's similar to the 80 columns debate. You have to draw the line somewhere. It's not about fundamental evil vs. punch cards, but rather n vs. n+1. ASCII is a particularly well worn value of 'n'. In the case of records, we're not really out of symbols, there is still @, #, &, etc. It's just that we like the look of a dot, and we are out of dot lookalikes :) For that matter we have a high'dot though it would mess up the "x' = f x" convention. But it has fine precedent in perl 4 if I'm remembering correctly :)
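To make the "we still have @, #, &" point concrete, a minimal sketch (purely illustrative, not a proposal) of reverse application spelled with one of the free ASCII symbols; it already works for accessor-style field selection:

    module RecordAccess where

    -- '&' as reverse application ('#' would do equally well; '@' is
    -- reserved for as-patterns).  Field selectors are ordinary functions,
    -- so this gives a left-to-right record access style today.
    infixl 1 &
    (&) :: a -> (a -> b) -> b
    x & f = f x

    data Person = Person { name :: String, age :: Int }

    alice :: Person
    alice = Person { name = "Alice", age = 30 }

    greeting :: String
    greeting = "Hello, " ++ (alice & name)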

Quoth Brandon Allbery
, ... Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
OK. To me, the first hand is all you need - if there should be a plain-ASCII version of any Unicode symbol anyway, then you can avoid some trouble by just recognizing that you don't need Unicode symbols (let alone with different parsing rules.)
What? The weird parsing rules are part of the ASCII one; it's what the Unicode is trying to *avoid*. We're just about out of ASCII, weird parsing is going to be required at some point.
What what? Are you not proposing to allow both ways to write composition, "." and "<unicode symbol>" at the same time, but with different syntactical requirements? Unicode characters as code would be bad enough, but mixing them with a hodge-podge of ASCII aliases with different parsing rules isn't going to win any prizes for elegance. Donn

On Thu, Jan 12, 2012 at 19:38, Donn Cave
Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
OK. To me, the first hand is all you need - if there should be a plain-ASCII version of any Unicode symbol anyway, then you can avoid some trouble by just recognizing that you don't need Unicode symbols (let alone with different parsing rules.)
What? The weird parsing rules are part of the ASCII one; it's what the Unicode is trying to *avoid*. We're just about out of ASCII, weird parsing is going to be required at some point.
What what? Are you not proposing to allow both ways to write composition, "." and "<unicode symbol>" at the same time, but with different syntactical requirements? Unicode characters as code would be bad enough, but mixing them with a hodge-podge of ASCII aliases with different parsing rules isn't going to win any prizes for elegance.
Backward compatibility is rarely elegant, and this is in any case piggybacking on already existing (indeed, longstanding) parser horkage. The point of the Unicode is a first step at getting away from said horkage, which hopefully can be completed someday.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

Requiring unicode characters for the Haskell syntax to solve a *relatively* simple problem is a bad bad idea. It is the equivalent of shooting birds with nuclear missiles. Yes you do solve the "bird" problem but it is nothing compared with the fallout consequences.

Morten

On 13/01/12 10:15, Donn Cave wrote:
Quoth Brandon Allbery
, ... Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
OK. To me, the first hand is all you need - if there should be a plain-ASCII version of any Unicode symbol anyway, then you can avoid some trouble by just recognizing that you don't need Unicode symbols (let alone with different parsing rules.)
Donn

On Thu, Jan 12, 2012 at 22:32, Morten Brodersen <Morten.Brodersen@constrainttec.com> wrote:
Requiring unicode characters for the Haskell syntax to solve a *relatively* simple problem is a bad bad idea.
Nobody said anything about requiring it.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

Even if Unicode is not required, there is still a fallout. Let's look at a simple scenario:

Somebody uploads a nice, useful Haskell module that includes a number of Unicode symbols. Unfortunately, most unix/windows/tools/source controls/editors out there are Ascii only. So people who want to use the module now potentially need to convert the code to Ascii (and potentially back again) in order to use it with non-Unicode tools. Yes, it is *of course* doable, but all of that just because of a *relatively* simple problem to do with how you access record fields? Really? That is IMHO a clear example of shooting birds with nuclear rockets.

Let me suggest that a simple non-nuclear alternative would be for people interested in Unicode symbols to use an editor that auto-converts from Haskell Ascii to Haskell Unicode when loading and (of course) back again when saving. You can do that today. You can even pick your own Ascii from/to Unicode mapping. No need to argue about whether a symbol is prettier than another. All of this without forcing the rest of the (couldn't care less about record access syntax) Haskell community to have to deal with Unicode :-)

Morten

On 13/01/12 14:43, Brandon Allbery wrote:
On Thu, Jan 12, 2012 at 22:32, Morten Brodersen
<Morten.Brodersen@constrainttec.com> wrote: Requiring unicode characters for the Haskell syntax to solve a *relatively* simple problem is a bad bad idea.
Nobody said anything about requiring it.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Fri, 2012-01-13 at 15:16 +1100, Morten Brodersen wrote:
Unfortunately most unix/windows/tools/source controls/editors out there are Ascii only.
So, after the Unicode standard has been around for about 20 years, the quantification "most" still applies? Maybe I'm using a non-representative platform, but every tool for manipulating source code I use nowadays has support for the Unicode charset, with at least the UTF-8 encoding...
-- hvr

On 13/01/2012, Herbert Valerio Riedel
On Fri, 2012-01-13 at 15:16 +1100, Morten Brodersen wrote:
Unfortunately most unix/windows/tools/source controls/editors out there are Ascii only.
So, after the Unicode standard has been around for about 20 years, the quantification "most" still applies? Maybe I'm using a non-representative platform, but every tool for manipulating source code I use nowadays has support for the Unicode charset, with at least the UTF-8 encoding...
This is my experience also.
-- hvr

Let me suggest that a simple non-nuclear alternative would be for people interested in Unicode symbols to use an editor that auto converts from Haskell Ascii to Haskell Unicode when loading and (of course) back again when saving. You can do that today. You can even pick your own Ascii from/to Unicode mapping. No need to argue about whether a symbol is prettier than another. All of this without forcing the rest of the (couldn't care less about record access syntax) Haskell community to have to deal with Unicode :-)
I tried this with an existing vim extension, but found it to be slightly buggy. It seems like much greater effort to tell people to use a mapping: now everyone must learn how to configure/program their editor, and this may be impossible for some editors. I was under the impression that virtually every code editor and viewer supports UTF-8. If you want to do mappings, it seems preferable to first use a new ASCII operator for composition, which can still be mapped to a Unicode dot.
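For what it's worth, (<<<) from Control.Category is already an ASCII operator that means exactly function composition on ordinary functions; a sketch of using it directly, leaving any display-as-∘ concealment to one's editor configuration:

    module UnicodeDisplay where

    import Control.Category ((<<<))
    import Data.Char (toUpper)

    -- On the (->) instance of Category, (<<<) is plain composition, so no
    -- new operator needs to be invented; an editor mapping could render it
    -- as ∘ while the file stays ASCII.
    yell :: String -> String
    yell = (++ "!") <<< map toUpper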

On 12/01/2012, Morten Brodersen
Even if Unicode is not required, there is still a fallout. Let's look at a simple scenario:
Somebody uploads a nice useful Haskell module that include a number of Unicode symbols.
Unfortunately most unix/windows/tools/source controls/editors out there are Ascii only.
If so, most unix/windows/tools/source controls/editors out there are broken.
So people who wants to use the module now potentially need to convert the code to Ascii (and potentially back again) in order to use it with non-Unicode tools.
No, people need to get Unicode (or, better yet, when possible, code-agnostic) tools.
Yes it is *of course* doable but all of that just because of a *relatively" simple problem to do with how you access record fields? Really?
That is IMHO a clear example of shooting birds with nuclear rockets.
Let me suggest that a simple non-nuclear alternative would be for people interested in Unicode symbols to use an editor that auto converts from Haskell Ascii to Haskell Unicode when loading and (of course) back again when saving. You can do that today. You can even pick your own Ascii from/to Unicode mapping. No need to argue about whether a symbol is prettier than another. All of this without forcing the rest of the (couldn't care less about record access syntax) Haskell community to have to deal with Unicode :-)
That is (in my opinion) a clear example of shooting foes in heavy armour with bird-shot. From a muzzle-loader.
Morten
On 13/01/12 14:43, Brandon Allbery wrote:
On Thu, Jan 12, 2012 at 22:32, Morten Brodersen
<Morten.Brodersen@constrainttec.com> wrote: Requiring unicode characters for the Haskell syntax to solve a *relatively* simple problem is a bad bad idea.
Nobody said anything about requiring it.
-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Jan 12, 2012 9:18 PM, "Morten Brodersen" <Morten.Brodersen@constrainttec.com> wrote:
Unfortunately most unix/windows/tools/source controls/editors out there are Ascii only.
This is probably not true. I'm among the most annoyed people at this Unicode trend... but the problem is in text entry methods, and occasionally display and fonts if you go completely insane. Any text processing tool that has trouble with utf8 these days is broken, plain and simple. Sadly, convenient and general Unicode input is probably not a solvable problem... and the human mental acumen to remember so many symbols and their names or pronunciations will certainly remain unsolved for the foreseeable future. Anyone want to hazard a guess what percentage of Haskell programmers can even name every letter of the Greek alphabet? I'm betting less than 10%.

On 1/13/12 11:31 PM, Chris Smith wrote:
Anyone want to hazard a guess what percentage of Haskell programmers can even name every letter of the Greek alphabet? I'm betting less than 10%.
That's easy for anyone with a classical education (or far too much mathematics education). There may well be less than 10% of Haskellers who have such, but Greek letters are rather trivial to learn. The problems arise when suddenly everyone needs to know the names of all the graphemes of Cyrillic, Amharic, Mandarin, Japanese, Arabic, Hebrew,... (because there are a great number of writing systems, and because they are relatively unknown having not been coopted by the mathematical tradition) or when everyone needs to know the names of all the obscure symbols mathematicians have come up with over the years (because generally they've never been named in the first place). -- Live well, ~wren

I broke out the dot operator section of the proposal to its own page since it is actually fairly independent of the different proposals.
http://hackage.haskell.org/trac/ghc/wiki/Records/DotOperator
On Sat, Jan 14, 2012 at 7:26 PM, wren ng thornton
On 1/13/12 11:31 PM, Chris Smith wrote:
Anyone want to hazard a guess what percentage of Haskell programmers can even name every letter of the Greek alphabet? I'm betting less than 10%.
That's easy for anyone with a classical education (or far too much mathematics education). There may well be less than 10% of Haskellers who have such, but Greek letters are rather trivial to learn.
The problems arise when suddenly everyone needs to know the names of all the graphemes of Cyrillic, Amharic, Mandarin, Japanese, Arabic, Hebrew,... (because there are a great number of writing systems, and because they are relatively unknown having not been coopted by the mathematical tradition) or when everyone needs to know the names of all the obscure symbols mathematicians have come up with over the years (because generally they've never been named in the first place).
-- Live well, ~wren

ghc could start warning about "missing spaces around dot" immediately, i.e. in the next release. In later releases this can be made an error, and then (field) names starting with a dot can be introduced as postfix functions. I think only a space between "." and a following lowercase letter is needed, though.
C.
On 18.01.2012 05:02, Greg Weber wrote:
I broke out the dot operator section of the proposal to its own page since it is actually fairly independent of the different proposals.

Actually, we don't need symbols at all, nor all these damned letters.
The set of valid characters in an identifier can be of size 2: one
each upper- and lower-case, e.g. [Pp].
For example, to define const function:
p :: P (p (P pp p));
p pp _ = pp;
where P is function type.
If we drop all the symbols, and all numerals but [01], we could have a
6-bit character set!
On 12/01/2012, Donn Cave
Quoth Brandon Allbery
, ... Seems obvious to me: on the one hand, there should be a plain-ASCII version of any Unicode symbol; on the other, the ASCII version has shortcomings the Unicode one doesn't (namely the existing conflict between use as composition and use as module and now record qualifier). So, the Unicode one requires support but avoids weird parse issues.
OK. To me, the first hand is all you need - if there should be a plain-ASCII version of any Unicode symbol anyway, then you can avoid some trouble by just recognizing that you don't need Unicode symbols (let alone with different parsing rules.)
Donn

On Fri, Jan 13, 2012 at 8:23 PM, Matthew Farkas-Dyck
p :: P (p (P pp p))
This is not too far off the original design of Miranda, in which type variable names were drawn from the set {*, **, ***, ...} /g -- "Would you be so kind as to remove the apricots from the mashed potatoes?"
participants (13)
- Brandon Allbery
- Chris Smith
- Christian Maeder
- Donn Cave
- Evan Laforge
- Greg Weber
- Herbert Valerio Riedel
- Isaac Dupree
- J. Garrett Morris
- Malcolm Wallace
- Matthew Farkas-Dyck
- Morten Brodersen
- wren ng thornton