[Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

Colin Paul Adams

6 Apr 2011

3:34 p.m.

I forgot to CC the list:

"Roel" == Roel van Dijk writes:

Roel> I propose to make UTF-8 the only allowed encoding for Haskell
Roel> source files. Implementations must discard an initial Byte
Roel> Order Mark (BOM) if present [3].

Roel> * Pros
Roel>   - Ensures that Haskell source can be reliably exchanged
Roel>     on the byte level.
Roel>   - Disallows implicit ISO-8859-* encodings in source code,
Roel>     ensuring portability.
Roel>   - Little or no implementation burden for compiler writers.

Having thought this over a bit more, I don't think it's a good idea.

Allowed? Allowed for what?

What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
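
For illustration only, here is a minimal sketch of what the quoted proposal would ask of a tool that reads Haskell source: drop one leading UTF-8 BOM, then decode the remaining bytes strictly as UTF-8. It is not taken from any compiler. The names `utf8Bom`, `stripBom` and `decodeSource` are hypothetical, and the sketch assumes the `bytestring` and `text` packages are available.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)

-- The UTF-8 encoding of U+FEFF, the byte order mark.
utf8Bom :: B.ByteString
utf8Bom = B.pack [0xEF, 0xBB, 0xBF]

-- Drop one leading BOM if present; leave all other input untouched.
-- (Hypothetical helper, named here for illustration.)
stripBom :: B.ByteString -> B.ByteString
stripBom bytes
  | utf8Bom `B.isPrefixOf` bytes = B.drop (B.length utf8Bom) bytes
  | otherwise                    = bytes

-- Decode source bytes strictly as UTF-8, reporting invalid input
-- instead of guessing at another encoding.
decodeSource :: B.ByteString -> Either UnicodeException T.Text
decodeSource = decodeUtf8' . stripBom

main :: IO ()
main = do
  -- The BOM followed by the ASCII bytes of "main".
  let withBom = B.append utf8Bom (B.pack [0x6D, 0x61, 0x69, 0x6E])
  case decodeSource withBom of
    Left err   -> putStrLn ("not valid UTF-8: " ++ show err)
    Right text -> putStrLn (T.unpack text)
```

Decoding strictly, rather than substituting replacement characters, is a design choice made for this sketch: it lets a tool reject a file that was saved in some other encoding.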

Bas van Dijk

6 Apr

6:02 p.m.

On 6 April 2011 17:34, Colin Paul Adams wrote:

> I forgot to CC the list:
>
> "Roel" == Roel van Dijk writes:
>
> Roel> I propose to make UTF-8 the only allowed encoding for Haskell
> Roel> source files. Implementations must discard an initial Byte
> Roel> Order Mark (BOM) if present [3].
>
> Roel> * Pros
> Roel>   - Ensures that Haskell source can be reliably exchanged
> Roel>     on the byte level.
> Roel>   - Disallows implicit ISO-8859-* encodings in source code,
> Roel>     ensuring portability.
> Roel>   - Little or no implementation burden for compiler writers.
>
> Having thought this over a bit more, I don't think it's a good idea.
>
> Allowed? Allowed for what?

Allowed to be called a Haskell file. If the report doesn't specify what a Haskell file is then we can't reliably exchange Haskell source files by only looking at the files themselves.

> What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

Suppose I give you MyHaskellFile.hs. But before telling you how it's encoded I go gliding (a hobby of mine). Unfortunately I crash my glider and die :-(. Now what encoding option do you give to your front-end script?

> I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

That would only allow users of Hackage and Cabal to reliably exchange their Haskell files. If we specify it in the report every user can benefit.

Regards,

Bas
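
To make the glider scenario concrete, here is a small sketch showing that the bytes alone do not say which encoding was meant. The names `sample`, `asLatin1` and `asUtf8` are hypothetical, and the sketch assumes the `bytestring` and `text` packages are available.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- Two bytes that are valid in both encodings but mean different things.
sample :: B.ByteString
sample = B.pack [0xC3, 0xA9]

-- ISO-8859-1 maps every byte to the code point of the same value,
-- so this decoding can never fail.
asLatin1 :: B.ByteString -> String
asLatin1 = T.unpack . decodeLatin1

-- Strict UTF-8 decoding; Nothing signals invalid input.
asUtf8 :: B.ByteString -> Maybe String
asUtf8 = either (const Nothing) (Just . T.unpack) . decodeUtf8'

main :: IO ()
main = do
  print (asLatin1 sample)          -- two characters, U+00C3 and U+00A9
  print (asUtf8 sample)            -- Just one character, U+00E9
  print (asUtf8 (B.pack [0xE9]))   -- Nothing: a lone 0xE9 is not valid UTF-8
```

A reader of the file has to be told, or has to guess, which of the two readings was intended. A single agreed encoding removes the guess.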

Colin Paul Adams

6:42 p.m.

"Bas" == Bas van Dijk writes:

Bas> On 6 April 2011 17:34, Colin Paul Adams wrote:
>> Allowed? Allowed for what?

Bas> Allowed to be called a Haskell file.

Well, what the report says on that is irrelevant. If I see a file containing Haskell code, I shall call it a Haskell file, irrespective. I suspect I will be in the majority.

Bas> If the report doesn't specify what a Haskell file is then we
Bas> can't reliably exchange Haskell source files by only looking at
Bas> the files themselves.

Sure we can.

>> What does it achieve? Nothing, as far as I can see. Authors will
>> still be able to write their Haskell code in any encoding they
>> like. And any compiler can have a front-end script with an option
>> to specify the encoding used by source files, which simply uses
>> iconv on the fly to translate.

Bas> Suppose I give you MyHaskellFile.hs. But before telling you how
Bas> it's encoded I go gliding (a hobby of mine). Unfortunately I
Bas> crash my glider and die :-(. Now what encoding option do you
Bas> give to your front-end script?

Whatever the encoding happens to be. That won't be hard to find out. And presumably Haskell programmers don't die so very frequently that it will become a time-consuming affair.

>> I think the real place to mandate UTF-8 would be for
>> Hackage. That's where it matters (an alternative design would be
>> to add an encoding field in the .cabal file, but I don't think
>> this has much merit).

Bas> That would only allow users of Hackage and Cabal to reliably
Bas> exchange their Haskell files. If we specify it in the report
Bas> every user can benefit.

There is no benefit that I see. Anyone is free to write Haskell code in whatever encoding they fancy, irrespective of what the report says. It's not going to have the force of law.

--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Roel van Dijk

7 Apr

7:32 a.m.

On 6 April 2011 20:42, Colin Paul Adams wrote:

> "Bas" == Bas van Dijk writes:
>
> Bas> On 6 April 2011 17:34, Colin Paul Adams wrote:
> >> Allowed? Allowed for what?
>
> Bas> Allowed to be called a Haskell file.
>
> Well, what the report says on that is irrelevant. If I see a file containing Haskell code, I shall call it a Haskell file, irrespective. I suspect I will be in the majority.

It seems you have a problem with the word "allowed". What do you think of the interoperability guidelines as proposed by Duncan? They are less stringent while having the same intention as my original proposal.

Colin Paul Adams

10:05 a.m.

"Roel" == Roel van Dijk writes:

Roel> On 6 April 2011 20:42, Colin Paul Adams wrote:

Roel> It seems you have a problem with the word "allowed". What do
Roel> you think of the interoperability guidelines as proposed by
Roel> Duncan? They are less stringent while having the same
Roel> intention as my original proposal.

I think they are fine.

--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Christian Maeder

9:29 a.m.

On 06.04.2011 20:02, Bas van Dijk wrote:

> On 6 April 2011 17:34, Colin Paul Adams wrote: [...]
>> I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).
>
> That would only allow users of Hackage and Cabal to reliably exchange their Haskell files. If we specify it in the report every user can benefit.

I agree that Haskell files should be UTF-8, but I also agree that it is only relevant for Hackage (and Cabal) and already enforced by ghc-6.12 or higher.

The motivation for this proposal can only be that future cabal packages will use more and more non-ASCII characters as is possible via http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and LANGUAGE pragma "UnicodeSyntax" (that happens to have no support for "\" as lambda symbol - probably because lambda is a letter and not a symbol!)

However, I think, these extra characters only make sense for corner cases and should not be recommended for general purposes. For nicer looking sources I would recommend special viewers or post-processors (like haddock or hscolour) that translate certain ASCII sequences to Unicode code points.

So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.

Cheers Christian
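
For readers who have not used the pragma mentioned above, here is a small sketch of what GHC's UnicodeSyntax extension permits. The function names `double` and `showTwice` are made up for illustration, and the file must itself be saved as UTF-8 for GHC to read it.

```haskell
{-# LANGUAGE UnicodeSyntax #-}

-- With UnicodeSyntax, GHC accepts ∷ for ::, → for -> and ⇒ for =>.
-- The lambda itself is still written with a backslash.
double ∷ Int → Int
double = \x → x * 2

-- A class constraint written with the Unicode double arrow.
showTwice ∷ Show a ⇒ a → String
showTwice x = show x ++ show x

main ∷ IO ()
main = do
  print (double 21)
  putStrLn (showTwice 'a')
```

The same program can be written in pure ASCII by replacing each symbol with its usual spelling, which is the trade-off under discussion here.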

Christian Maeder

9:43 a.m.

On 07.04.2011 11:29, Christian Maeder wrote:

> So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.

This means all comments in Haskell sources (for Hackage) should be in English, exclusively! Supply separate documentation in your mother tongue if required. And I would rather write out "Euro" or "Lambda" than try to find the corresponding Unicode character (and even in .tex sources ASCII sequences exist for those).

Cheers Christian

David Virebayre

12:15 p.m.

2011/4/7 Christian Maeder

> On 07.04.2011 11:29, Christian Maeder wrote:
>> So my view is: Stick to ASCII and only if you must (not just for casual
>> reasons) use UTF-8.
>
> This means all comments in haskell sources (for hackage) should be in English, exclusively! Supply separate documentation in your mother tongue if required.

This thread is about the encoding of Haskell source files, not Hackage's, so I don't see the point in talking about restricting Hackage's language to English:
- it is not the topic
- it's already a de facto standard anyway.

On the other hand, not restricting the use of any language in Haskell source files is IMHO a must, and it's not well supported as it is; for example, haddock doesn't support accented letters in comments.

This proposal gives a clear signal that UTF-8 characters have to be taken into account, and hopefully tools like haddock will evolve to support them as a result.

Roel van Dijk

11:09 a.m.

On 7 April 2011 11:29, Christian Maeder wrote:

> I agree that Haskell files should be UTF-8, but I also agree that it is only relevant for Hackage (and Cabal) and already enforced by ghc-6.12 or higher.

It is relevant for all tools and systems which process Haskell sources.

> The motivation for this proposal can only be that future cabal packages will use more and more non-ASCII characters as is possible via http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and LANGUAGE pragma "UnicodeSyntax" (that happens to have no support for "\" as lambda symbol - probably because lambda is a letter and not a symbol!)

The motivation for this proposal is interoperability of all tools and systems which process Haskell source files. Perhaps I could have made that clearer.

> However, I think, these extra characters only make sense for corner cases and should not be recommended for general purposes.

Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

I have many more like that. I do not consider Chinese a corner case, nor the vast number of languages that cannot be represented using ASCII.

> So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.

When to use certain characters is not part of the proposal.

Christian Maeder

11:24 a.m.

On 07.04.2011 13:09, Roel van Dijk wrote:

> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

Great, that file made my firefox open infinitely many tabs (so that I had to close it).

C.

Colin Paul Adams

11:33 a.m.

"Christian" == Christian Maeder writes:

Christian> On 07.04.2011 13:09, Roel van Dijk wrote:
>> Please take a look at the following file:
>> http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

Christian> Great, that file made my firefox open infinitely many
Christian> tabs (so that I had to close it).

On mine, it just launched Emacs to open the file (where it looked great).

Note that I certainly agree with Roel on Chinese not being a corner case. (And my wife would certainly have something to say if I didn't, she being Chinese herself!)

--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Christian Maeder

11:52 a.m.

On 07.04.2011 13:24, Christian Maeder wrote:

> On 07.04.2011 13:09, Roel van Dijk wrote:
>> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs
>
> Great, that file made my firefox open infinitely many tabs (so that I had to close it).

Well, my firefox was set to "use firefox" for "Haskell source code" (and so failed for any .hs file).

C.

Christian Maeder

1:03 p.m.

On 07.04.2011 13:09, Roel van Dijk wrote:

> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

The code would not suffer much if it were pure ASCII. I would prefer (ascii) haddock links to explain the various code points.

C.

Roel van Dijk

1:25 p.m.

On 7 April 2011 15:03, Christian Maeder wrote:

> The code would not suffer much if it were pure ASCII. I would prefer (ascii) haddock links to explain the various code points.

The code in question contains Chinese characters like '三', which in a US-ASCII encoded Haskell file must be written as '\x4e09'. I do not consider these escape sequences an acceptable substitute.

But this discussion is tangential to the proposal. I am interested in having a common set of guidelines to ensure interoperability of Haskell sources. An important part of that is having a common method of decoding files containing Haskell code. The easiest way to achieve that is to use a single encoding. UTF-8 is the best candidate for that role.
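
As a small illustration of the two spellings mentioned above, the following sketch checks that the literal and the escape denote the same character. The names `sanLiteral`, `sanEscaped` and `codePointHex` are hypothetical, and the file is assumed to be saved as UTF-8 so that the compiler can read the literal.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- The character as it can appear in a UTF-8 encoded source file.
sanLiteral :: Char
sanLiteral = '三'

-- The escape a pure-ASCII source file has to use instead.
sanEscaped :: Char
sanEscaped = '\x4e09'

-- Render a character's code point in lowercase hexadecimal.
codePointHex :: Char -> String
codePointHex c = showHex (ord c) ""

main :: IO ()
main = do
  print (sanLiteral == sanEscaped)    -- True: both denote U+4E09
  putStrLn (codePointHex sanLiteral)  -- 4e09
```

The compiler treats the two forms identically; the difference is only in how readable the source is to someone who knows the language being described.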
