[Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

I forgot to CC the list:

"Roel" == Roel van Dijk writes:

    Roel> I propose to make UTF-8 the only allowed encoding for Haskell
    Roel> source files. Implementations must discard an initial Byte
    Roel> Order Mark (BOM) if present [3].

    Roel> * Pros
    Roel>   - Ensures that Haskell source can be reliably exchanged on the byte level.
    Roel>   - Disallows implicit ISO-8859-* encodings in source code, ensuring portability.
    Roel>   - Little or no implementation burden for compiler writers.

Having thought this over a bit more, I don't think it's a good idea.

Allowed? Allowed for what?

What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

--
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
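The BOM rule in the quoted proposal can be sketched in a few lines of Haskell. This is only an illustration and not something posted in the thread: the function name `readSourceUtf8` is hypothetical, and the sketch assumes GHC's `System.IO`, whose `utf8_bom` encoding decodes UTF-8 and ignores a leading byte order mark on input.

```haskell
import System.IO

-- | Read a Haskell source file as UTF-8, skipping a leading BOM
-- (bytes EF BB BF) if one is present. Invalid byte sequences raise
-- an exception while the contents are being decoded.
-- NOTE: 'readSourceUtf8' is a hypothetical name used for illustration.
readSourceUtf8 :: FilePath -> IO String
readSourceUtf8 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8_bom
    contents <- hGetContents h
    -- Force the whole file before 'withFile' closes the handle.
    length contents `seq` return contents
```

With a reader like this, a file with a BOM and a file without one decode to the same text, so a tool needs no encoding option at all.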

On 6 April 2011 17:34, Colin Paul Adams wrote:
> I forgot to CC the list:
>
> "Roel" == Roel van Dijk writes:
>
> Roel> I propose to make UTF-8 the only allowed encoding for Haskell
> Roel> source files. Implementations must discard an initial Byte
> Roel> Order Mark (BOM) if present [3].
>
> Roel> * Pros
> Roel>   - Ensures that Haskell source can be reliably exchanged on the byte level.
> Roel>   - Disallows implicit ISO-8859-* encodings in source code, ensuring portability.
> Roel>   - Little or no implementation burden for compiler writers.
>
> Having thought this over a bit more, I don't think it's a good idea.
>
> Allowed? Allowed for what?

Allowed to be called a Haskell file. If the report doesn't specify what a Haskell file is then we can't reliably exchange Haskell source files by only looking at the files themselves.

> What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

Suppose I give you MyHaskellFile.hs. But before telling you how it's encoded I go gliding (a hobby of mine). Unfortunately I crash my glider and die :-(. Now what encoding option do you give to your front-end script?

> I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

That would only allow users of Hackage and Cabal to reliably exchange their Haskell files. If we specify it in the report every user can benefit.

Regards,

Bas
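The front-end idea discussed above (translate from a declared encoding on the fly, as iconv would) might look like the following sketch. It is an assumption-laden illustration, not code from the thread: `transcodeToUtf8` is a hypothetical name, and the lookup uses `mkTextEncoding` from GHC's `System.IO`, which resolves an encoding by name and on Unix systems typically defers to iconv for names GHC does not implement itself.

```haskell
import System.IO

-- | Re-encode a source file from a named encoding to UTF-8.
-- The encoding name (for example "ISO-8859-1") must be supplied by the
-- caller: nothing in the bytes of the file says which encoding was used.
-- NOTE: 'transcodeToUtf8' is a hypothetical helper for illustration.
transcodeToUtf8 :: String -> FilePath -> FilePath -> IO ()
transcodeToUtf8 encName src dst = do
  enc <- mkTextEncoding encName            -- fails if the name is unknown
  text <- withFile src ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    length s `seq` return s                -- force the read before closing
  withFile dst WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h text
```

The first argument is the weak point of this design: it has to travel with the file by some other channel, which is the situation the glider example describes.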

On 6 April 2011 20:42, Colin Paul Adams wrote:
> "Bas" == Bas van Dijk writes:
>
> Bas> On 6 April 2011 17:34, Colin Paul Adams wrote:
> >> Allowed? Allowed for what?
>
> Bas> Allowed to be called a Haskell file.
>
> Well, what the report says on that is irrelevant. If I see a file containing Haskell code, I shall call it a Haskell file, irrespective. I suspect I will be in the majority.

It seems you have a problem with the word "allowed". What do you think of the interoperability guidelines as proposed by Duncan? They are less stringent while having the same intention as my original proposal.

On 06.04.2011 20:02, Bas van Dijk wrote:
> On 6 April 2011 17:34, Colin Paul Adams wrote:
> [...]
>> I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).
>
> That would only allow users of Hackage and Cabal to reliably exchange their Haskell files. If we specify it in the report every user can benefit.

I agree that Haskell files should be UTF-8, but I also agree that it is only relevant for Hackage (and Cabal) and already enforced by ghc-6.12 or higher.

The motivation for this proposal can only be that future cabal packages will use more and more non-ASCII characters, as is possible via http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and the LANGUAGE pragma "UnicodeSyntax" (which happens to have no support for "\" as lambda symbol, probably because lambda is a letter and not a symbol!).

However, I think these extra characters only make sense for corner cases and should not be recommended for general purposes. For nicer-looking sources I would recommend special viewers or post-processors (like haddock or hscolour) that translate certain ASCII sequences to Unicode code points.

So my view is: stick to ASCII, and use UTF-8 only if you must (not just for casual reasons).

Cheers
Christian
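For readers who have not seen the "UnicodeSyntax" pragma mentioned above, here is a small sketch of what it permits. The function `compose` is a made-up example, and the file is assumed to be saved as UTF-8; the extension lets GHC accept `∷` in place of `::` and `→` in place of `->`, while the lambda is still introduced with a backslash.

```haskell
{-# LANGUAGE UnicodeSyntax #-}

-- Function composition written with Unicode alternatives for :: and ->.
-- NOTE: 'compose' is a hypothetical example, not taken from any package.
compose ∷ (b → c) → (a → b) → a → c
compose f g = \x → f (g x)   -- the lambda itself keeps its ASCII backslash
```

The same definition in plain ASCII differs only in those two symbols, which is why the choice between the two styles is independent of the encoding question.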

On 07.04.2011 11:29, Christian Maeder wrote:
> So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.

This means all comments in Haskell sources (for Hackage) should be in English, exclusively! Supply separate documentation in your mother tongue if required.

And I would rather write out "Euro" or "Lambda" than try to find the corresponding Unicode character (and even in .tex sources ASCII sequences exist for those).

Cheers
Christian

2011/4/7 Christian Maeder:
> On 07.04.2011 11:29, Christian Maeder wrote:
>> So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.
>
> This means all comments in haskell sources (for hackage) should be in English, exclusively! Supply separate documentation in your mother tongue if required.

This thread being about the encoding of Haskell source files, not Hackage's, I don't see the point in talking about restricting Hackage's language to English:
- it is not the topic
- it's already a de facto standard anyway.

On the other hand, not restricting the usage of any language in Haskell source files is IMHO a must, and it's not well supported as it is; for example, haddock doesn't support accented letters in comments. This proposal gives a clear signal that UTF-8 characters have to be taken into account, and hopefully tools like haddock will evolve to support them thanks to this proposal.

On 7 April 2011 11:29, Christian Maeder wrote:
> I agree that Haskell files should be UTF-8, but I also agree that it is only relevant for Hackage (and Cabal) and already enforced by ghc-6.12. or higher.

It is relevant for all tools and systems which process Haskell sources.

> The motivation for this proposal can only be that future cabal packages will use more and more non-ASCII characters as is possible via http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and LANGUAGE pragma "UnicodeSyntax" (that happens to have no support for "\" as lambda symbol - probably because lambda is a letter and no symbol!)

The motivation for this proposal is interoperability of all tools and systems which process Haskell source files. Perhaps I could have made that clearer.

> However, I think, these extra characters only make sense for corner cases and should not be recommended for general purposes.

Please take a look at the following file:
http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

I have many more like that. I do not consider Chinese a corner case, nor the vast number of languages which cannot be represented using ASCII.

> So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.

When to use certain characters is not part of the proposal.

On 07.04.2011 13:09, Roel van Dijk wrote:
> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

Great, that file made my Firefox open infinitely many tabs (so that I had to close it).

C.

"Christian" == Christian Maeder
writes:
Christian> On 07.04.2011 13:09, Roel van Dijk wrote:
>> Please take a look at the following file:
>> http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

Christian> Great, that file made my firefox open infinitely many
Christian> tabs (so that I had to close it).

On mine, it just launched Emacs to open the file (where it looked great).

Note that I certainly agree with Roel on Chinese not being a corner case. (And my wife would certainly have something to say if I didn't, she being Chinese herself!)

--
Colin Adams
Preston Lancashire

On 07.04.2011 13:24, Christian Maeder wrote:
> On 07.04.2011 13:09, Roel van Dijk wrote:
>> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs
>
> Great, that file made my firefox open infinitely many tabs (so that I had to close it).

Well, my Firefox was set to "use firefox" for "Haskell source code" (and failed for any .hs file).

C.

On 07.04.2011 13:09, Roel van Dijk wrote:
> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

The code would not suffer much if it were pure ASCII. I would prefer (ASCII) haddock links to explain the various code points.

C.

On 7 April 2011 15:03, Christian Maeder wrote:
> The code would not suffer much if it were pure ASCII. I would prefer (ascii) haddock links to explain the various code points.

The code in question contains Chinese characters like '三', which in a US-ASCII encoded Haskell file must be written as '\x4e09'. I do not consider these escape sequences an acceptable substitute.

But this discussion is tangential to the proposal. I am interested in having a common set of guidelines to ensure interoperability of Haskell sources. An important part of that is having a common method of decoding files containing Haskell code. The easiest way to achieve that is using only 1 encoding. UTF-8 is the best candidate for that role.
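The two spellings mentioned above denote the same character, which a short sketch can make concrete. The names below (`literalSan`, `escapedSan`, `sameChar`, `codePointHex`) are hypothetical and chosen only for this illustration; the file is assumed to be saved as UTF-8 so that the literal form can be read at all.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- The same character, once as a literal and once as a numeric escape.
-- NOTE: all names in this block are hypothetical, for illustration only.
literalSan, escapedSan :: Char
literalSan = '三'
escapedSan = '\x4e09'

-- Both spellings denote code point U+4E09, so this is True.
sameChar :: Bool
sameChar = literalSan == escapedSan

-- Hexadecimal code point of a character, useful for checking an escape.
codePointHex :: Char -> String
codePointHex c = showHex (ord c) ""
```

The compiler sees no difference between the two forms; the difference is only in what a human reader of the source file can recognise.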
Participants (5): Bas van Dijk, Christian Maeder, Colin Paul Adams, David Virebayre, Roel van Dijk