Type system madness

Andrew Coppin

9 Jul 2007

8:05 p.m.

OK, can somebody explain to me *really slowly* exactly what the difference between an existential type and a rank-N type is? (I couldn't find much of use on the wiki. I have now in fact written some stuff there myself, but since I don't understand it in the first place, I'm having difficulty trying to explain it to anybody else...)

Stefan O'Rear

9 Jul

8:23 p.m.

On Mon, Jul 09, 2007 at 09:05:55PM +0100, Andrew Coppin wrote:

OK, can somebody explain to me *really slowly* exactly what the difference between an existential type and a rank-N type is?

(I couldn't find much of use on the wiki. I have now in fact written some stuff there myself, but since I don't understand it in the first place, I'm having difficulty trying to explain it to anybody else...)

There isn't really such a thing as existential types. Rank-N types exist, but they are more of an implementation detail. All users should worry about is Quantifiers.

A quantifier is an operator on types which defines a variable in some way.

id has type :: ∀α. α → α

This means that id has type Int → Int, Bool → Bool, [Char] → [Char], etc. etc. etc. FOR ALL.

toUpper (can) have type :: ∃α. α → α

toUpper has ONE of Int → Int, Char → Char, etc. etc. etc.: a type α EXISTS such that toUpper has type α → α. Yes, I know toUpper has a more specific type - bear with me, it was the best example I could think of.

If you're at all familiar with mathematical logic, don't hesitate to apply your intuitions about forall and exists - type systems and logics really are the same things.

If you have a value of existential type, you can only do things with it that you can do with any type, because you don't know the actual type. Existential types hide information from the users.

If you have a value of universal type, you can do things with it as if it had any matching type of your choice, because it doesn't know and can't care about the actual use type. Universal types hide information from the implementors.

In Haskell 98, existential quantification is not supported at all, and universal quantification is not first class - values can have universal types if and only if they are bound by let. You cannot pass universally typed values to functions.

Stefan
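To make the two quantifiers concrete, here is a minimal sketch, assuming GHC with the ExistentialQuantification extension (which, as Stefan says, Haskell 98 lacks). The names `myId`, `Machine`, `upperMachine`, `countMachine` and `run` are made up for illustration. With `forall`, whoever uses the value picks the type; with the existential constructor, whoever builds the value picks the type and hides it, so a consumer can only combine the pieces it was given.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Char (toUpper)

-- Universal: forall a. a -> a. Whoever *uses* myId picks a.
myId :: forall a. a -> a
myId x = x

-- Existential: "there exists a such that we have an a, an a -> a, and an
-- a -> String". Whoever *builds* a Machine picks a; the choice is hidden.
data Machine = forall a. Machine a (a -> a) (a -> String)

-- Hidden type is Char; the endofunction is toUpper.
upperMachine :: Machine
upperMachine = Machine 'q' toUpper (: [])

-- Hidden type is Int.
countMachine :: Machine
countMachine = Machine (41 :: Int) (+ 1) show

-- A consumer cannot ask what a is. It can only do what would work for any a:
-- apply the packed function to the packed value and observe the result.
run :: Machine -> String
run (Machine start step render) = render (step start)
```

Loaded into GHCi, `map run [upperMachine, countMachine]` gives `["Q","42"]`, while `myId` can be used at `Int`, `Bool` or anything else in the same program.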

Andrew Coppin

8:57 p.m.

Stefan O'Rear wrote:

All users should worry about is Quantifiers.

A quantifier is an operator on types which defines a variable in some way.

OK...

id has type :: ∀α. α → α

toUpper (can) have type :: ∃α. α → α

So... you're saying that id:: x -> x works for *every* possible choice of x, but toUpper :: x -> x works for *one* possible choice of x? (BTW... How in the hell do you get symbols like that in plain ASCII??)

If you're at all familiar with mathematical logic, don't hesitate to apply your intuitions about forall and exists - type systems and logics really are the same things.

I have wide interests in diverse areas of science, mathematics and computing, covering everything from cryptology to group theory to data compression - but formal logic is something I've never been able to bend my mind around. :-(

If you have a value of existential type, you can only do things with it that you can do with any type, because you don't know the actual type. Existential types hide information from the users.

If you have a value of universal type, you can do things with it as if it had any matching type of your choice, because it doesn't know and can't care about the actual use type. Universal types hide information from the implementors.

I stand in awe of people who actually understand what "universal" and "existential" actually mean... To me, these are just very big words that sound impressive. So, are you saying that if x is existential, it must work for any possible x, but if x is universal, I can choose what x is?

In Haskell 98, existential quantification is not supported at all, and universal quantification is not first class - values can have universal types if and only if they are bound by let. You cannot pass universally typed values to functions.

Erm...

Jonathan Cast

9:02 p.m.

On Monday 09 July 2007, Andrew Coppin wrote:

Stefan O'Rear wrote:

All users should worry about is Quantifiers.

A quantifier is an operator on types which defines a variable in some way.

OK...

id has type :: ∀α. α → α

toUpper (can) have type :: ∃α. α → α

So... you're saying that id:: x -> x works for *every* possible choice of x, but toUpper :: x -> x works for *one* possible choice of x?

Remember the quantifiers! id :: forall x. x -> x works for every choice of x, but toUpper :: exists x. x -> x works for only one choice of x.

(BTW... How in the hell do you get symbols like that in plain ASCII??)

If you're at all familiar with mathematical logic, don't hesitate to apply your intuitions about forall and exists - type systems and logics really are the same things.

I have wide interests in diverse areas of science, mathematics and computing, covering everything from cryptology to group theory to data compression - but formal logic is something I've never been able to bend my mind around. :-(

If you have a value of existential type, you can only do things with it that you can do with any type, because you don't know the actual type. Existential types hide information from the users.

If you have a value of universal type, you can do things with it as if it had any matching type of your choice, because it doesn't know and can't care about the actual use type. Universal types hide information from the implementors.

I stand in awe of people who actually understand what "universal" and "existential" actually mean... To me, these are just very big words that sound impressive.

So, are you saying that if x is existential, it must work for any possible x, but if x is universal, I can choose what x is?

As the consumer of the value. For the producer, it works the other way around.
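Jonathan's producer/consumer flip is exactly where rank-N types come in. A minimal sketch, assuming GHC's RankNTypes extension (the names `onOne` and `onBoth` are hypothetical): when the `forall` sits to the left of an arrow, whoever supplies the argument must supply something that works at every type, and the function receiving it gets to choose the types.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Rank 1: the caller chooses a, then passes a function at that one type.
onOne :: ([a] -> [a]) -> [a] -> [a]
onOne f xs = f xs

-- Rank 2: the forall is inside the argument type, so the caller must pass a
-- function that works for every a, and onBoth uses it at two different types.
onBoth :: (forall a. [a] -> [a]) -> ([Int], String) -> ([Int], String)
onBoth f (ns, cs) = (f ns, f cs)
```

In GHCi, `onBoth reverse ([1,2,3], "abc")` gives `([3,2,1],"cba")`. Passing `map toUpper` to `onBoth` is rejected by the type checker, because that function only works on `[Char]`, not on every list type.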

In Haskell 98, existential quantification is not supported at all, and universal quantification is not first class - values can have universal types if and only if they are bound by let. You cannot pass universally typed values to functions.

Erm...

Jonathan Cast http://sourceforge.net/projects/fid-core http://sourceforge.net/projects/fid-emacs

Martin Percossi

11 Jul

5:17 p.m.

Jonathan Cast wrote:

toUpper :: exists x. x -> x works for only one choice of x.

Are you sure that's not: "toUpper :: exists x. x -> x works for *at least one* choice of x"? I'm not sure about the Haskell meaning, but the logic meaning is definitely this. For example:

forall x:Integer. 4*x is even <=> all multiples of four are even -- duh!
exists x:Integer. 4*x is even <=> it's possible to find a multiple of four that is even -- MEGA DUH!: we know that ALL multiples of four are even, so obviously it's possible to find AT LEAST ONE that's even: *any* one, in fact!

It obviously doesn't change any consequences that Stefan draws: namely that the user of a value of that type is not entitled to assume anything (i.e. any interface) about the value -- he only knows that such a type exists.

Regards,
Martin

Please check out my music: http://www.youtube.com/user/thetonegrove

Jonathan Cast

8:13 p.m.

On Wednesday 11 July 2007, Martin Percossi wrote:

Jonathan Cast wrote:

toUpper :: exists x. x -> x works for only one choice of x.

Are you sure that's not:

"toUpper :: exists x. x -> x works for *at least one* choice of x"

Not quite. When you give a constructive proof of exists x. x -> x, you only prove it at one value of x, and a value of type exists x. x -> x is just such a proof. When you go and use that proof, you can thus only use it at one type. Thus, properly speaking, a value of type exists x. x -> x should be thought of as a pair of a type x and a (monomorphic) function of type x -> x. So when you eliminate the existential quantifier, you get a function of type x -> x for precisely one (unknown) type. That type is the same every time, in fact, although the compiler won't let you use this fact (doing so would turn the existential quantifier into a dependent sum).

<snip>
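Jonathan's "pair of a hidden type and a function" reading also shows how existentials relate to rank-N types, which was Andrew's original question. The sketch below uses the standard encoding of ∃x. F x as ∀r. (∀x. F x → r) → r, assuming GHC's RankNTypes extension. `Packed`, `pack`, `unpack` and `stepTwice` are illustrative names, and the extra starting value and `x -> String` observer are added only so the hidden function has something to act on.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Char (toUpper)

-- "exists x. (x, x -> x, x -> String)" encoded without an existential:
-- a Packed value accepts any consumer that works for every x.
newtype Packed = Packed (forall r. (forall x. x -> (x -> x) -> (x -> String) -> r) -> r)

-- Introduction: the producer picks one concrete x and hides it.
pack :: x -> (x -> x) -> (x -> String) -> Packed
pack v f obs = Packed (\k -> k v f obs)

-- Elimination: the consumer receives the pieces at one unknown type x.
unpack :: Packed -> (forall x. x -> (x -> x) -> (x -> String) -> r) -> r
unpack (Packed p) k = p k

-- Applying the hidden function twice is fine: it is the same x both times.
stepTwice :: Packed -> String
stepTwice p = unpack p (\v f obs -> obs (f (f v)))
```

For example, `stepTwice (pack 'a' toUpper (: []))` gives `"A"` and `stepTwice (pack (40 :: Int) (+ 1) show)` gives `"42"`. The consumer passed to `unpack` cannot return a value of type `x`, since `r` is chosen outside the inner `forall`, which is how the hidden type is kept from escaping.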

Stefan O'Rear

9 Jul

9:25 p.m.

On Mon, Jul 09, 2007 at 09:57:14PM +0100, Andrew Coppin wrote:

Stefan O'Rear wrote:

id has type :: ∀α. α → α

toUpper (can) have type :: ∃α. α → α

So... you're saying that id:: x -> x works for *every* possible choice of x, but toUpper :: x -> x works for *one* possible choice of x?

[JonCast answered this one]

(BTW... How in the hell do you get symbols like that in plain ASCII??)

You can't, but the most commonly used replacement for ASCII (Unicode-UTF8) supports them just fine. As for actually *entering* the characters, I have a file with the code numbers of the characters I use most often:

039B Λ big lambda
03BB λ little lambda
2203 ∃ existential quant
2200 ∀ universal quant
2192 → right arrow
03B2 β beta
22A5 ⊥ bottom
00F6 ö o-umlaut

(alpha isn't on there, but I guessed (correctly) it would be right before beta)
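Since a Haskell `Char` is a Unicode code point, the same code numbers can be used directly in Haskell source, either as escapes like `'\x2203'` or via `chr`. A small sketch that turns Stefan's list into labelled entries; the names `table` and `entry` are made up for illustration.

```haskell
import Data.Char (chr, toUpper)
import Numeric (showHex)

-- Stefan's code numbers (hex), plus alpha, which sits just before beta.
table :: [(Int, String)]
table =
  [ (0x039B, "big lambda"), (0x03BB, "little lambda")
  , (0x2203, "existential quant"), (0x2200, "universal quant")
  , (0x2192, "right arrow"), (0x03B2, "beta")
  , (0x22A5, "bottom"), (0x00F6, "o-umlaut")
  , (0x03B1, "alpha")
  ]

-- Format one entry as "U+XXXX <char> <description>".
entry :: (Int, String) -> String
entry (n, desc) = "U+" ++ pad4 (map toUpper (showHex n "")) ++ " " ++ [chr n] ++ " " ++ desc
  where
    pad4 s = replicate (4 - length s) '0' ++ s
```

For example, `entry (0x03B1, "alpha")` builds the string `"U+03B1 α alpha"`. Whether the character then displays correctly depends on the terminal or file encoding, which is what the rest of this thread is about.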

If you're at all familiar with mathematical logic, don't hesitate to apply your intuitions about forall and exists - type systems and logics really are the same things.

I have wide interests in diverse areas of science, mathematics and computing, covering everything from cryptology to group theory to data compression - but formal logic is something I've never been able to bend my mind around. :-(

Don't worry - you can understand the material equally well from either direction. Personally I didn't really understand logic until seeing type systems and then the Curry-Howard isomorphism (types are propositions, programs are proofs).

If you have a value of existential type, you can only do things with it that you can do with any type, because you don't know the actual type. Existential types hide information from the users.

If you have a value of universal type, you can do things with it as if it had any matching type of your choice, because it doesn't know and can't care about the actual use type. Universal types hide information from the implementors.

I stand in awe of people who actually understand what "universal" and "existential" actually mean... To me, these are just very big words that sound impressive.

So, are you saying that if x is existential, it must work for any possible x, but if x is universal, I can choose what x is?

[JonCast answered this one]

In Haskell 98, existential quantification is not supported at all, and universal quantification is not first class - values can have universal types if and only if they are bound by let. You cannot pass universally typed values to functions.

Erm...

Consider the ST monad, which lets you use update-in-place, but is escapable (unlike IO). ST actions have the form:

ST s α

Meaning that they return a value of type α, and execute in "thread" s. All reference types are tagged with the thread, so that actions can only affect references in their own "thread".

Now, the type of the function used to escape ST is:

runST :: ∀ α. (∀ s. ST s α) → α

The action you pass must be universal in s, so inside your action you don't know what thread, thus you cannot access any other threads, thus runST is pure. This is very useful, since it allows you to implement externally pure things like in-place quicksort, and present them as pure functions ∀ e. Ord e ⇒ Array e → Array e, without using any unsafe functions.

But that type of runST is illegal in Haskell 98, because it needs a universal quantifier *inside* the function-arrow! In the jargon, that type has rank 2; Haskell 98 types may have rank at most 1.

Stefan
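An in-place quicksort would take a few dozen lines, so here is a smaller sketch of the same idea: imperative update inside ST, presented to the outside as a pure function. It uses `runST`, `newSTRef`, `readSTRef` and `modifySTRef'` from GHC's base library (current base writes runST's type as `(forall s. ST s a) -> a`, which is the same rank-2 type as above); `bump`, `sumInPlace` and `sumPure` are hypothetical names.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')

-- Add x to a mutable cell. The STRef is tagged with the same thread s
-- as the action that uses it.
bump :: STRef s Int -> Int -> ST s ()
bump ref x = modifySTRef' ref (+ x)

-- Imperative-style summation: allocate one cell and update it in place.
-- The type mentions s but fixes nothing about it.
sumInPlace :: [Int] -> ST s Int
sumInPlace xs = do
  acc <- newSTRef 0
  mapM_ (bump acc) xs
  readSTRef acc

-- Because sumInPlace works for every s, runST accepts it, and the
-- result is an ordinary pure function.
sumPure :: [Int] -> Int
sumPure xs = runST (sumInPlace xs)

-- By contrast, this is rejected: the result type would have to mention s,
-- and the rank-2 type of runST does not let s escape.
-- leak :: STRef s Int
-- leak = runST (newSTRef 0)
```

`sumPure [1 .. 10]` gives `55`; callers of `sumPure` cannot tell that mutation happened inside.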

Andrew Coppin

10 Jul

7:02 p.m.

Stefan O'Rear wrote:

On Mon, Jul 09, 2007 at 09:57:14PM +0100, Andrew Coppin wrote:

(BTW... How in the hell do you get symbols like that in plain ASCII??)

You can't, but the most commonly used replacement for ASCII (Unicode-UTF8) supports them just fine.

Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

Consider the ST monad, which lets you use update-in-place, but is escapable (unlike IO). ST actions have the form:

ST s α

Meaning that they return a value of type α, and execute in "thread" s. All reference types are tagged with the thread, so that actions can only affect references in their own "thread".

...so *that* is what that thing does...! (I thought it did something quite different.)

Now, the type of the function used to escape ST is:

runST :: ∀ α. (∀ s. ST s α) → α

The action you pass must be universal in s, so inside your action you don't know what thread, thus you cannot access any other threads, thus runST is pure. This is very useful, since it allows you to implement externally pure things like in-place quicksort, and present them as pure functions ∀ e. Ord e ⇒ Array e → Array e; without using any unsafe functions.

...so the 's' doesn't really "exist", it's just random hackery of the type system to implement uniqueness?

But that type of runST is illegal in Haskell 98, because it needs a universal quantifier *inside* the function-arrow! In the jargon, that type has rank 2; Haskell 98 types may have rank at most 1.

...kinda wishing I hadn't asked... o_O

Jonathan Cast

7:18 p.m.

On Tuesday 10 July 2007, Andrew Coppin wrote:

Stefan O'Rear wrote:

On Mon, Jul 09, 2007 at 09:57:14PM +0100, Andrew Coppin wrote:

(BTW... How in the hell do you get symbols like that in plain ASCII??)

You can't, but the most commonly used replacement for ASCII (Unicode-UTF8) supports them just fine.

Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

Are you serious? Unicode has been a (more-or-less) working reality on Linux for several years now. . .

Consider the ST monad, which lets you use update-in-place, but is escapable (unlike IO). ST actions have the form:

ST s α

Meaning that they return a value of type α, and execute in "thread" s. All reference types are tagged with the thread, so that actions can only affect references in their own "thread".

...so *that* is what that thing does...! (I thought it did something quite different.)

Now, the type of the function used to escape ST is:

runST :: ∀ α. (∀ s. ST s α) → α

The action you pass must be universal in s, so inside your action you don't know what thread, thus you cannot access any other threads, thus runST is pure. This is very useful, since it allows you to implement externally pure things like in-place quicksort, and present them as pure functions ∀ e. Ord e ⇒ Array e → Array e; without using any unsafe functions.

...so the 's' doesn't really "exist", it's just random hackery of the type system to implement uniqueness?

Exactly.

But that type of runST is illegal in Haskell 98, because it needs a universal quantifier *inside* the function-arrow! In the jargon, that type has rank 2; Haskell 98 types may have rank at most 1.

...kinda wishing I hadn't asked... o_O

Andrew Coppin

7:49 p.m.

Jonathan Cast wrote:

On Tuesday 10 July 2007, Andrew Coppin wrote:

Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

Are you serious? Unicode has been a (more-or-less) working reality on Linux for several years now. . .

Last time I looked, everything treats "text" as being 8 bits per character. (Or, more commonly, 7, and if the MSB isn't 0, weird things happen...) That's why (for example) HTML has lots of weird constructs, such as character entities for "…", in it, instead of just typing in the actual character you want. (And let's be clear here: SGML and all its descendants are all using "<" and ">" - the mathematical less-than and greater-than operators - when what they *really* mean are angle brackets, a quite distinct glyph.) Last time I checked, nobody was keen on using 64 bits per character...

...so the 's' doesn't really "exist", it's just random hackery of the type system to implement uniqueness?

Exactly.

Hmm. Like the IO monad's RealWorld object, which isn't really there? Say, maybe what this means is that in fact there IS no spoon, and it is really YOU that bends? (Or at least, your mind...)

Alex Queiroz

7:53 p.m.

Hallo, On 7/10/07, Andrew Coppin wrote:

Last time I looked, everything treats "text" as being 8 bits per character. (Or, more commonly, 7, and if the MSB isn't 0, weird things happen...) That's why (for example) HTML has lots of weird constructs, such as character entities for "…", in it, instead of just typing in the actual character you want. (And let's be clear here: SGML and all its descendants are all using "<" and ">" - the mathematical less-than and greater-than operators - when what they *really* mean are angle brackets, a quite distinct glyph.) Last time I checked, nobody was keen on using 64 bits per character...

You must look out more. I use áéíóúç in web pages all the time. -- -alex http://www.ventonegro.org/

Alexis Hazell

11 Jul

4:02 a.m.

On Wednesday 11 July 2007 05:49, Andrew Coppin wrote:

Last time I checked, nobody was keen on using 64 bits per character...

Hence the UTF-8 encoding: http://en.wikipedia.org/wiki/Utf-8 Alexis.

Bulat Ziganshin

10:44 a.m.

New subject: Re[2]: Type system madness

Hello Andrew, Tuesday, July 10, 2007, 11:49:37 PM, you wrote:

...so the 's' doesn't really "exist", it's just random hackery of the type system to implement uniqueness?

Exactly.

Hmm. Like the IO monad's RealWorld object, which isn't really there?

ST and IO monads are the same beast. In ST, s is left free, which allows you to create an endless number of independent threads, while in IO it is fixed to one type and describes the evolution of one thread, synchronized with the real world. Look at http://haskell.org/haskellwiki/IO_inside for info about IO monad trickery.

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Andrew Coppin

7:07 p.m.

Bulat Ziganshin wrote:

Hello Andrew,

Hmm. Like the IO monad's RealWorld object, which isn't really there?

ST and IO monads are the same beast. In ST, s is left free, which allows you to create an endless number of independent threads, while in IO it is fixed to one type and describes the evolution of one thread, synchronized with the real world. Look at http://haskell.org/haskellwiki/IO_inside for info about IO monad trickery.

OMG! stToIO exists...!
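Bulat's "s left free in ST, fixed to one type in IO" can be seen directly with `stToIO :: ST RealWorld a -> IO a`, which GHC's base library exports from `Control.Monad.ST`. As Lennart points out, the IO/RealWorld connection is an implementation detail of GHC, so treat this as a sketch of how GHC does it. `incr`, `twoIncrements`, `pureCount`, `newShared` and `ioCount` are hypothetical names.

```haskell
import Control.Monad.ST (ST, RealWorld, runST, stToIO)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')

-- One action, polymorphic in the state thread s.
incr :: STRef s Int -> ST s ()
incr ref = modifySTRef' ref (+ 1)

twoIncrements :: ST s Int
twoIncrements = do
  r <- newSTRef 0
  incr r
  incr r
  readSTRef r

-- Pure use: runST supplies a fresh, private s each time.
pureCount :: Int
pureCount = runST twoIncrements

-- IO use: stToIO only accepts actions at s = RealWorld, the single "thread"
-- that IO runs in, so a reference created by one stToIO call can be used by
-- later ones, much like an IORef.
newShared :: IO (STRef RealWorld Int)
newShared = stToIO (newSTRef 0)

ioCount :: IO Int
ioCount = do
  r <- newShared
  mapM_ (\_ -> stToIO (incr r)) [1 .. 3 :: Int]
  stToIO (readSTRef r)
```

The same `twoIncrements` can be run either way: `runST twoIncrements` is a pure `2`, and `stToIO twoIncrements` is an IO action returning `2`.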

Lennart Augustsson

8:13 p.m.

New subject: Re[2]: Type system madness

Well, Haskell defines the IO type to be abstract, so if IO and ST happen to be the same it's implementation dependent.

-- Lennart

On 7/11/07, Bulat Ziganshin wrote:

Hello Andrew,

Tuesday, July 10, 2007, 11:49:37 PM, you wrote:

...so the 's' doesn't really "exist", it's just random hackery of the type system to implement uniqueness?

Exactly.

Hmm. Like the IO monad's RealWorld object, which isn't really there?

ST and IO monads are the same beast. In ST, s is left free, which allows you to create an endless number of independent threads, while in IO it is fixed to one type and describes the evolution of one thread, synchronized with the real world. Look at http://haskell.org/haskellwiki/IO_inside for info about IO monad trickery.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jonathan Cast

8:37 p.m.

On Wednesday 11 July 2007, Lennart Augustsson wrote:

Well, Haskell defines the IO type to be abstract, so if IO and ST happen to be the same it's implementation dependent.

And if IO uses a RealWorld type, that's implementation dependent too. But it's still useful to understand both RealWorld as used by IO and the same mechanism as used by ST.

Lennart Augustsson

9:21 p.m.

Yes, that's one way to define IO. But it's not the only way.

On 7/11/07, Jonathan Cast wrote:

On Wednesday 11 July 2007, Lennart Augustsson wrote:

Well, Haskell defines the IO type to be abstract, so if IO and ST happen to be the same it's implementation dependent.

And if IO uses a RealWorld type, that's implementation dependent too. But it's still useful to understand both RealWorld as used by IO and the same mechanism as used by ST.

Jonathan Cast

9:30 p.m.

On Wednesday 11 July 2007, you wrote:

Yes, that's one way to define IO. But it's not the only way.

Right. Aren't we saying the same thing? I mean, sure, the *one true way* to define IO is

    data IO alpha
      = ReturnIO alpha
      | JoinAtomically (STM (IO alpha))
      | HOpenBind String (Handle -> IO alpha)
      | HCloseThen Handle (IO alpha)
      | HPutThen Handle Char (IO alpha)
      | HGetBind Handle (Char -> IO alpha)
      | ForkIOBind (IO ()) (ThreadId -> IO alpha)
      | UnsafeInterleaveIO (IO (IO alpha))
      | ...

but it's still reasonable to explain that GHC doesn't do it that way and that *in GHC*

    newtype IO alpha = IO (State# RealWorld -> (# State# RealWorld, alpha #))
    newtype ST s alpha = ST (State# s -> (# State# s, alpha #))

no? I don't see your objection to it. Especially if it causes light bulbs to go off over people's heads.
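The "IO as a data structure" view in the first half of that message can be made concrete with a toy. This is a hypothetical, cut-down sketch (only two primitives; the names `Prog`, `getC`, `putC`, `runPure`, `runIO` and `shout` are invented), and, as the message says, it is not how GHC implements IO. The point is that one program value can be interpreted purely, against a fixed input string, or against the real world.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (toUpper)

-- Each constructor is one primitive request plus what to do next.
data Prog a
  = Done a
  | PutThen Char (Prog a)
  | GetBind (Char -> Prog a)

instance Functor Prog where
  fmap = liftM

instance Applicative Prog where
  pure = Done
  (<*>) = ap

instance Monad Prog where
  Done a >>= k = k a
  PutThen c p >>= k = PutThen c (p >>= k)
  GetBind f >>= k = GetBind (\c -> f c >>= k)

getC :: Prog Char
getC = GetBind Done

putC :: Char -> Prog ()
putC c = PutThen c (Done ())

-- Interpreter 1: pure, feeding a fixed input string and collecting output.
runPure :: Prog a -> String -> (a, String)
runPure (Done a) _ = (a, "")
runPure (PutThen c k) inp = let (a, out) = runPure k inp in (a, c : out)
runPure (GetBind f) (c : cs) = runPure (f c) cs
runPure (GetBind _) [] = error "runPure: input exhausted"

-- Interpreter 2: the real world.
runIO :: Prog a -> IO a
runIO (Done a) = pure a
runIO (PutThen c k) = putChar c >> runIO k
runIO (GetBind f) = getChar >>= runIO . f

-- Example program: read two characters and echo them upper-cased.
shout :: Prog Int
shout = do
  a <- getC
  b <- getC
  putC (toUpper a)
  putC (toUpper b)
  pure 2
```

`runPure shout "hi"` evaluates to `(2,"HI")`, while `runIO shout` performs the same steps on the terminal.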

Henning Thielemann

12 Jul

7:12 a.m.

On Tue, 10 Jul 2007, Jonathan Cast wrote:

On Tuesday 10 July 2007, Andrew Coppin wrote:

Stefan O'Rear wrote:

Consider the ST monad, which lets you use update-in-place, but is escapable (unlike IO). ST actions have the form:

ST s α

Meaning that they return a value of type α, and execute in "thread" s. All reference types are tagged with the thread, so that actions can only affect references in their own "thread".

What about putting the runST monad explanation on the Wiki? It seems to be an FGA (frequently given answer). :-)

Stefan O'Rear

7:18 a.m.

On Thu, Jul 12, 2007 at 09:12:14AM +0200, Henning Thielemann wrote:

On Tue, 10 Jul 2007, Jonathan Cast wrote:

On Tuesday 10 July 2007, Andrew Coppin wrote:

Stefan O'Rear wrote:

Consider the ST monad, which lets you use update-in-place, but is escapable (unlike IO). ST actions have the form:

ST s α

Meaning that they return a value of type α, and execute in "thread" s. All reference types are tagged with the thread, so that actions can only affect references in their own "thread".

What about putting the runST monad explanation on the Wiki? It seems to be an FGA (frequently given answer). :-)

I think it already is, in the Research Papers section. :-) Stefan

Henning Thielemann

1:50 p.m.

On Thu, 12 Jul 2007, Stefan O'Rear wrote:

On Thu, Jul 12, 2007 at 09:12:14AM +0200, Henning Thielemann wrote:

On Tue, 10 Jul 2007, Jonathan Cast wrote:

On Tuesday 10 July 2007, Andrew Coppin wrote:

Stefan O'Rear wrote:

Consider the ST monad, which lets you use update-in-place, but is escapable (unlike IO). ST actions have the form:

ST s α

Meaning that they return a value of type α, and execute in "thread" s. All reference types are tagged with the thread, so that actions can only affect references in their own "thread".

What about putting the runST monad explanation on the Wiki? It seems to be an FGA (frequently given answer). :-)

I think it already is, in the Research Papers section. :-)

I put your paragraph there: http://www.haskell.org/haskellwiki/Monad/ST

Albert Y. C. Lai

10 Jul

8:05 p.m.

Andrew Coppin wrote:

Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

That myth is as old as "Haskell is an experimental prototype". "Old" as in "that's an old one".

Windows has supported Unicode well since 2000. That is pretty much the real world.

The only reason you see α as the Greek letter alpha and not scrambled code is that I send it as Unicode and your Windows and Thunderbird also support Unicode and therefore they display it to you properly.

The whole scheme works so well and so transparently that you didn't even notice it.

"No one notices when things are right."

Alex Queiroz wrote:

You must look out more. I use áéíóúç in web pages all the time.

I even use Chinese. (And no, not those big5 or gb2312 funny business.)

Andrew Coppin

8:13 p.m.

Albert Y. C. Lai wrote:

Andrew Coppin wrote:

Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

That myth is as old as "Haskell is an experimental prototype". "Old" as in "that's an old one".

Windows has supported Unicode well since 2000. That is pretty much the real world.

The only reason you see α as the Greek letter alpha and not scrambled code is that I send it as Unicode and your Windows and Thunderbird also support Unicode and therefore they display it to you properly.

The whole scheme works so well and so transparently that you didn't even notice it.

"No one notices when things are right."

That is, indeed, impressive.

Alex Queiroz wrote:

You must look out more. I use áéíóúç in web pages all the time.

I even use Chinese. (And no, not those big5 or gb2312 funny business.)

Interesting... I tried to put a pound sign on my web page, and it came out garbled, so I had to replace it with an HTML entity for "£"... (BTW, I always wondered how the Asian and Chinese people do any work with computers, given that the ASCII character set doesn't even include any characters in their alphabet...)

Hugh Perkins

8:24 p.m.

We can consider three "families" of character sets:

- ASCII: 128 characters (codes 0-127), some of which are control codes like "bell" etc.
- regional encodings: China uses GB2312, Europe uses ISO-8859-1, America uses ... something
- Unicode: UTF-8, UTF-16

The regional encodings are optimized for their region, and they only support characters from their own region, so the Chinese character set (GB2312) contains the commonly used Chinese characters and the English letters, but it doesn't contain, for example, French characters like ç. Similarly ISO-8859-1 contains the characters for most Western European languages, but it doesn't contain the Chinese characters.

Unicode contains the characters from *all* the world's languages combined. UTF-16 encodes each character using 2 or 4 bytes. UTF-8 encodes each character using 1 to 4 bytes. Basically the characters 0-127 are identical between ASCII and UTF-8; byte values from 128 onwards signal that the character is spread over several bytes, which have to be read together to know the character.

UTF-16 kind of sucks because it's not compatible with ASCII, and it uses twice as many bytes for English characters. On the other hand it's what Windows NT uses. UTF-8 is compatible with ASCII, but it can use more bytes than UTF-16 to encode certain non-English characters.

On 7/10/07, Andrew Coppin wrote:

(BTW, I always wondered how the Asian and Chinese people do any work with computers, given that the ASCII character set doesn't even include any characters in their alphabet...)
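The UTF-8 scheme Hugh describes can be spelled out in a few lines. This is a hand-rolled sketch for illustration only (the names `encodeUtf8Char`, `encodeUtf8` and `utf16Bytes` are made up, and it does not reject surrogate code points); in practice you would let the I/O layer or a library do the encoding.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Unicode code point as UTF-8 bytes.
-- Sketch only: surrogate code points (U+D800..U+DFFF) are not rejected.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char ch
  | n < 0x80    = [fromIntegral n]                             -- 0xxxxxxx (plain ASCII)
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]                    -- 110xxxxx 10xxxxxx
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]           -- 1110xxxx + 2 continuation bytes
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]  -- 11110xxx + 3 continuation bytes
  where
    n = ord ch
    lead k = fromIntegral (n `shiftR` k)                        -- leading bits of the code point
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)    -- 10xxxxxx continuation byte

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeUtf8Char

-- Bytes per character in UTF-16: 2, or 4 for a surrogate pair.
utf16Bytes :: Char -> Int
utf16Bytes ch = if ord ch < 0x10000 then 2 else 4
```

For instance, 'A' stays the single byte 0x41, '£' (U+00A3) becomes 0xC2 0xA3, and the CJK character U+4E2D becomes three bytes in UTF-8 but two in UTF-16. If those two bytes for '£' are read back as ISO-8859-1, they show up as the two characters "Â£", which is a common cause of garbled pound signs on web pages.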

Paul Moore

11 Jul

8:53 a.m.

On 10/07/07, Andrew Coppin wrote:

Interesting... I tried to put a pound sign on my web page, and it came out garbled, so I had to replace it with an HTML entity for "£"...

You may need to specify a "content encoding" in the HTML header. For that, you need to know the encoding your HTML file is saved in. Unicode works fine, but encodings can be a bit of a minefield... Paul.

Albert Y. C. Lai

4:40 p.m.

Paul Moore wrote:

On 10/07/07, Andrew Coppin wrote:

Interesting... I tried to put a pound sign on my web page, and it came out garbled, so I had to replace it with an HTML entity for "£"...

You may need to specify a "content encoding" in the HTML header. For that, you need to know the encoding your HTML file is saved in. Unicode works fine, but encodings can be a bit of a minefield...

Lest I be painted as unhelpful(*), http://www.vex.net/~trebla/u.html exemplifies what can be done and how to do it. In particular, you must always specify a content encoding in the HTML header, and you must always order your editor to write out UTF-8.

(*) Whatever happened to the good old spirit of just saying "RTFM"?
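The same two rules apply when the HTML is produced by a program rather than an editor: declare the encoding in the header, and make sure the bytes really are UTF-8. A minimal sketch using `withFile`, `hSetEncoding` and `utf8` from `System.IO` (the names `writeUtf8Page` and `readRawBytes` are hypothetical); setting the handle's encoding explicitly avoids depending on the machine's locale default.

```haskell
import System.IO

-- Write an HTML page whose bytes are UTF-8 and whose header says so.
writeUtf8Page :: FilePath -> String -> IO ()
writeUtf8Page path body =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8          -- encode every Char as UTF-8, whatever the locale
    hPutStr h page
  where
    page = concat
      [ "<html><head>"
      , "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
      , "</head><body>", body, "</body></html>\n"
      ]

-- Read a file back as raw bytes (one Char per byte, values 0..255).
readRawBytes :: FilePath -> IO String
readRawBytes path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` pure s          -- force the whole read before the handle closes
```

Writing a body containing '£' this way produces the two bytes 0xC2 0xA3 in the file, and the meta tag tells the browser to decode them as UTF-8 rather than guess.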

Andrew Coppin

7:10 p.m.

Albert Y. C. Lai wrote:

Lest I be painted as unhelpful(*), http://www.vex.net/~trebla/u.html exemplifies what can be done and how to do it. In particular, you must always specify a content encoding in the HTML header, and you must always order your editor to write out UTF-8.

When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

Alex Queiroz

7:23 p.m.

Hallo, On 7/11/07, Andrew Coppin wrote:

When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

Are you sure it's not UTF-16? Cheers,

Brandon S. Allbery KF8NH

10:33 p.m.

On Jul 11, 2007, at 15:23 , Alex Queiroz wrote:

On 7/11/07, Andrew Coppin wrote:

When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

Are you sure it's not UTF-16?

GNOME's gedit, for one, has a tendency to put byte order marks at the beginning of every line in UTF8 mode. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Albert Y. C. Lai

10:52 p.m.

Brandon S. Allbery KF8NH wrote:

GNOME's gedit, for one, has a tendency to put byte order marks at the beginning of every line in UTF8 mode.

Somehow I have never got a single BOM. My http://www.vex.net/~trebla/u.html was written out by GNOME gedit. Version 2.14.4.

Brandon S. Allbery KF8NH

10:59 p.m.

On Jul 11, 2007, at 18:52 , Albert Y. C. Lai wrote:

Brandon S. Allbery KF8NH wrote:

GNOME's gedit, for one, has a tendency to put byte order marks at the beginning of every line in UTF8 mode.

Somehow I have never got a single BOM. My http://www.vex.net/~trebla/u.html was written out by GNOME gedit. Version 2.14.4.

Hm. Might be the version (it's been a year or so since I used it) or the fact that I was in fact using mixed direction text at the time.

Steve Schafer

7:32 p.m.

On Wed, 11 Jul 2007 20:10:00 +0100, you wrote:

When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

Which means that your processor doesn't properly understand UTF-8. A BOM character isn't required for UTF-8 (it really only makes sense with UTF-16), but a UTF-8-aware processor should skip right over it if it's there. Steve Schafer Fenestra Technologies Corp. http://www.fenestra.com/
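What "skip right over it" looks like in a Haskell processor: a minimal sketch, assuming the input is meant to be UTF-8. `stripBom` and `readUtf8Text` are hypothetical names; `utf8_bom` is the UTF-8 codec in `System.IO` that is documented to consume a leading BOM on input, and `stripBom` covers text that was decoded some other way (a UTF-8 BOM is the bytes EF BB BF, which decode to U+FEFF).

```haskell
import System.IO

-- Drop a leading byte order mark (U+FEFF) from already-decoded text.
stripBom :: String -> String
stripBom ('\xFEFF' : rest) = rest
stripBom s = s

-- Read a UTF-8 text file, tolerating an optional BOM at the start.
readUtf8Text :: FilePath -> IO String
readUtf8Text path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8_bom        -- UTF-8 decoding that discards a leading BOM
  s <- hGetContents h
  length s `seq` pure (stripBom s)  -- force the read; strip any BOM that remains
```

With this, a file saved by an editor that prepends a BOM and one saved without it read back as the same string.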

Albert Y. C. Lai

9:15 p.m.

Andrew Coppin wrote:

When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

I know Windows Notepad puts a BOM at the beginning of UTF-8 files. http://www.vex.net/~trebla/w.htm is written out by Notepad and has the beginning BOM. Firefox and IE display it just fine. Windows Notepad, GNOME gedit, Emacs, Vim, and Eclipse are also very graceful about it. If you rename it to w.lhs, GHC reads it as a fine Haskell source file, as I sneaked in a little Haskell hello-world as an HTML comment: "runghc w.lhs" works wonders. So much for BOM foiling any processing. Any more FUD to debunk? Wanna hear something about purely functional languages incapacitated for I/O? Static typing leading to excessive type declarations? Automatic garbage collection irrelevant to the real world?

Andrew Coppin

12 Jul

6:01 p.m.

Albert Y. C. Lai wrote:

...

Andrew Coppin wrote:

...
When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

I know Windows Notepad puts a BOM at the beginning of UTF-8 files. http://www.vex.net/~trebla/w.htm is written out by Notepad and has the beginning BOM. Firefox and IE display it just fine. Windows Notepad, GNOME gedit, Emacs, Vim, and Eclipse are also very graceful about it. If you rename it to w.lhs, GHC reads it as a fine Haskell source file, as I sneaked in a little Haskell hello-world as an HTML comment: "runghc w.lhs" works wonders. So much for BOM foiling any processing.

Any more FUD to debunk? Wanna hear something about purely functional languages incapacitated for I/O? Static typing leading to excessive type declarations? Automatic garbage collection irrelevant to the real world?

Let me put it this way: It makes all my Tcl scripts stop working, and it makes my Haskell-based processor go nuts too...

Philip Armstrong

8:24 p.m.

On Thu, Jul 12, 2007 at 07:01:31PM +0100, Andrew Coppin wrote:

...

Let me put it this way: It makes all my Tcl scripts stop working, and it makes my Haskell-based processor go nuts too...

Given that (IIRC) the BOM is just a valid Unicode zero-width non-breaking space, your scripts really ought to cope... Phil -- http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

Philip Armstrong

8:51 p.m.

On Thu, Jul 12, 2007 at 09:24:24PM +0100, Philip Armstrong wrote:

...

On Thu, Jul 12, 2007 at 07:01:31PM +0100, Andrew Coppin wrote:

...
Let me put it this way: It makes all my Tcl scripts stop working, and it makes my Haskell-based processor go nuts too...

Given that (IIRC) the BOM is just a valid Unicode zero-width non-breaking space, your scripts really ought to cope...

Oh wait, is the problem that the scripts are expecting ASCII, and are breaking on the zero-width non-breaking space? That makes a certain amount of (annoying) sense. Phil -- http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

Steve Schafer

8:58 p.m.

On Thu, 12 Jul 2007 21:24:24 +0100, you wrote:

...

Given that (IIRC) the BOM is just a valid Unicode zero-width non-breaking space, your scripts really ought to cope...

Choking on the BOM is probably just a symptom of a deeper problem. My bet is that removing the BOM would simply delay the failure until the first non-ASCII character was encountered. Steve Schafer Fenestra Technologies Corp. http://www.fenestra.com/

Philip Armstrong

9:20 p.m.

On Thu, Jul 12, 2007 at 04:58:43PM -0400, Steve Schafer wrote:

...

On Thu, 12 Jul 2007 21:24:24 +0100, you wrote:

...
Given that (IIRC) the BOM is just a valid Unicode zero-width non-breaking space, your scripts really ought to cope...

Choking on the BOM is probably just a symptom of a deeper problem. My bet is that removing the BOM would simply delay the failure until the first non-ASCII character was encountered.

Indeed. However, I can imagine that the author might well want to use Unicode characters in string literals and comments, where they would be entirely innocuous (since UTF-8 encodes every non-ASCII character using only bytes outside the ASCII range, so ASCII-oriented code passes them through untouched), but the BOM at the beginning of the file breaks things. Phil -- http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
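Phil's point about string literals and comments follows from how UTF-8 lays out bytes. The sketch below (hypothetical helper names, written from the UTF-8 encoding rules rather than taken from any library) encodes Chars to UTF-8 and is mainly there to show that every byte of a non-ASCII character is 0x80 or above:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Char as UTF-8. ASCII (U+0000..U+007F) becomes a single byte
-- equal to its code point; everything else becomes 2-4 bytes, each with the
-- high bit set (lead bytes 0xC2..0xF4, continuation bytes 0x80..0xBF).
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Lead byte: a length marker plus the top bits of the code point.
    lead marker k = fromIntegral (marker .|. (n `shiftR` k))
    -- Continuation byte: 10xxxxxx carrying the next six bits.
    cont k = fromIntegral (0x80 .|. ((n `shiftR` k) .&. 0x3F))

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

So a byte-oriented tool scanning for ASCII delimiters such as a newline (0x0A) or a double quote (0x22) cannot be fooled by the middle of a multi-byte character. The BOM itself is such a character (EF BB BF); it only causes trouble because of where it sits.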

Ketil Malde

8:04 a.m.

On Wed, 2007-07-11 at 20:10 +0100, Andrew Coppin wrote:

...

When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy. -k

Andrew Coppin

6:15 p.m.

Ketil Malde wrote:

...

On Wed, 2007-07-11 at 20:10 +0100, Andrew Coppin wrote:

...
When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

Yay! Haskell's text I/O system is buggy. :-P

dons＠cse.unsw.edu.au

13 Jul

12:48 a.m.

andrewcoppin:

...

Ketil Malde wrote:

...
On Wed, 2007-07-11 at 20:10 +0100, Andrew Coppin wrote:

...
When I tell the editor to save UTF-8, it inserts some weird "BOM" character at the start of the file - and thus, any attempt at programmatically processing that file instantly fails. :-(

While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

Yay! Haskell's text I/O system is buggy. :-P

By the way Andrew, have you noticed that you're generating 50% of the traffic on this list? Perhaps we can work a bit more on improving the signal/noise ratio. My inbox can only take so much of this... ;) -- Don

Brandon S. Allbery KF8NH

1:10 a.m.

On Jul 12, 2007, at 20:48 , Donald Bruce Stewart wrote:

...

By the way Andrew, have you noticed that you're generating 50% of the traffic on this list? Perhaps we can work a bit more on improving the signal/noise ratio. My inbox can only take so much of this... ;)

I can blather more, if you'd like.... (Hey, this mailing list is more interesting than 85% of the non-spam in my inbox :) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Andrew Coppin

6:52 p.m.

Donald Bruce Stewart wrote:

...

By the way Andrew, have you noticed that you're generating 50% of the traffic on this list? Perhaps we can work a bit more on improving the signal/noise ratio. My inbox can only take so much of this... ;)

o_O My God... even the Haskell mailing list is complaining I talk about Haskell too much... That's *advanced*! (Everybody in the main forum I inhabit has now taken to auto-deleting any post that mentions the word "Haskell". Except for Mr C++, who seems to seek out such threads so he can tell me how superior C++ is to Haskell...) Oh well, the problem is easily fixed... *sigh*

Stefan O'Rear

7:16 p.m.

On Fri, Jul 13, 2007 at 07:52:01PM +0100, Andrew Coppin wrote:

...

Donald Bruce Stewart wrote:

...
By the way Andrew, have you noticed that you're generating 50% of the traffic on this list? Perhaps we can work a bit more on improving the signal/noise ratio. My inbox can only take so much of this... ;)

o_O

My God... even the Haskell mailing list is complaining I talk about Haskell too much... That's *advanced*!

Don does not speak for the whole community; I for one am fine with answering all these questions :) Specifically, Don really wants you to get off of the mailing list and ask all these questions on IRC; he's been trying to hint this to you for a while. My extreme social cluelessness allows me to explain it to your face. (Not that I agree with him; IRC is not well suited to things requiring long explanations like "what is a quantifier".)

...

(Everybody in the main forum I inhabit has now taken to auto-deleting any post that mentions the word "Haskell". Except for Mr C++, who seems to seek out such threads so he can tell me how superior C++ is to Haskell...)

Try not to care what other people think. Stefan

Andrew Coppin

8:22 p.m.

Stefan O'Rear wrote:

...

Don does not speak for the whole community; I for one am fine with answering all these questions :)

I guess when somebody as important as Don says something, you take notice...

...

Specifically, Don really wants you to get off of the mailing list and ask all these questions on IRC; he's been trying to hint this to you for a while. My extreme social cluelessness allows me to explain it to your face. (not that I agree with him; IRC is not well suited to things requiring long explanations like "what is a quantifier")

True. And IRC also has the limitation that you only access the people who are logged in right *now*... On the other hand, IRC has the advantage that if somebody goes into a long explanation and you didn't really understand step #2 there, you can ask them to stop and go back to that part without having them write a huge explanation that you won't understand anyway... so they both have advantages. (And then there was that time I tried to use a release of Gtk2hs which, unknown to me, was brand new and had a small installer glitch in it... Mr Coutts was most helpful on that one.) And let's not forget Lambdabot... LOL!

...

...
(Everybody in the main forum I inhabit has now taken to auto-deleting any post that mentions the word "Haskell". Except for Mr C++, who seems to seek out such threads so he can tell me how superior C++ is to Haskell...)

Try not to care what other people think.

LOL! If only that were in fact physically possible... Is it OK to quote xkcd in response to this one? http://www.xkcd.com/c154.html ;-)

Jonathan Cast

14 Jul

2:18 a.m.

On Friday 13 July 2007, Andrew Coppin wrote:

...

Stefan O'Rear wrote:

...
Don does not speak for the whole community; I for one am fine with answering all these questions :)

I guess when somebody as important as Don says something, you take notice...

...
Specifically, Don really wants you to get off of the mailing list and ask all these questions on IRC; he's been trying to hint this to you for a while. My extreme social cluelessness allows me to explain it to your face. (not that I agree with him; IRC is not well suited to things requiring long explanations like "what is a quantifier")

True. And IRC also has the limitation that you only access the people who are logged in right *now*...

On the other hand, IRC has the advantage that if somebody goes into a long explanation and you didn't really understand step #2 there, you can ask them to stop and go back to that part without having them write a huge explanation that you won't understand anyway... so they both have advantages. (And then there was that time I tried to use a release of Gtk2hs which, unknown to me, was brand new and had a small installer glitch in it... Mr Coutts was most helpful on that one.) And let's not forget Lambdabot... LOL!

...
...
(Everybody in the main forum I inhabit has now taken to auto-deleting any post that mentions the word "Haskell". Except for Mr C++, who seems to seek out such threads so he can tell me how superior C++ is to Haskell...)

Try not to care what other people think.

LOL! If only that were in fact physically possible...

Why not? I do it all the time...

...

Is it OK to quote xkcd in response to this one?

It's always OK to quote xkcd :)

...

http://www.xkcd.com/c154.html

;-)

Jonathan Cast http://sourceforge.net/projects/fid-core http://sourceforge.net/projects/fid-emacs

Andrew Coppin

6:17 p.m.

Jonathan Cast wrote:

...

On Friday 13 July 2007, Andrew Coppin wrote:

...
Stefan O'Rear wrote:

...
Try not to care what other people think.

LOL! If only that were in fact physically possible...

Why not? I do it all the time...

Clearly you don't know me... I spend 80% of my life worrying about what everybody else thinks. :-S But THAT is a whole OTHER topic - and very off-topic here. ;-)

...

...
Is it OK to quote xkcd in response to this one?

It's always OK to quote xkcd :)

:-D

Bulat Ziganshin

13 Jul

6:49 a.m.

New subject: Re[2]: Type system madness

Hello Andrew, Thursday, July 12, 2007, 10:15:00 PM, you wrote:

...

...
While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

...

Yay! Haskell's text I/O system is buggy. :-P

Definitely. For example, on Windows it supports neither Unicode filenames nor files bigger than 4 GB, so I use my own lib, a thin layer around the Windows API. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Andrew Coppin

7:01 p.m.

Bulat Ziganshin wrote:

...

Hello Andrew,

...
Yay! Haskell's text I/O system is buggy. :-P

Definitely. For example, on Windows it supports neither Unicode filenames nor files bigger than 4 GB

...OK, that's quite worrying...

...

so I use my own lib, a thin layer around the Windows API

Has a bug been reported for this? Have you (or anyone else) thought about offering up code to fix it?

Bulat Ziganshin

14 Jul

7:16 p.m.

New subject: Re[2]: Type system madness

Hello Andrew, Friday, July 13, 2007, 11:01:24 PM, you wrote:

...

...
Definitely. For example, on Windows it supports neither Unicode filenames nor files bigger than 4 GB, so I use my own lib, a thin layer around the Windows API

...

Has a bug been reported for this? Have you (or anyone else) thought about offering up code to fix it?

Yes, I developed an alternative I/O library which solves one of these problems, and made a development plan for a wider I/O library which solves the second one too (not only for I/O but for all filesystem-related calls): http://haskell.org/haskellwiki/Library/Streams http://haskell.org/haskellwiki/Library/IO -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Ketil Malde

13 Jul

8:26 a.m.

On Thu, 2007-07-12 at 19:15 +0100, Andrew Coppin wrote:

...

...
While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

...

Yay! Haskell's text I/O system is buggy. :-P

Works for me, but feel free to file a bug or provide a more specific report. -k

Stefan O'Rear

4:36 p.m.

On Fri, Jul 13, 2007 at 10:26:38AM +0200, Ketil Malde wrote:

...

On Thu, 2007-07-12 at 19:15 +0100, Andrew Coppin wrote:

...
...
While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

...
Yay! Haskell's text I/O system is buggy. :-P

Works for me, but feel free to file a bug or provide a more specific report.

He's not trying to report a bug; he's just complaining about base's long-known lack of support for non-latin1 encodings. (IIUC) Stefan

Aaron Denney

5:36 p.m.

On 2007-07-13, Stefan O'Rear wrote:

...

He's not trying to report a bug; he's just complaining about base's long-known lack of support for non-latin1 encodings. (IIUC)

Which is a bug. Base needs to support (in an /obvious/ way):

(1) direct I/O of octets (bytes), with no character interpretation set
(2) I/O of text in UTF-8.

In addition, it would be nice to support

(3) (on Unix) use of locale to determine text encoding,

but users can work around this themselves, and will often need to, even if (3) is supported. (2) can also be layered atop (1), but something is wrong if you have to write your own layer to do simple text input and output. It's even worse if you can't without going to the FFI. (1) can currently be done, but it's not at all clear how to do so, or, once you have figured out how to do so, why it works. (This may be a bit out of date, but seeing this brought up again, I think not.) -- Aaron Denney -><-
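Aaron's remark that (2) can be layered atop (1) can be illustrated directly: once you have raw octets, UTF-8 decoding is a pure function. A simplified sketch; decodeUtf8 is a hypothetical name (not the API of utf8-string or any other package), and it does not reject overlong encodings or surrogate code points:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode UTF-8 bytes into a String, i.e. layer (2) "text in UTF-8" on top of
-- (1) "raw octets". Malformed or truncated sequences become U+FFFD.
-- Simplification: overlong encodings and surrogates are not rejected.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80           = chr (fromIntegral b) : decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07)
  | otherwise          = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes (each 10xxxxxx) after the lead byte.
    multi :: Int -> Int -> String
    multi n acc =
      let (cs, rest) = splitAt n bs
          cp = foldl step acc cs
      in if length cs == n && all isCont cs && cp <= 0x10FFFF
           then chr cp : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs
    isCont c = c .&. 0xC0 == 0x80
    step a c = (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
```

Paired with any way of reading octets, for example Data.ByteString's readFile followed by unpack, this gives UTF-8 text input without touching the FFI.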

dons＠cse.unsw.edu.au

14 Jul

1:45 a.m.

wnoise:

...

On 2007-07-13, Stefan O'Rear wrote:

...
He's not trying to report a bug; he's just complaining about base's long-known lack of support for non-latin1 encodings. (IIUC)

Which is a bug. Base needs to support (in an /obvious/ way) (1) direct I/O of octets (bytes), with no character interpretation set

Data.ByteString

...

(2) I/O of text in UTF-8.

not in base, but see utf8-string on hackage.haskell.org.

...

In addition, it would be nice to support (3) (On Unix) use of locale to determine text encoding but users can work around this themselves, and will often need to, even

Hmm, there's System.Locale, but I've not used it for anything other than dates.

...

if (3) is supported.

(2) can also be layered atop (1), but something is wrong if you have to write your own layer to do simple text input and output. It's even worse if you can't without going to the FFI.

Yes, there's been a few encoding layers on top of Data.ByteString written for other non-latin1 encodings.

...

(1) can currently be done, but it's not at all clear how to do so, or once you have figured out how to do so, why it works.

(This may be a bit out of date, but seeing this brought up again, I think not.)

I think it's a little out of date, given Data.ByteString and utf8-string? -- Don
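Don's answer to requirement (1) in practice: Data.ByteString reads and writes bytes, never Chars. A minimal sketch, assuming the bytestring package is available (it ships with GHC); writeOctets, readOctets and startsWithUtf8Bom are hypothetical helper names:

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Requirement (1): octets in, octets out, no character interpretation.
-- Data.ByteString's readFile/writeFile work on raw bytes, never on Chars.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path = BS.writeFile path . BS.pack

readOctets :: FilePath -> IO [Word8]
readOctets path = BS.unpack <$> BS.readFile path

-- Check for a leading UTF-8 byte order mark without decoding anything.
startsWithUtf8Bom :: FilePath -> IO Bool
startsWithUtf8Bom path =
  BS.isPrefixOf (BS.pack [0xEF, 0xBB, 0xBF]) <$> BS.readFile path
```

BS.readFile reads the whole file strictly, so the file is closed again before the result is used.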

Aaron Denney

2:10 a.m.

On 2007-07-14, Donald Bruce Stewart wrote:

...

wnoise:

...
On 2007-07-13, Stefan O'Rear wrote:

...
He's not trying to report a bug; he's just complaining about base's long-known lack of support for non-latin1 encodings. (IIUC)

Which is a bug. Base needs to support (in an /obvious/ way) (1) direct I/O of octets (bytes), with no character interpretation set

Data.ByteString

And does this work for non-GHC compilers yet? And when does it get added to Haskell' and guaranteed to work?

...

...
(2) I/O of text in UTF-8.

not in base, but see utf8-string on hackage.haskell.org.

Yes, this is a decent layering of (2) on top of (1), for GHC only, relying on GHC reading the bytes and interpreting them as Latin-1.

...

...
(1) can currently be done, but it's not at all clear how to do so, or once you have figured out how to do so, why it works.

(This may be a bit out of date, but seeing this brought up again, I think not.)

I think it's a little out of date, given Data.ByteString and utf8-string?

It's not obvious that ByteString is the place to look for I/O, so it's not yet good enough. It should be as easy to use as character I/O, and as easy to find. -- Aaron Denney -><-

dons＠cse.unsw.edu.au

2:15 a.m.

wnoise:

...

On 2007-07-14, Donald Bruce Stewart wrote:

...
wnoise:

...
On 2007-07-13, Stefan O'Rear wrote:

...
He's not trying to report a bug; he's just complaining about base's long-known lack of support for non-latin1 encodings. (IIUC)

Which is a bug. Base needs to support (in an /obvious/ way) (1) direct I/O of octets (bytes), with no character interpretation set

Data.ByteString

And does this work for non-GHC compilers yet? And when does it get added to Haskell' and guaranteed to work?

Yes, Data.ByteString is available for GHC, Hugs and nhc98. Unsure about YHC, but it wouldn't be hard presuming the FFI support is up to speed.

...

...
...
(2) I/O of text in UTF-8.

not in base, but see utf8-string on hackage.haskell.org.

Yes, this is a decent layering of (2) on top of (1), for GHC only, relying on GHC reading the bytes and interpreting them as Latin-1.

Yeah, we can also layer it on Data.ByteString, which uses the FFI to avoid relying on any Latin-1 behaviour.

...

...
...
(1) can currently be done, but it's not at all clear how to do so, or once you have figured out how to do so, why it works.

(This may be a bit out of date, but seeing this brought up again, I think not.)

I think it's a little out of date, given Data.ByteString and utf8-string?

It's not obvious that ByteString is the place to look for I/O, so it's not yet good enough. It should be as easy to use as character I/O, and as easy to find.

Agreed. -- Don

Stefan O'Rear

2:20 a.m.

On Sat, Jul 14, 2007 at 12:15:34PM +1000, Donald Bruce Stewart wrote:

...

wnoise:

...
On 2007-07-14, Donald Bruce Stewart wrote:

...
not in base, but see utf8-string on hackage.haskell.org.

Yes, this is a decent layering of (2) on top of (1), for GHC only, relying on GHC reading the bytes and interpreting them as Latin-1.

Yeah, we can also layer it on Data.ByteString, which uses the FFI to avoid relying on any Latin-1 behaviour.

Actually, it uses hGetBuf, which is in base and already specified to return raw bytes.

==================
hGet h i = createAndTrim i $ \p -> hGetBuf h p i
==================

Stefan
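Stefan's point can be seen without ByteString at all, since base's System.IO.hGetBuf copies bytes from a handle straight into a buffer without decoding them. A minimal sketch using only base; hGetOctets and readPrefixOctets are hypothetical names, while hGetBuf, allocaBytes, peekArray and withBinaryFile are the real base functions:

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import System.IO (Handle, IOMode (ReadMode), hGetBuf, withBinaryFile)

-- Read up to n raw bytes from a handle with base's hGetBuf, which copies
-- octets into a temporary buffer without any character decoding.
hGetOctets :: Handle -> Int -> IO [Word8]
hGetOctets h n =
  allocaBytes n $ \buf -> do
    got <- hGetBuf h (buf :: Ptr Word8) n
    -- 'got' may be less than n if the end of the file is reached first.
    peekArray got buf

-- Convenience wrapper: the first n bytes of a file (fewer if it is shorter).
readPrefixOctets :: FilePath -> Int -> IO [Word8]
readPrefixOctets path n = withBinaryFile path ReadMode (\h -> hGetOctets h n)
```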

Andrew Coppin

13 Jul

7:05 p.m.

Ketil Malde wrote:

...

On Thu, 2007-07-12 at 19:15 +0100, Andrew Coppin wrote:

...
...
While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

...
Yay! Haskell's text I/O system is buggy. :-P

Works for me, but feel free to file a bug or provide a more specific report.

I was actually commenting on the other guy's remark that "anything that chokes on a BOM can be considered buggy" - not entirely seriously. ;-) If there is a "bug" to be reported, it is merely that [the GHC implementation of] Haskell appears to interpret files as containing "8-bit ASCII", rather than doing real character encodings. I have no idea whether anybody has filed a bug report / feature request for this. (Come to think of it, I have no idea how to check either...)

Stefan O'Rear

7:11 p.m.

On Fri, Jul 13, 2007 at 08:05:36PM +0100, Andrew Coppin wrote:

...

Ketil Malde wrote:

...
On Thu, 2007-07-12 at 19:15 +0100, Andrew Coppin wrote:

...
...
While BOMs (Byte Order Mark) are pretty irrelevant to byte-oriented encodings like UTF-8, I think programs that fail on their presence can be considered buggy.

...
Yay! Haskell's text I/O system is buggy. :-P

Works for me, but feel free to file a bug or provide a more specific report.

I was actually commenting on the other guy's remark that "anything that chokes on a BOM can be considered buggy" - not entirely seriously. ;-)

If there is a "bug" to be reported, it is merely that [the GHC implementation of] Haskell appears to interpret files as containing "8-bit ASCII", rather than doing real character encodings. I have no idea whether

There is no such thing as 8-bit ASCII - base assumes files contain ISO-8859-1.

...

anybody has filed a bug report / feature request for this. (Come to think of it, I have no idea how to check either...)

http://hackage.haskell.org/trac/ghc/query Stefan

Andrew Coppin

7:44 p.m.

Stefan O'Rear wrote:

...

On Fri, Jul 13, 2007 at 08:05:36PM +0100, Andrew Coppin wrote:

...
I was actually commenting on the other guy's remark that "anything that chokes on a BOM can be considered buggy" - not entirely seriously. ;-)

If there is a "bug" to be reported, it is merely that [the GHC implementation of] Haskell appears to interpret files as containing "8-bit ASCII", rather than doing real character encodings. I have no idea whether

There is no such thing as 8-bit ASCII - base assumes files contain ISO-8859-1.

Indeed - ASCII is actually a 7-bit standard. But all known systems use 8 bits/character, and use the extra bit in various random ways. Everybody seems to *call* this "8-bit ASCII", despite that being a rather silly name. I have no idea what "ISO-8859-1" is. (But let's not start another thread about that...)

...

...
anybody has filed a bug report / feature request for this. (Come to think of it, I have no idea how to check either...)

http://hackage.haskell.org/trac/ghc/query

Ah... OK.

Brandon S. Allbery KF8NH

10:45 p.m.

On Jul 13, 2007, at 15:11 , Stefan O'Rear wrote:

...

There is no such thing as 8-bit ASCII - base assumes files contain ISO-8859-1.

Hm, shouldn't it really be ISO-8859-15? (The difference being that -1 predates the euro symbol.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Stefan O'Rear

10:57 p.m.

On Fri, Jul 13, 2007 at 06:45:03PM -0400, Brandon S. Allbery KF8NH wrote:

...

On Jul 13, 2007, at 15:11 , Stefan O'Rear wrote:

...
There is no such thing as 8-bit ASCII - base assumes files contain ISO-8859-1.

Hm, shouldn't it really be ISO-8859-15? (The difference being that -1 predates the euro symbol.)

Base assumes that bytes 0-255 correspond to Unicode codepoints 0-255; according to gucharmap, the Euro sign is 0x20AC.

<< € >> U+20AC EURO SIGN

Stefan

Andrew Coppin

14 Jul

6:38 p.m.

Stefan O'Rear wrote:

...

On Fri, Jul 13, 2007 at 06:45:03PM -0400, Brandon S. Allbery KF8NH wrote:

...
On Jul 13, 2007, at 15:11 , Stefan O'Rear wrote:

...
There is no such thing as 8-bit ASCII - base assumes files contain ISO-8859-1.

Hm, shouldn't it really be ISO-8859-15? (The difference being that -1 predates the euro symbol.)

Base assumes that bytes 0-255 correspond to Unicode codepoints 0-255;

Does that actually match *any* known encoding? (I'm no Unicode expert, but I thought that 0 - 127 matches ASCII, but the rest is Unicode-specific?)

Aaron Denney

6:46 p.m.

On 2007-07-14, Andrew Coppin wrote:

...

Stefan O'Rear wrote:

...
On Fri, Jul 13, 2007 at 06:45:03PM -0400, Brandon S. Allbery KF8NH wrote:

...
On Jul 13, 2007, at 15:11 , Stefan O'Rear wrote:

...
There is no such thing as 8-bit ASCII - base assumes files contain ISO-8859-1.

Hm, shouldn't it really be ISO-8859-15? (The difference being that -1 predates the euro symbol.)

Base assumes that bytes 0-255 correspond to Unicode codepoints 0-255;

Does that actually match *any* known encoding? (I'm no Unicode expert, but I thought that 0 - 127 matches ASCII, but the rest is Unicode-specific?)

Latin-1, AKA ISO-8859-1. -- Aaron Denney -><-
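The mapping Stefan describes, and Aaron names, is simple enough to write out. A minimal sketch with hypothetical names decodeLatin1 and encodeLatin1 (not a library API):

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1 (Latin-1) decoding: byte b means code point b, for all 256 bytes.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- The inverse is only defined for code points 0..255; anything else
-- (for example U+20AC EURO SIGN) has no Latin-1 byte at all.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM toByte
  where
    toByte c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing
```

Decoding the two UTF-8 bytes of a pound sign, C2 A3, this way yields the two characters "Â£", the familiar garbling seen when UTF-8 text is read as Latin-1. Because every one of the 256 bytes maps to a distinct Char, encodeLatin1 always recovers the original bytes from decodeLatin1's output, which is the property the thread says utf8-string relies on when layered over GHC's text I/O.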

Andrew Coppin

11 Jul

7:04 p.m.

Paul Moore wrote:

...

On 10/07/07, Andrew Coppin wrote:

...
Interesting... I tried to put a pound sign on my web page, and it came out garbled, so I had to replace it with "&pound;"...

You may need to specify a "content encoding" in the HTML header. For that, you need to know the encoding your HTML file is saved in. Unicode works fine, but encodings can be a bit of a minefield...

Indeed. I thought it was just saved as "ASCII"...

Henning Thielemann

12 Jul

7:15 a.m.

On Tue, 10 Jul 2007, Albert Y. C. Lai wrote:

...

Andrew Coppin wrote:

...
Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

That myth is as old as "Haskell is an experimental prototype". "Old" as in "that's an old one".

Windows has supported Unicode well since 2000. That is pretty much the real world.

The only reason you see α as the Greek letter alpha and not scrambled code is that I send it as Unicode and your Windows and Thunderbird also support Unicode and therefore they display it to you properly.

I don't see a Greek letter alpha here; I see scrambled code in 'pine'.

Jonathan Cast

7:33 a.m.

On Thursday 12 July 2007, Henning Thielemann wrote:

...

On Tue, 10 Jul 2007, Albert Y. C. Lai wrote:

...
Andrew Coppin wrote:

...
Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

That myth is as old as "Haskell is an experimental prototype". "Old" as in "that's an old one".

Windows has supported Unicode well since 2000. That is pretty much the real world.

The only reason you see α as the Greek letter alpha and not scrambled code is that I send it as Unicode and your Windows and Thunderbird also support Unicode and therefore they display it to you properly.

I don't see a Greek letter alpha here; I see scrambled code in 'pine'.
^^^^

There's your problem right there. Get either a terminal or a mail program that knows UTF-8. Jonathan Cast http://sourceforge.net/projects/fid-core http://sourceforge.net/projects/fid-emacs

Henning Thielemann

7:41 a.m.

New subject: Unicode support (Was: Type system madness)

On Thu, 12 Jul 2007, Jonathan Cast wrote:

...

On Thursday 12 July 2007, Henning Thielemann wrote:

...
On Tue, 10 Jul 2007, Albert Y. C. Lai wrote:

...
Andrew Coppin wrote:

...
Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

That myth is as old as "Haskell is an experimental prototype". "Old" as in "that's an old one".

Windows has supported Unicode well since 2000. That is pretty much the real world.

The only reason you see α as the Greek letter alpha and not scrambled code is that I send it as Unicode and your Windows and Thunderbird also support Unicode and therefore they display it to you properly.

I don't see a Greek letter alpha here; I see scrambled code in 'pine'.

There's your problem right there. Get either a terminal or a mail program that knows UTF-8.

Now I understand how "well supported" is meant: if a program doesn't support UTF-8/Unicode, that's not a problem with Unicode, but a problem of the program and its users. If we restrict the range of considered applications to those which support UTF-8, then UTF-8 is universally supported. This gives me an idea: if we declare only Haskell programs to be "real programs", then we can safely claim that Haskell is the only language in which real programs can be written. :-]

Jonathan Cast

8:05 a.m.

New subject: Unicode support (Was: Type system madness)

On Thursday 12 July 2007, Henning Thielemann wrote:

...

On Thu, 12 Jul 2007, Jonathan Cast wrote:

...
On Thursday 12 July 2007, Henning Thielemann wrote:

...
On Tue, 10 Jul 2007, Albert Y. C. Lai wrote:

...
Andrew Coppin wrote:

...
Wait... I thought Unicode was still an experimental prototype? Since when does it work in the real world??

That myth is as old as "Haskell is an experimental prototype". "Old" as in "that's an old one".

Windows has supported Unicode well since 2000. That is pretty much the real world.

The only reason you see α as the Greek letter alpha and not scrambled code is that I send it as Unicode and your Windows and Thunderbird also support Unicode and therefore they display it to you properly.

I don't see a Greek letter alpha here; I see scrambled code in 'pine'.

There's your problem right there. Get either a terminal or a mail program that knows UTF-8.

Now I understand how "well supported" is meant: if a program doesn't support UTF-8/Unicode, that's not a problem with Unicode, but a problem of the program and its users. If we restrict the range of considered applications to those which support UTF-8, then UTF-8 is universally supported. This gives me an idea: if we declare only Haskell programs to be "real programs", then we can safely claim that Haskell is the only language in which real programs can be written. :-]

The last release of Pine came out 28 September 2005; the last release to add new features came out 10 May 2004; the last time the major version number was bumped was 8 July 1998. I can appreciate clinging to old, comfortable software; it took quite a bit to get me to abandon nmh. But I did it, because that software simply doesn't work on the modern internet. A certain level of seriousness is required when making software choices, after all. And some software is just too old to be taken seriously. Jonathan Cast http://sourceforge.net/projects/fid-core http://sourceforge.net/projects/fid-emacs

Miguel Mitrofanov

11 Jul

6:22 a.m.

New subject: Re[2]: Type system madness

AC> Wait... I thought Unicode was still an experimental prototype?
AC> Since when does it work in the real world??

What? There was a time when Unicode was not working???? Sorry... couldn't help saying that...

Albert Y. C. Lai

9 Jul

9:28 p.m.

Andrew Coppin wrote:

...

I stand in awe of people who actually understand what "universal" and "existential" actually mean... To me, these are just very big words that sound impressive.

I offer to relieve that with http://www.vex.net/~trebla/allsome.txt

I think of formal logic as clarifying thought and semantics, cleaning up the mess caused by idiosyncrasies in natural languages (both syntax and semantics) such as English. But not many people realize they are in a mess that needs cleaning up.

Daniil Elovkov

10 Jul 10 Jul

7:25 a.m.

2007/7/10, Andrew Coppin :

...

I stand in awe of people who actually understand what "universal" and "existential" actually mean... To me, these are just very big words that sound impressive.

The following is only my own understanding, please correct me if it's totally wrong! (And sorry for the confusion if it is.)

Another thing that might help is looking at non-functional values.

forall a. [a] is the _intersection_ of the types [a], where 'a' runs over all possible types. That is, the only non-bottom value of forall a. [a] is the empty list []. So [4,5] doesn't belong to this type, nor does ['H','e','y'].

exists a. [a] contains [4,5] and "Hey" and []. So it's tempting to say that it is the sum of the types [a], where 'a' runs over all possible types, but I may be lacking the theoretical background here...

However, in Haskell both of these are written with the forall keyword, because as a consumer you treat them in the same way. That is, you can't (safely) make any assumptions about 'a'. In the case of forall it simply doesn't make sense; in the case of exists, Haskell doesn't give you the means to find out which 'a' was actually used when the value was created.

So, if you have the types forall a. Class a => [a] and exists a. Class a => [a], in both cases all you can do with the value is 1) what you can do with lists, and 2) what you can do with instances of Class (that's for the elements of the list).
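A minimal GHC sketch of the two readings above, assuming the ExistentialQuantification extension. The names nothingButEmpty, SomeList, examples and describe are hypothetical, chosen for illustration; since Haskell has no exists keyword, the existential is encoded by putting forall in front of a data constructor, and the Show constraint stands in for 'Class'.

```haskell
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ExistentialQuantification #-}

-- forall a. [a]: the only total (non-bottom) inhabitant is the empty list.
-- Writing nothingButEmpty = [4,5] is rejected, because 4 is not
-- a value of every type 'a'.
nothingButEmpty :: forall a. [a]
nothingButEmpty = []

-- "exists a. Show a => [a]", encoded with forall on the constructor.
-- SomeList is a hypothetical name, not a library type.
data SomeList = forall a. Show a => SomeList [a]

-- [4,5], "Hey" and [] can all be packed into the same type.
examples :: [SomeList]
examples =
  [ SomeList [4, 5 :: Int]
  , SomeList "Hey"
  , SomeList ([] :: [Bool])
  ]

-- A consumer cannot learn which 'a' was used; it may only apply list
-- operations (length) and the Show instance packed with the list.
describe :: SomeList -> String
describe (SomeList xs) =
  show (length xs) ++ " element(s): " ++ unwords (map show xs)
```

Loaded into GHCi, map describe examples evaluates to ["2 element(s): 4 5","3 element(s): 'H' 'e' 'y'","0 element(s): "]; nothing in describe could, for example, add 1 to the elements, because it never learns that the first list held Ints.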

Jim Burton

9 Jul 9 Jul

10:13 p.m.

Andrew Coppin wrote:

...

OK, can somebody explain to me *really slowly* exactly what the difference between an existential type and a rank-N type is? [...]

If you get a chance, I'd recommend reading Types and Programming Languages by Benjamin Pierce. It's an excellent book that builds up to complicated type acrobatics from the untyped lambda calculus, and most of what's in there (though the implementations are in O'Caml) has a corresponding name in Haskell (plus dreaded extensions :-)). I'm reading it at the moment and frequently need to backtrack after getting completely lost, but it's worth it.

-- View this message in context: http://www.nabble.com/Type-system-madness-tf4051778.html#a11511500 Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

David Menendez

11:15 p.m.

On 7/9/07, Andrew Coppin wrote:

...

OK, can somebody explain to me *really slowly* exactly what the difference between an existential type and a rank-N type is?

One important difference is that Hugs supports existential quantification, but not rank-N types. (It does support rank-2 types, which are more common.)

The ExistentialQuantification and PolymorphicComponents extensions have to do with what's allowed when defining datatypes.

The ExistentialQuantification extension allows you to define datatypes like this:

    data Stream a = forall b. MkStream b (b -> a) (b -> b)

    s_head (MkStream b h t) = h b
    s_tail (MkStream b h t) = MkStream (t b) h t

A Stream has a seed of SOME type, and functions which get the current element or update the seed.

The type of MkStream is a rank-1 type:

    MkStream :: forall a b. b -> (b -> a) -> (b -> b) -> Stream a

(Normally, the "forall a b." would be implicit, because it's always at the beginning for rank-1 types, and Haskell can distinguish type variables from constructors.)

A "destructor" for Stream would have a rank-2 type:

    unMkStream :: forall a w. (forall b. b -> (b -> a) -> (b -> b) -> w) -> Stream a -> w
    unMkStream k (MkStream b h t) = k b h t

(The destructor illustrates how pattern-matching works. "either" and "maybe" are examples of destructors in the Prelude.)

Functions which look inside the MkStream constructor have to be defined for ALL possible seed types.

--

PolymorphicComponents (a.k.a. universal quantification) lets you use rank-1 values as components of a datatype.

    data Iterator f = MkIterator
      { it_head :: forall a. f a -> a
      , it_tail :: forall a. f a -> f a
      }

An Iterator has two functions that return the head or tail of a collection, which may have ANY element type.

Now the constructor is rank 2:

    MkIterator :: forall f. (forall a. f a -> a) -> (forall a. f a -> f a) -> Iterator f

The field selectors are rank 1:

    it_head :: forall f a. Iterator f -> f a -> a
    it_tail :: forall f a. Iterator f -> f a -> f a

And the destructor is rank 3:

    unMkIterator :: forall f w. ((forall a. f a -> a) -> (forall a. f a -> f a) -> w) -> Iterator f -> w

It's rank 3 because the type "forall a. f a -> a" is rank 1, and it's the argument to a function (which is rank 2), which is in turn the argument to another function (which is rank 3).

Because Hugs only supports rank-2 polymorphism, it won't accept unMkIterator. GHC's rank-N polymorphism means that it will, because it accepts types of any rank.

Hope this helps.
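The Stream and Iterator definitions above can be assembled into one module that GHC should accept with ExistentialQuantification and RankNTypes (in GHC, PolymorphicComponents is a deprecated synonym for RankNTypes). This is a sketch: nats, takeS, listIter and second are hypothetical helpers added only to exercise the types, and the body of unMkIterator is filled in here as the obvious pattern match, since the message gives only its type.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes #-}

-- Existential: the seed type b is hidden inside MkStream.
data Stream a = forall b. MkStream b (b -> a) (b -> b)

s_head :: Stream a -> a
s_head (MkStream b h _) = h b

s_tail :: Stream a -> Stream a
s_tail (MkStream b h t) = MkStream (t b) h t

-- Rank-2 destructor: the continuation k must work for ALL seed types b.
unMkStream :: (forall b. b -> (b -> a) -> (b -> b) -> w) -> Stream a -> w
unMkStream k (MkStream b h t) = k b h t

-- Polymorphic components: each field is itself a polymorphic function.
data Iterator f = MkIterator
  { it_head :: forall a. f a -> a
  , it_tail :: forall a. f a -> f a
  }

-- Rank-3 destructor (accepted by GHC with RankNTypes).
unMkIterator :: ((forall a. f a -> a) -> (forall a. f a -> f a) -> w) -> Iterator f -> w
unMkIterator k (MkIterator h t) = k h t

-- Hypothetical example: the naturals, using an Int seed.
nats :: Stream Int
nats = MkStream (0 :: Int) id (+ 1)

-- Hypothetical helper: the first n elements of a stream.
takeS :: Int -> Stream a -> [a]
takeS n s
  | n <= 0 = []
  | otherwise = s_head s : takeS (n - 1) (s_tail s)

-- Hypothetical iterator over ordinary lists (partial: fails on []).
listIter :: Iterator []
listIter = MkIterator head tail

-- Second element of a collection, written via the rank-3 destructor;
-- h and t are used here at the caller's element type.
second :: Iterator f -> f a -> a
second = unMkIterator (\h t xs -> h (t xs))
```

Loaded into GHCi, takeS 5 nats gives [0,1,2,3,4] and second listIter "abc" gives 'b'. Note that takeS never learns that the seed of nats is an Int; it only uses s_head and s_tail, which is exactly the "defined for ALL possible seed types" restriction described above.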

73 comments

participants (22)

Aaron Denney
Albert Y. C. Lai
Alex Queiroz
Alexis Hazell
Andrew Coppin
Brandon S. Allbery KF8NH
Bulat Ziganshin
Daniil Elovkov
David Menendez
dons＠cse.unsw.edu.au
Henning Thielemann
Hugh Perkins
Jim Burton
Jonathan Cast
Ketil Malde
Lennart Augustsson
Martin Percossi
Miguel Mitrofanov
Paul Moore
Philip Armstrong
Stefan O'Rear
Steve Schafer