
I would like to suggest a new basic type in Haskell. What if we had something like this (with any other quoting character):
«Je ne parle pas français. (...) ¿Hablas español?»
This would be of type Utf8. I think it is not a bad idea now, since Haskell source code is supposed to be UTF-8. The internal representation of this datatype would be a null-terminated UTF-8 byte vector. ...
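A minimal sketch of how such a type might be modelled in today's Haskell, assuming an ordinary library type rather than a built-in one. Everything named here is hypothetical: `fromLiteral`, `utf8Bytes` and `encodeChar` are illustration-only names, a list of `Word8` stands in for the byte vector, and a plain `String` literal stands in for the «...» syntax, which does not exist.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Hypothetical opaque text blob: null-terminated UTF-8 bytes.
-- Deliberately no Functor, Foldable, Eq or Show instances, so the
-- only way to use the contents is to ask for the raw bytes.
newtype Utf8 = Utf8 [Word8]

-- | Stand-in for the proposed «...» literal (assumption: the input
-- contains no NUL character and no lone surrogate code points).
fromLiteral :: String -> Utf8
fromLiteral s = Utf8 (concatMap encodeChar s ++ [0])

-- | Raw bytes, including the trailing 0 terminator.
utf8Bytes :: Utf8 -> [Word8]
utf8Bytes (Utf8 bs) = bs

-- | Encode one code point as 1 to 4 UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading bits of the code point, placed in the first byte
    hi k = fromIntegral (n `shiftR` k)
    -- one continuation byte: 10xxxxxx carrying six payload bits
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)
```

Loaded in GHCi, `utf8Bytes (fromLiteral "\233")` (the single character é) gives `[195,169,0]`: two bytes for the character plus the terminator.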
Stream fusion on Haskell Unicode strings - Tom Harper http://www.wellquite.org/non-blog/AngloHaskell2008/tom%20harper.pdf (...)
Actually, what I suggest is quite different, in points I see as worthwhile:

* His focus is on speed and memory; my goal is more elegant and safer code.

* His approach consolidates Prelude. My approach allows complete elimination of Prelude. If we had a Utf8 basic type, we could have modules with many different basic types, and many different ideas on how to 'read «something» :: <sometype>'. In the future, we could write a module implementing some not yet invented numeral type, which another module would allow to be read from Chinese characters.

* He wants to preserve many properties of [Char]. I think the Utf8 type should have no standard properties at all. See the next point for why this would avoid some unsafe code.

* He insists on the idea of text as something over char. Well, I'm probably alone here, but I think that was a nice idea in its time, and today we could have better approaches. Except for source code, text is a block of information, not a sequence of anything. I would explicitly like a type we cannot map over, because mapping makes no sense: text is built from so many things that there is no basic unit we can apply functions to. Even something like "printing a table of all characters and their Unicode numbers" is impossible, since a lot of Unicode is not printable. "Are these blocks of text equal?" does not work like that either, since different sequences of bytes can have the same meaning. If you want some piece of text to obey specific properties, you should have to extract it into a proper type.

Sorry if this is insane for some reason.

Thanks,
Maurício
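
A rough sketch of the "extract it to a proper type" idea and of the equality problem, under stated assumptions: the class `FromUtf8`, its method `extract`, the `Decimal` type and `sameBytes` are all hypothetical names, and `Utf8` is modelled here as a bare list of bytes without a terminator to keep the example short.

```haskell
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical opaque text type (raw UTF-8 bytes, no instances).
newtype Utf8 = Utf8 [Word8]

-- Each target type decides for itself how it is read from text.
class FromUtf8 a where
  extract :: Utf8 -> Maybe a

-- A "proper type" with a property the raw text does not guarantee:
-- it is a non-negative decimal number.
newtype Decimal = Decimal Integer deriving (Eq, Show)

instance FromUtf8 Decimal where
  extract (Utf8 bs)
    | not (null bs) && all isDigitByte bs = Just (Decimal (foldl' step 0 bs))
    | otherwise = Nothing
    where
      isDigitByte b = b >= 0x30 && b <= 0x39          -- ASCII '0'..'9'
      step acc b = acc * 10 + fromIntegral (b - 0x30)

-- Two encodings of what a reader sees as the same text, "é":
composed, decomposed :: Utf8
composed   = Utf8 [0xC3, 0xA9]        -- U+00E9
decomposed = Utf8 [0x65, 0xCC, 0x81]  -- U+0065 followed by U+0301

-- Byte-wise comparison, which is NOT text equality.
sameBytes :: Utf8 -> Utf8 -> Bool
sameBytes (Utf8 a) (Utf8 b) = a == b
```

Here `extract (Utf8 [0x34, 0x32]) :: Maybe Decimal` yields `Just (Decimal 42)`, while `sameBytes composed decomposed` is `False` even though both values render as the same character. A meaningful comparison would need Unicode normalization, which this sketch does not attempt.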