I don't actually need the UTF-16 surrogate code units in these strings. I would rather filter them out before writing the strings to a file.
What would be a simple filter to do this?
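One possible filter, as a minimal sketch: drop every Char that falls in
the surrogate range U+D800..U+DFFF before writing. The names isSurrogate,
stripSurrogates and writeFileNoSurrogates are made up for this example
and are not part of base.

```haskell
import Data.Char (ord)

-- | True for code points in the UTF-16 surrogate range U+D800..U+DFFF.
-- (Hypothetical helper; defined here, not imported from a library.)
isSurrogate :: Char -> Bool
isSurrogate c = n >= 0xD800 && n <= 0xDFFF
  where
    n = ord c

-- | Drop every surrogate code point from a string.
stripSurrogates :: String -> String
stripSurrogates = filter (not . isSurrogate)

-- | Filter the string, then write it with the ordinary 'writeFile'.
writeFileNoSurrogates :: FilePath -> String -> IO ()
writeFileNoSurrogates path = writeFile path . stripSurrogates
```

Loaded into GHCi, stripSurrogates "These ---> \55357\56384" gives
"These ---> ". Note that this throws the character away; it does not
turn the pair into the code point it was meant to encode.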
Albert Y. C. Lai <trebla at vex.net> wrote:
On 11-12-04 07:08 AM, dokondr wrote:
> In GHC 7.0.3 / Mac OS X when trying to:
>
> writeFile "someFile" "(Hoping You Have A iPhone When I Do This) Lol Sleep Is When You Close These ---> \55357\56384"
>
> I get:
> commitBuffer: invalid argument (Illegal byte sequence)
>
> The string I am trying to write can also be seen here:
> http://twitter.com/#!/search/Hoping%20You%20Have%20A%20iPhone%20When%20I%20Do%20This%20lang%3Aen
\55357 and \56384 are the surrogates D83D and DC40, which are meant for
use in UTF-16 only. Haskell's Char is not a UTF-16 code unit (unlike
Java's char, which still is one); it is a Unicode code point directly.
GHC is correct in refusing to encode them.
If you want the character U+1F440 "EYES", write \128064 directly (or
\x1f440, or \x1F440).
Use http://www.unicode.org/charts/ to look up a code point and find out
what you are getting into. You can enter a hexadecimal number or choose
a category.
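To make the arithmetic concrete, here is a sketch of how a high/low
surrogate pair maps to a single code point, following the UTF-16
decoding rule 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00). The
function names fromSurrogatePair and joinSurrogates are made up for this
example.

```haskell
import Data.Char (chr, ord)

-- | Combine a UTF-16 high/low surrogate pair into the code point it
-- encodes. Returns Nothing if the arguments are not a high surrogate
-- followed by a low surrogate.
fromSurrogatePair :: Char -> Char -> Maybe Char
fromSurrogatePair hi lo
  | isHigh && isLow =
      Just (chr (0x10000 + (h - 0xD800) * 0x400 + (l - 0xDC00)))
  | otherwise = Nothing
  where
    h = ord hi
    l = ord lo
    isHigh = h >= 0xD800 && h <= 0xDBFF
    isLow  = l >= 0xDC00 && l <= 0xDFFF

-- | Replace each valid surrogate pair in a string with its code point.
-- Everything else, including unpaired surrogates, is left unchanged.
joinSurrogates :: String -> String
joinSurrogates (a : b : rest)
  | Just c <- fromSurrogatePair a b = c : joinSurrogates rest
joinSurrogates (a : rest) = a : joinSurrogates rest
joinSurrogates []         = []
```

In GHCi, fromSurrogatePair '\55357' '\56384' gives Just '\128064', that
is, U+1F440. Unpaired surrogates survive joinSurrogates, so a string
that contains them would still be rejected when written.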
Yes, you can set the text encoding on the handle you're reading this
text from [1]. The default text encoding is determined by the
environment, which is why I asked about LANG.
If you're entering literal strings, see Albert Lai's answer.
Erik
[1] http://hackage.haskell.org/packages/archive/base/latest/doc/html/System-IO.html#g:23
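A minimal sketch of setting the encoding on a handle instead of relying
on LANG, using hSetEncoding and utf8 from System.IO. The helper name
writeFileUtf8 is made up for this example.

```haskell
import System.IO

-- | Write a string as UTF-8 regardless of the locale settings.
-- (Hypothetical helper name; only System.IO functions are used.)
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path str =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8  -- override the locale-derived default encoding
    hPutStr h str
```

This covers real characters such as \128064. As far as I know, GHC's
UTF-8 encoder still rejects lone surrogates such as \55357, so those
have to be removed or combined first.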
On Sun, Dec 4, 2011 at 19:13, dokondr <dokondr@gmail.com> wrote:
> Is there any other way to solve this problem without changing LANG
> environment variable?
>
>
> On Sun, Dec 4, 2011 at 8:27 PM, Erik Hesselink <hesselink@gmail.com> wrote:
>>
>> What is the value of your LANG environment variable? Does it still
>> give the error if you set it to e.g. "en_US.UTF-8"?
>>
>> Erik
>>
>> On Sun, Dec 4, 2011 at 13:12, dokondr <dokondr@gmail.com> wrote:
>> > Correct url of a "bad" string:
>> >
>> > http://twitter.com/#!/search/Hoping%20You%20Have%20A%20iPhone%20When%20I%20Do%20This%20lang%3Aen
>> >
>> >
>> > On Sun, Dec 4, 2011 at 3:08 PM, dokondr <dokondr@gmail.com> wrote:
>> >>
>> >> Hi,
>> >> In GHC 7.0.3 / Mac OS X when trying to:
>> >>
>> >> writeFile "someFile" "(Hoping You Have A iPhone When I Do This) Lol Sleep Is When You Close These ---> \55357\56384"
>> >>
>> >> I get:
>> >> commitBuffer: invalid argument (Illegal byte sequence)
>> >>
>> >> The string I am trying to write can also be seen here:
>> >>
>> >>
>> >> http://twitter.com/#!/search/Hoping%20You%20Have%20A%20iPhone%20When%20I%20Do%20This%20lang%3Aen
>> >>
>> >> It looks like 'writeFile' cannot write Unicode characters.
>> >> Any workarounds?
>> >>
>> >> Thanks!
>> >> Dmitri
>> > _______________________________________________
>> > Haskell-Cafe mailing list
>> > Haskell-Cafe@haskell.org
>> > http://www.haskell.org/mailman/listinfo/haskell-cafe