Re: [Haskell-cafe] HDBC 2.1, UTF8 and Umlauts

Guenther Schmidt wrote:
Hi John,
thanks for taking the time. It actually is \252 that turned into something else because of my email client, damn the thing.
OK, perhaps we have some confusion here.
Are you saying that you entered the Unicode characters directly into your Haskell source as literals? In other words, you did not type:
backslash two five two
but instead just typed the umlaut on the keyboard?
If so, that won't work directly -- I think. Maybe somebody can correct me on this, but my hunch is that would save the umlaut as UTF-8 when you save the .hs file. Then you will get a String which is supposed to have decoded Unicode data, instead having encoded UTF-8 data.
You could wrap it with Codec.Binary.UTF8.String.decodeString from utf8-string and see if that helps. If it does, that'll be your problem.
It's a complicated topic, I know. And the scary thing is that Unicode makes this all *easier*.
I'll do some further investigating and give you some more details when I have them, thanks in advance.
Günther
John Goerzen schrieb:
On Mon, May 04, 2009 at 04:44:04PM +0200, Guenther Schmidt wrote:
Hi John,
I'm trying stuff like:
    dbc <- connectSqlite3 "somedatabase"
    run dbc "insert into someTable values (?)" [toSql "Günni"]
So what do you get back after adding:
    commit dbc
    r <- quickQuery' dbc "select * from someTable" []
    print r
Just knowing it's garbled doesn't help. Need to know *how* it's garbled.
But the problem is that \374 isn't Unicode at all. It's ISO-8859-1. You're not actually giving it Unicode data to start with. I believe the proper sequence is \252.
For all I know, \374 may not even be a valid Unicode encoding (haven't tested it).
Try \252.
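For reference, a small sketch of how the two numbers relate, assuming the escapes are written inside Haskell literals. In Haskell a bare numeric escape is decimal, so \252 names code point 252 (U+00FC, "ü"), while 374 is the octal spelling of that same value and would need the \o prefix. The binding names below are made up for illustration.

```haskell
import Data.Char (ord)

-- Three spellings of the same character, U+00FC ("ü").
decimalEscape, octalEscape, hexEscape :: Char
decimalEscape = '\252'   -- decimal escape
octalEscape   = '\o374'  -- octal escape
hexEscape     = '\xFC'   -- hexadecimal escape

-- A bare \374 is read as decimal 374, which is a different code point.
decimal374 :: Char
decimal374 = '\374'

main :: IO ()
main = do
  print (map ord [decimalEscape, octalEscape, hexEscape])  -- [252,252,252]
  print (ord decimal374)                                   -- 374
```

So a mail client that shows "ü" as \374 may simply be printing the octal form of the same code point.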
I also tried:
    dbc <- connectSqlite3 "somedatabase"
    run dbc "insert into someTable values ('Günni')" []
So since this is Haskell code I presume it's in UTF-8; my Emacs stores all my *.hs files as UTF-8.
In either case the "ü" becomes garbled.
With the previous version of HDBC, 1.1.6, this worked just fine.
It also garbles any Umlauts coming *out*; the source is a UTF-8 sqlite3 db file.
Günther
John Goerzen schrieb:
Günther Schmidt wrote:
Hi guys,
for some reason, any way I try, all the Umlauts get garbled with HDBC 2.1. HDBC 1.1.6 worked fine with any backend (ODBC, Sqlite3, ... what have you).
Anybody else had similar problems and knows how to solve this?
You need to be more specific, but it is likely you are trying to send something to HDBC that isn't encoded in UTF-8. HDBC 2.x has a global preference for UTF-8 now, actually partly to resolve complaints like this.
If you are feeding it ISO-8859-1 data or somesuch, try giving it UTF-8 instead.
-- John
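If the input really does arrive as ISO-8859-1 bytes, a conversion along these lines might be a starting point. It relies on the fact that ISO-8859-1 byte values coincide with Unicode code points U+0000 to U+00FF. The helper name latin1ToString is hypothetical and not part of HDBC; the sample bytes are made up for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper (not HDBC API): each ISO-8859-1 byte maps to
-- the Unicode code point with the same numeric value.
latin1ToString :: [Word8] -> String
latin1ToString = map (chr . fromIntegral)

main :: IO ()
main = do
  -- "Günni" as ISO-8859-1 bytes; 0xFC is the umlaut.
  let bytes = [0x47, 0xFC, 0x6E, 0x6E, 0x69]
  print (map ord (latin1ToString bytes))       -- [71,252,110,110,105]
  print (latin1ToString bytes == "G\252nni")   -- True
```

The resulting String holds code points rather than encoded bytes, which is presumably the form that toSql expects in HDBC 2.x.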

On Mon, 2009-05-04 at 11:19 -0500, John Goerzen wrote:
Guenther Schmidt wrote:
Hi John,
thanks for taking the time. It actually is \252 that turned into something else because of my email client, damn the thing.
OK, perhaps we have some confusion here.
Are you saying that you entered the Unicode characters directly into your Haskell source as literals? In other words, you did not type:
backslash two five two
but instead just typed the umlaut on the keyboard?
If so, that won't work directly -- I think.
It should work if one is using an editor that saves files as UTF-8.
Maybe somebody can correct me on this, but my hunch is that would save the umlaut as UTF-8 when you save the .hs file.
Which is what we want. Since version 6.6, GHC treats .hs files as UTF-8.
Then you will get a String which is supposed to have decoded Unicode data, instead having encoded UTF-8 data.
That's what used to happen prior to GHC version 6.6. As long as your editor is set to use UTF-8 then String literals containing Unicode will work fine.
Duncan
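A quick way to check this on one's own setup, sketched under the assumption that the file is saved as UTF-8 and compiled with GHC 6.6 or later. The names typed and escaped are illustrative only.

```haskell
module Main where

-- Umlaut typed directly; this file must be saved as UTF-8.
typed :: String
typed = "Günni"

-- The same name written with a decimal escape for U+00FC.
escaped :: String
escaped = "G\252nni"

main :: IO ()
main = do
  print (typed == escaped)     -- True if the literal was decoded
  print (length typed)         -- 5: one Char per character
  print (map fromEnum typed)   -- [71,252,110,110,105]
```

If the literal had been left as raw UTF-8 bytes instead, the list would contain 195 and 188 in place of 252 and the length would be 6.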

At Mon, 04 May 2009 11:19:38 -0500, John Goerzen wrote:
but instead just typed the umlaut on the keyboard?
If so, that won't work directly -- I think. Maybe somebody can correct me on this, but my hunch is that would save the umlaut as UTF-8 when you save the .hs file. Then you will get a String which is supposed to have decoded Unicode data, instead having encoded UTF-8 data.
As far as I know, that is how it would work. You type an umlaut on the keyboard in the source code, and at runtime you have a value of type 'String' which contains the Unicode value for an umlaut. I am confused why that would not work directly? (Though, I have never used HDBC).
It seems to me that the only way to keep things sensible is if all values of type 'String' *only* ever contain Unicode chars. If you need a utf-8 encoded 'string' then it should use a different type -- perhaps a newtype wrapper around a plain old bytestring.
- jeremy
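One way the newtype idea could look, as a minimal sketch. The names Utf8Bytes, encodeChar and toUtf8 are hypothetical, and the encoder is hand-written here only to keep the example self-contained; a real program would more likely use a library such as utf8-string. It does not reject surrogate code points.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical wrapper: encoded UTF-8 lives in its own type,
-- so it cannot be confused with a String of code points.
newtype Utf8Bytes = Utf8Bytes B.ByteString
  deriving (Eq, Show)

-- Encode one code point as one to four UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading payload bits, shifted down to the low end.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx with six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

toUtf8 :: String -> Utf8Bytes
toUtf8 = Utf8Bytes . B.pack . concatMap encodeChar

main :: IO ()
main = do
  let Utf8Bytes bs = toUtf8 "G\252nni"
  print (B.unpack bs)   -- [71,195,188,110,110,105]
  print (B.length bs)   -- 6 bytes for 5 characters
```

With this split, a function that expects decoded text takes a String and one that expects encoded bytes takes Utf8Bytes, so mixing them up becomes a type error.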
participants (3)
- Duncan Coutts
- Jeremy Shaw
- John Goerzen