ByteString and Text - any conventions?

While working on https://github.com/nurpax/sqlite-simple there have been occasions when I've tried to decide whether to use ByteStrings or Text strings. I note that postgresql-simple and mysql-simple use ByteStrings exclusively in the API. Has a convention formed in the Haskell community on which strings should be used in APIs that pass strings around?
In my case, blobs will be passed around as ByteStrings anyway. But SQLite's C strings are UTF-8, and some might prefer Text over ByteStrings. Or should the API support both?
This matters in the low-level bindings too, where result accessors need to return either ByteStrings or Text objects for TEXT fields. Currently direct-sqlite uses the String type, but Irene is thinking of changing it to a more efficient representation (https://github.com/IreneKnapp/direct-sqlite/issues/3).
sqlite-simple already links against both bytestring and text, so from a purely package-dependency point of view the choice doesn't really matter. But direct-sqlite will need to choose one or the other for its string type in the SQLText s constructor.
Any thoughts?
Cheers,
Janne

Personally, I use ByteStrings for binary data, and Text for textual data.
There are a number of APIs that use ByteString where Text might be a better
choice, because when they were designed the only options were ByteString and
String.
In some cases, it could be sensible to design a low-level API around
ByteString, and build a Text API on top of that. For example, in a SQL
database, you may need to communicate with the database via UTF-8 encoded
binary data. So, the low-level binding would use ByteString. But, when you
are actually constructing the queries, you probably want to use Text most
of the time. The OverloadedStrings (IsString) instance for ByteString only
supports ASCII: it truncates every character to its low 8 bits, so a
non-ASCII literal silently becomes invalid data instead of UTF-8. For Text,
it does the right thing.
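The difference between the two IsString instances is easy to check. The sketch below (assuming current bytestring and text packages; the names bsLiteral and txtLiteral are only for illustration) writes the same non-ASCII literal, U+03BB, once as a ByteString and once as Text, and prints the bytes each one ends up as.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- U+03BB (Greek small lambda), written as an escape so the source file's
-- encoding does not matter.
bsLiteral :: B.ByteString
bsLiteral = "\x3bb"   -- IsString ByteString keeps only the low 8 bits of each Char

txtLiteral :: T.Text
txtLiteral = "\x3bb"  -- IsString Text keeps the whole code point

main :: IO ()
main = do
  print (B.unpack bsLiteral)                   -- [187]: one byte, not valid UTF-8
  print (B.unpack (TE.encodeUtf8 txtLiteral))  -- [206,187]: the UTF-8 encoding
```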
In general, relying on developers to remember to encode ByteStrings
correctly is a poor idea. When you use Text, the developer doesn't have to
think about encodings at all; they just work. Yay for types!
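A minimal sketch of the layering described above (a ByteString binding with a Text API on top), assuming a hypothetical binding: execRaw stands in for a low-level call that hands UTF-8 bytes to the C library (here it only returns the byte count), toWire is the single place where Text gets encoded, and exec is the Text-facing API. None of these names come from an existing package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical low-level call mirroring a C API that takes UTF-8 bytes.
-- A real binding would pass the bytes to the C library; this stub only
-- returns the number of bytes it would send.
execRaw :: B.ByteString -> IO Int
execRaw sql = return (B.length sql)

-- The single place where Text is turned into the bytes the C side expects.
toWire :: T.Text -> B.ByteString
toWire = TE.encodeUtf8

-- High-level API: callers pass Text and never touch the encoding.
exec :: T.Text -> IO Int
exec = execRaw . toWire

main :: IO ()
main = do
  let q = T.pack "SELECT '\955'"
  n <- exec q
  print (T.length q, n)  -- (10,11): ten characters, eleven UTF-8 bytes
```

Because callers never build the ByteString themselves, there is no call site where someone can forget to encode.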
The only argument for working with ByteStrings everywhere is that it might
be faster, since you don't have to do the ByteString <-> Text conversion.
But I don't think there is any data showing that the conversion time is
significant. At the very least, it would be sensible to use a newtype
wrapper like newtype UTF8 = UTF8 { toByteString :: ByteString } to get at
least some type checking. I am pretty sure this wrapper exists somewhere
already.
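One way the UTF8 newtype could be fleshed out, as a sketch: the newtype and its toByteString field are taken from the message above, while fromText, fromByteString and toText are hypothetical helpers. The idea is that the constructor would stay unexported, so every UTF8 value has either come from encoding Text or been validated with decodeUtf8'.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Bytes known to be valid UTF-8. In a real module the UTF8 constructor
-- would not be exported, so values can only come from the helpers below.
newtype UTF8 = UTF8 { toByteString :: B.ByteString }
  deriving (Eq, Show)

-- Hypothetical helper: encoding Text always produces valid UTF-8.
fromText :: T.Text -> UTF8
fromText = UTF8 . TE.encodeUtf8

-- Hypothetical helper: raw bytes (e.g. from a TEXT column) are validated first.
fromByteString :: B.ByteString -> Either UnicodeException UTF8
fromByteString bs = UTF8 bs <$ TE.decodeUtf8' bs

-- Hypothetical helper: decoding cannot fail, because every UTF8 value
-- was checked on the way in.
toText :: UTF8 -> T.Text
toText = TE.decodeUtf8 . toByteString

main :: IO ()
main = do
  let u = fromText (T.pack "h\233llo")
  print (B.unpack (toByteString u))  -- [104,195,169,108,108,111]
  putStrLn (either (const "rejected") (const "accepted")
                   (fromByteString (B.pack [0xff])))  -- rejected
  print (toText u == T.pack "h\233llo")  -- True
```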
- jeremy
On Sun, Aug 12, 2012 at 1:18 PM, Janne Hellsten
_______________________________________________ database-devel mailing list database-devel@haskell.org http://www.haskell.org/mailman/listinfo/database-devel

On Sun, Aug 12, 2012 at 11:18 AM, Janne Hellsten
I note that postgresql-simple and mysql-simple use ByteStrings exclusively in the API.
No, that's not how they work. The underlying representation in some places might be a ByteString, but the actual APIs don't use ByteStrings; they go through typeclasses. http://hackage.haskell.org/packages/archive/mysql-simple/latest/doc/html/Dat...
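To illustrate the typeclass point with a simplified, hypothetical class (the real ToField classes in mysql-simple and postgresql-simple return a richer type and differ in detail): the library can keep ByteString as its internal representation while callers only ever pass ordinary Haskell values, and each instance is responsible for its own encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical, simplified class: parameters are stored as ByteString
-- internally, but callers only see the class.
class ToField a where
  toField :: a -> B.ByteString

instance ToField Int where
  toField = BC.pack . show

instance ToField Bool where
  toField b = BC.pack (if b then "1" else "0")

-- Text is UTF-8 encoded by the instance, not by the caller.
instance ToField T.Text where
  toField = TE.encodeUtf8

main :: IO ()
main = print [toField (42 :: Int), toField True, toField (T.pack "caf\233")]
```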

As an aside, some people (Joey Adams IIRC, maybe others) have suggested that
postgresql-simple should change the conversions so that ByteString
represents postgresql's binary blobs, and not postgresql's text (and
get rid of the Binary type in the process).
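A sketch of what that mapping could look like for a SQLite-style binding. The SQLText constructor name appears earlier in the thread; the rest of SQLData and the FromField class are hypothetical simplifications, not the actual definitions in postgresql-simple, direct-sqlite or sqlite-simple. With ByteString reserved for blobs and Text for text, the Haskell type alone says which column type is expected, so no separate Binary wrapper is needed.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Int (Int64)

-- Hypothetical column value type for a SQLite-style binding.
data SQLData
  = SQLInteger Int64
  | SQLFloat Double
  | SQLText T.Text        -- TEXT columns
  | SQLBlob B.ByteString  -- BLOB columns
  | SQLNull
  deriving (Show, Eq)

-- Hypothetical conversion class from a column value to a Haskell value.
class FromField a where
  fromField :: SQLData -> Either String a

-- Text only accepts TEXT columns.
instance FromField T.Text where
  fromField (SQLText t) = Right t
  fromField other       = Left ("expected TEXT, got " ++ show other)

-- ByteString only accepts BLOB columns.
instance FromField B.ByteString where
  fromField (SQLBlob b) = Right b
  fromField other       = Left ("expected BLOB, got " ++ show other)

main :: IO ()
main = do
  print (fromField (SQLBlob (B.pack [0, 255])) :: Either String B.ByteString)
  print (fromField (SQLText (T.pack "hi"))     :: Either String B.ByteString)
```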
This seems a perfectly reasonable suggestion to me, though I'm not
particularly inclined to change the API either. If you'd like to adopt
this suggestion, be my guest.
I don't think that API compatibility between the *-simples should be high
priority; do whatever you think is best for sqlite-simple. Then
hopefully in a few years we'll have some well-developed database-specific
interfaces that can inform a next-generation HDBC interface.
Best,
Leon
On Mon, Aug 13, 2012 at 10:33 AM, Bryan O'Sullivan

On Mon, Aug 13, 2012 at 11:18 PM, Leon Smith
As an aside, some people (Joey Adams IIRC, maybe others) have suggested that postgresql-simple should change the conversions so that ByteString represents postgresql's binary blobs, and not postgresql's text (and get rid of the Binary type in the process).
I like this idea. I went for this approach in my latest github version.
I don't think that API compatibility between the *-simples should be high priority; do whatever you think is best for sqlite-simple. Then hopefully in a few years we'll have some well-developed database-specific interfaces that can inform a next-generation HDBC interface.
I agree, and I've already departed somewhat from the postgresql-simple API. I also abandoned the query parser that was used for query parameter substitution. Instead, I bind query parameters directly using SQLite's prepared statement parameter binding. This is good in that SQLite is the only one parsing the query string, so there may be less potential for bugs, as I'm not rolling my own parser or escaping. The downside is that supporting the 'In' type for ToField becomes much trickier (in fact, I don't currently know how to implement it). But it's a lot simpler now.
Janne
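The difficulty with 'In' shows up in the shape of the SQL text. With prepared-statement binding, each ? placeholder binds exactly one value, so a list of n values needs n placeholders written into the query before it is prepared; the query string therefore depends on the runtime length of the list. The helper below is a hypothetical illustration of that rewriting step, not code taken from sqlite-simple.

```haskell
import Data.List (intercalate)

-- Hypothetical helper: build one "?" placeholder per value in an IN list,
-- so the statement can then bind each value individually.
inPlaceholders :: Int -> String
inPlaceholders n = "(" ++ intercalate "," (replicate n "?") ++ ")"

main :: IO ()
main = putStrLn ("SELECT * FROM t WHERE id IN " ++ inPlaceholders 3)
-- prints: SELECT * FROM t WHERE id IN (?,?,?)
```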
participants (4)
- Bryan O'Sullivan
- Janne Hellsten
- Jeremy Shaw
- Leon Smith