HDBC, postgresql, bytestrings and embedded NULLs

Hi all,

It seems that (at least) the postgresql bindings do not allow pure binary data.

I have a simple table:

    debug=# create table test (name bytea);

bytea seems to be the backing type on the DB side for bytestrings.

And then I run this:

    import Database.HDBC.PostgreSQL
    import Database.HDBC
    import Data.ByteString

    main = do
      db <- connectPostgreSQL "dbname=debug"
      stmt <- prepare db "INSERT INTO test (name) VALUES($1)"
      execute stmt [toSql $ pack [0]]
      execute stmt [toSql $ pack [65, 0, 66]]
      commit db

What happens is that the inserted string is cut off at the first NULL byte: the first row is empty, and the second row contains just "A".

http://www.postgresql.org/docs/8.4/static/datatype-binary.html says:

“When entering bytea values, octets of certain values must be escaped (but all octet values can be escaped) when used as part of a string literal in an SQL statement. In general, to escape an octet, convert it into its three-digit octal value and precede it by two backslashes”, and continues to list that NULL should be quoted as E'\\000'. However, I find no such quoting in the HDBC.PostgreSQL sources.

Anyone else stumbled on this?

thanks,
iustin
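The escaping rule quoted from the PostgreSQL documentation can be sketched in a few lines. This is only an illustration of the bytea escape format, not code from HDBC-postgresql: the names `escapeOctet` and `byteaEscape` are made up here, and the output is the text form one would pass as a query parameter (inside an E'...' string literal each backslash would have to be doubled once more, which is where E'\\000' comes from).

```haskell
import qualified Data.ByteString as B
import Data.Char (chr)
import Data.Word (Word8)
import Text.Printf (printf)

-- | Render a single octet in bytea escape format (hypothetical helper).
escapeOctet :: Word8 -> String
escapeOctet 92 = "\\\\"                           -- a backslash becomes two backslashes
escapeOctet w
  | w >= 32 && w <= 126 = [chr (fromIntegral w)]  -- printable ASCII passes through
  | otherwise           = printf "\\%03o" w       -- anything else: backslash + 3 octal digits

-- | Escape a whole ByteString, so that embedded zero octets survive
-- being handled as text.
byteaEscape :: B.ByteString -> String
byteaEscape = concatMap escapeOctet . B.unpack
```

With this sketch, the second row from the example above, `pack [65, 0, 66]`, is rendered as the seven characters A, backslash, 0, 0, 0, B rather than being cut at the zero octet.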

On Fri, Jan 7, 2011 at 11:44 AM, Iustin Pop wrote:
> It seems that (at least) the postgresql bindings do not allow pure binary data.
> [...]
> Anyone else stumbled on this?
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Yes, I had a bug reported in persistent-postgresql that I traced back to this bug. I reported the bug, but never heard a response. Frankly, if I had time, I would write a low-level PostgreSQL binding so I could skip HDBC entirely. Michael

On 01/07/2011 05:24 AM, Michael Snoyman wrote:
> Yes, I had a bug reported in persistent-postgresql that I traced back to this bug. I reported the bug, but never heard a response. Frankly, if I had time, I would write a low-level PostgreSQL binding so I could skip HDBC entirely.
I'm not seeing an open issue at https://github.com/jgoerzen/hdbc-postgresql/issues -- did you report it somewhere else? What would you gain by skipping HDBC? If there's a problem in the API, I'd like to fix it. -- John

On Fri, Jan 07, 2011 at 09:49:35AM -0600, John Goerzen wrote:
> I'm not seeing an open issue at https://github.com/jgoerzen/hdbc-postgresql/issues -- did you report it somewhere else?
Ah, I didn't know it's hosted there. I'm going to file my report, thanks! iustin

On 01/07/2011 09:49 AM, John Goerzen wrote:
> I'm not seeing an open issue at https://github.com/jgoerzen/hdbc-postgresql/issues -- did you report it somewhere else?
Along the same lines, I am a volunteer and patches are accepted even more happily than bug reports. It's disheartening to see someone's volunteer work reduced to "Bah, it doesn't escape NULLs, so it sucks so much that I'll just go write my own." It would seem to me that contributing your skill to fixing issues with existing software would be a better thing than having to invent yet another database system. -- John

On Fri, Jan 7, 2011 at 6:01 PM, John Goerzen wrote:
> Along the same lines, I am a volunteer and patches are accepted even more happily than bug reports. It's disheartening to see someone's volunteer work reduced to "Bah, it doesn't escape NULLs, so it sucks so much that I'll just go write my own." It would seem to me that contributing your skill to fixing issues with existing software would be a better thing than having to invent yet another database system.
Sorry, I did not mean to imply that your work is worthless. What I meant is that I don't like having so many layers of indirection in my software: persistent itself is already a high-level wrapper that abstracts backends. Wrapping another library to do something similar just means there are now *two* places to check for bugs. In general I think it would be a good thing to have solid, low-level bindings to PostgreSQL. And I reported the bug with a direct email to you I believe, perhaps it went to your junk mail folder? I apologize for the implied insult, it was not intended. Michael

On Sat, Jan 8, 2011 at 11:55 AM, Michael Snoyman wrote:
> In general I think it would be a good thing to have solid, low-level bindings to PostgreSQL.
Well, there are PostgreSQL and libpq on hackage:

http://hackage.haskell.org/package/libpq
http://hackage.haskell.org/package/PostgreSQL

The PostgreSQL package looks like it's in need of maintenance, and hasn't been updated in a few years. libpq is new, and looks promising. I haven't really used either one, so I can't really say too much about either.

Best, Leon

On Mon, Jan 17, 2011 at 4:49 PM, Leon Smith wrote:
> Well, there is PostgreSQL and libpq on hackage:
> http://hackage.haskell.org/package/libpq http://hackage.haskell.org/package/PostgreSQL
> The PostgreSQL looks like it's in need of maintenance, and hasn't been updated in a few years. libpq is new, and looks promising. I haven't really used either one, so I can't really say too much about either.
I've tried PostgreSQL before, and if I remember correctly I couldn't even build it. libpq looks interesting, I'd like to try it out. Unfortunately it depends on unix, which would be a problem for Windows users. If it looks like a good fit for persistent-postgresql, maybe I can convince the author to replace the unix dep with something else (unix-compat might be sufficient). Thanks for the pointer, Michael

On 01/17/2011 10:07 AM, Michael Snoyman wrote:
> I've tried PostgreSQL before, and if I remember correctly I couldn't even build it. libpq looks interesting, I'd like to try it out. Unfortunately it depends on unix, which would be a problem for Windows users. If it looks like a good fit for persistent-postgresql, maybe I can convince the author to replace the unix dep with something else (unix-compat might be sufficient).
I would also like to know what things people find are deficient in HDBC or HDBC-postgresql. If the API isn't good enough for some uses, that could be fixed. I would like to avoid a proliferation of database libraries as that is unnecessary duplication of work. HDBC does have an easy way for DB backends to implement more functionality than the HDBC API supports, or an alternative could also be to make HDBC-postgresql a thin binding over libpq or some such. -- John

On Mon, Jan 17, 2011 at 10:52 PM, John Goerzen wrote:
> I would also like to know what things people find are deficient in HDBC or HDBC-postgresql. If the API isn't good enough for some uses, that could be fixed. I would like to avoid a proliferation of database libraries as that is unnecessary duplication of work. HDBC does have an easy way for DB backends to implement more functionality than the HDBC API supports, or an alternative could also be to make HDBC-postgresql a thin binding over libpq or some such.
I've brought up before my problem with the convertible package: it encourages usage of partial functions. I would prefer two typeclasses, one for guaranteed conversions and one for conversions which may fail. In fact, that is precisely why convertible-text[1] exists.

As a related issue, there are a large number of data constructors in HDBC for SqlValue. I would not argue with the presence of any of them: for your purposes, every one of them is necessary. But for someone writing a cross-backend package with a more limited set of datatypes, it gets to be a problem. I know I can use convertible for this, but see my previous paragraph ;).

I also don't like using the lazy result functions. I'm sure for many people, they are precisely what is needed. However, in my applications, I try to avoid them whenever possible. I've had bugs crop up because I accidentally used the lazy instead of the strict version of a function. I would prefer using an interface that uses enumerators[2].

For none of these do I actually think that HDBC should change. I think it is a great library with a well-thought-out API. All I'm saying is that I doubt there will ever be a single high-level API that will suit everyone's needs, and I see a huge amount of value in splitting out the low-level code into a separate package. That way, *everyone* can share that code together, *everyone* can find the bugs in it, and *everyone* can benefit from improvements.

Michael

[1] http://hackage.haskell.org/package/convertible-text
[2] http://hackage.haskell.org/package/enumerator
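The "two typeclasses" idea from this message can be sketched roughly as follows. The class names `TotalConvert` and `PartialConvert` are invented for this illustration and are not claimed to be the API of convertible or convertible-text; the point is only that a conversion which can fail says so in its type.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Text.Read (readMaybe)

-- | Conversions that always succeed (hypothetical class).
class TotalConvert a b where
  convertTotal :: a -> b

-- | Conversions that may fail; the failure is visible in the result type
-- (hypothetical class).
class PartialConvert a b where
  convertPartial :: a -> Either String b

-- Every Integer can be rendered as a String.
instance TotalConvert Integer String where
  convertTotal = show

-- Not every String is an Integer, so this direction is partial.
instance PartialConvert String Integer where
  convertPartial s =
    maybe (Left ("not an integer: " ++ show s)) Right (readMaybe s)
```

A caller reading a numeric column back as text would then use `convertPartial` and be forced to handle the `Left` case, instead of calling a function that throws at runtime.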

On 01/17/2011 03:16 PM, Michael Snoyman wrote:
> I've brought up before my problem with the convertible package: it encourages usage of partial functions. I would prefer two typeclasses, one for guaranteed conversions and one for conversions which may fail. In fact, that is precisely why convertible-text[1] exists.

I would be open to making that change in convertible. The unfortunate reality with databases, however, is that many times we put things into strings for sending to the DB engine, and get things back from it in the form of strings, which must then be parsed into numeric types and the like. We can't, as a matter of type system principles, guarantee that a String can be converted to an Integer. How were you thinking the separation into these typeclasses would be applied in the context of databases?

> As a related issue, there are a large number of data constructors in HDBC for SqlValue. I would not argue with the presence of any of them: for your purposes, every one of them is necessary. But for someone writing a cross-backend package with a more limited set of datatypes, it gets to be a problem. I know I can use convertible for this, but see my previous paragraph ;).

How about using an import...hiding statement? Perhaps even your own module that only re-exports the constructors you like?

> I also don't like using the lazy result functions. I'm sure for many people, they are precisely what is needed. However, in my applications, I try to avoid it whenever possible. I've had bugs crop up because I accidentally used the lazy instead of strict version of a function. I would prefer using an interface that uses enumerators[2].

It would be pretty simple to add an option to the API to force the use of the strict versions of functions in all cases (or perhaps to generate an exception if a lazy version is attempted.) Would that address the concern? Or perhaps separating them into separate modules? I took a quick look at the enumerators library, but it doesn't seem to have the necessary support for handling data that comes from arbitrary C API function calls rather than handles or sockets.

> For none of these do I actually think that HDBC should change. I think it is a great library with a well-thought-out API. All I'm saying is that I doubt there will ever be a single high-level API that will suit everyone's need, and I see a huge amount of value in splitting out the low-level code into a separate package. That way, *everyone* can share that code together, *everyone* can find the bugs in it, and *everyone* can benefit from improvements.

Splitting out the backend code is quite reasonable, and actually that was one of the goals with the HDBC v2 API. I would have no objection if people take, say, HDBC-postgresql and add a bunch of non-HDBC stuff to it, or even break off the C bindings to a separate package and then make HDBC-postgresql an interface atop that.

I hope that we can, however, agree upon one low-level database API. The Java, Python, and Perl communities, at least, have. Failing to do so produces unnecessary incompatibility. I would also hope that this database API would be good enough that there is rarely call to bypass it and use a database backend directly.

-- John

On Mon, Jan 17, 2011 at 11:38 PM, John Goerzen wrote:
> I would be open to making that change in convertible. The unfortunate reality with databases, however, is that many times we put things into strings for sending to the DB engine, and get things back from it in the form of strings, which must then be parsed into numeric types and the like. We can't, as a matter of type system principles, guarantee that a String can be converted to an Integer. How were you thinking the separation into these typeclasses would be applied in the context of databases?
>
> How about using an import...hiding statement? Perhaps even your own module that only re-exports the constructors you like?
In Persistent, we already have a very good idea of what the datatype will be (integral, date/time, etc), and therefore dealing with the raw bytestring would probably be preferable for us. (This is a bit of a simplification, but close enough to reality.) I'm not actually requesting you change HDBC to return ByteStrings, I'm simply stating that, while a high-level API using Haskell datatypes is correct for *most* uses, there are use cases where a low-level API makes more sense.

And the many-constructor issue isn't a matter of convenience or being overwhelmed by constructors, it's a matter of correctness: if I want to map a SqlValue onto a UTCTime value, I need to check a number of different constructors, as opposed to using a ByteString and reading out the value directly. In Persistent, I always know which type of DATE or DATETIME was used for creating tables, so I know which version to anticipate. In theory, this argument applies to HDBC's constructors as well; however, I think I was previously bitten by a bug that had to do with time zone issues.
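The correctness concern about mapping onto UTCTime can be illustrated with a cut-down stand-in for SqlValue. The type `SqlValueLite` and its constructor names are invented for this sketch (HDBC's real SqlValue has many more constructors), and the timestamp format string is an assumption about what a backend might return as text.

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX (POSIXTime, posixSecondsToUTCTime)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

-- | A deliberately small stand-in for a multi-constructor SQL value type.
data SqlValueLite
  = LiteUTCTime UTCTime      -- already a UTC timestamp
  | LitePOSIXTime POSIXTime  -- seconds since the epoch, fractional
  | LiteEpochTime Integer    -- seconds since the epoch, whole
  | LiteString String        -- timestamp rendered as text by the backend
  | LiteNull
  deriving (Show, Eq)

-- | Every constructor that could plausibly carry a timestamp needs its own
-- case; forgetting one is a silent correctness bug.
toUTCTime :: SqlValueLite -> Maybe UTCTime
toUTCTime (LiteUTCTime t)   = Just t
toUTCTime (LitePOSIXTime p) = Just (posixSecondsToUTCTime p)
toUTCTime (LiteEpochTime s) = Just (posixSecondsToUTCTime (fromInteger s))
toUTCTime (LiteString s)    =
  parseTimeM True defaultTimeLocale "%Y-%m-%d %H:%M:%S" s  -- assumed text format
toUTCTime LiteNull          = Nothing
```

A layer that knows the column type in advance only ever needs the single text-parsing case, which is the situation described for Persistent above.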
> It would be pretty simple to add an option to the API to force the use of the strict versions of functions in all cases (or perhaps to generate an exception if a lazy version is attempted.) Would that address the concern? Or perhaps separating them into separate modules?
Again, I *personally* think that would be better, but I'm sure many other HDBC users would consider this a change for the worse: there are a lot of Haskellers who have no problem with lazy IO, and I don't want to adversely affect their programming on a whim.
> I took a quick look at the enumerators library, but it doesn't seem to have the necessary support for handling data that comes from arbitrary C API function calls rather than handles or sockets.
It does support this, for prior art see yaml[1] or yajl-enumerator[2]. I'd be happy to help you with this, as having some enumerator experience is a big help here.
> Splitting out the backend code is quite reasonable, and actually that was one of the goals with the HDBC v2 API. I would have no objection if people take, say, HDBC-postgresql and add a bunch of non-HDBC stuff to it, or even break off the C bindings to a separate package and then make HDBC-postgresql an interface atop that.
>
> I hope that we can, however, agree upon one low-level database API. The Java, Python, and Perl communities, at least, have. Failing to do so produces unnecessary incompatibility.
>
> I would also hope that this database API would be good enough that there is rarely call to bypass it and use a database backend directly.
I agree 100%. The question is whether or not we can all agree on what is low-level. I think the difference between Haskell and other languages here is that, while everyone else basically is satisfied with a cursor approach, in Haskell we have at least two other options: enumerators and lazy IO.

I would be in favor of trying to design something very low-level (read: more low-level than HDBC) that can represent any database backend. On the other hand, I don't think such a system would be used to allow your code to switch from SQLite to PostgreSQL, since they represent data so differently. In other words, I would like a SQLite package that presents a SQLite-specific API and the same for PostgreSQL, but for them to be designed to be very similar from the beginning to make the job of implementing higher-level APIs easier.

Michael

[1] http://hackage.haskell.org/package/yaml
[2] http://hackage.haskell.org/package/yajl-enumerator
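To make the cursor versus enumerator distinction concrete, here is a minimal sketch of a strict, fold-style consumer built on top of a cursor-like "fetch next row" action. It uses only base; `foldRows` and `cursorFromList` are hypothetical names, the in-memory cursor merely stands in for a real database cursor, and this is not the enumerator package's API.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)

-- | Drive a cursor-like action to exhaustion, feeding each row to a step
-- function. The accumulator is forced after every row, so no lazy IO and
-- no build-up of unevaluated work.
foldRows :: IO (Maybe row) -> (acc -> row -> IO acc) -> acc -> IO acc
foldRows next step = go
  where
    go acc = do
      mrow <- next
      case mrow of
        Nothing  -> return acc          -- cursor exhausted
        Just row -> do
          acc' <- step acc row
          acc' `seq` go acc'            -- force before fetching the next row

-- | An in-memory stand-in for a database cursor (assumption: a real backend
-- would fetch from the server here instead).
cursorFromList :: [a] -> IO (IO (Maybe a))
cursorFromList xs = do
  ref <- newIORef xs
  return $ atomicModifyIORef' ref $ \rows -> case rows of
    []       -> ([], Nothing)
    (r : rs) -> (rs, Just r)
```

Because the consumer controls the loop, the result set is fully processed by the time `foldRows` returns, which sidesteps the accidental lazy/strict mix-ups mentioned earlier in the thread.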
participants (4)
- Iustin Pop
- John Goerzen
- Leon Smith
- Michael Snoyman