Re: [Haskell-cafe] HDBC, postgresql, bytestrings and embedded NULLs

18 Jan 2011

On Mon, Jan 17, 2011 at 11:38 PM, John Goerzen wrote:
...
On 01/17/2011 03:16 PM, Michael Snoyman wrote:
...
I've brought up before my problem with the convertible package: it
encourages usage of partial functions. I would prefer two typeclasses,
one for guaranteed conversions and one for conversions which may fail.
In fact, that is precisely why convertible-text[1] exists.
I would be open to making that change in convertible.  The unfortunate
reality with databases, however, is that many times we put things into
strings for sending to the DB engine, and get things back from it in the
form of strings, which must then be parsed into numeric types and the like.
We can't, as a matter of type system principles, guarantee that a String
can be converted to an Integer.  How were you thinking the separation into
these typeclasses would be applied in the context of databases?
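
A minimal sketch of the two-typeclass split discussed above, using only
base. The class and method names (TotalConvert, PartialConvert,
convertTotal, convertPartial) are hypothetical and are not the API of
convertible or convertible-text. The point is only that Integer -> String
can be total, while String -> Integer has to expose failure in its type:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}

import Text.Read (readMaybe)

-- | Hypothetical class for conversions that can never fail.
class TotalConvert a b where
  convertTotal :: a -> b

-- | Hypothetical class for conversions that may fail; the failure is
-- visible in the result type instead of being raised as an exception.
class PartialConvert a b where
  convertPartial :: a -> Either String b

-- Rendering a value for the database engine always succeeds.
instance TotalConvert Integer String where
  convertTotal = show

-- Parsing what the database engine sent back may not succeed.
instance PartialConvert String Integer where
  convertPartial s =
    maybe (Left ("not an integer: " ++ show s)) Right (readMaybe s)
```

Under this split, a database layer would use convertTotal when sending
values and convertPartial when parsing results, so the caller is forced
to handle a malformed column value explicitly.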
...
As a related issue, there are a large number of data constructors in
HDBC for SqlValue. I would not argue with the presence of any of them:
for your purposes, every one of them is necessary. But for someone
writing a cross-backend package with a more limited set of datatypes,
it gets to be a problem. I know I can use convertible for this, but
see my previous paragraph ;).
How about using an import...hiding statement?  Perhaps even your own module
that only re-exports the constructors you like?
In Persistent, we already have a very good idea of what the datatype
will be (integral, date/time, etc), and therefore dealing with the raw
bytestring would probably be preferable for us. (This is a bit of a
simplification, but close enough to reality.) I'm not actually
requesting you change HDBC to return ByteStrings, I'm simply stating
that, while a high-level API using Haskell datatypes is correct for
*most* uses, there are use cases where a low-level API makes more
sense.

And the many-constructor issue isn't a matter of convenience or being
overwhelmed by constructors, it's a matter of correctness: if I want
to map a SqlValue onto a UTCTime value, I need to check a number of
different constructors, as opposed to using a ByteString and reading
out the value directly. In Persistent, I always know which type of
DATE or DATETIME was used for creating tables, so I know which version
to anticipate. In theory, this argument applies to HDBC's constructors
as well; however, I think I was previously bitten by a bug involving
time zone handling.
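
The constructor-matching problem described above can be sketched as
follows. Value is a hypothetical, cut-down stand-in for HDBC's SqlValue
(not the real type), and toUTC is an illustrative name. Reading a
zone-less local time as UTC is an assumption made here, and it is the
sort of choice that can lead to time zone bugs:

```haskell
import Data.Time
import Data.Time.Clock.POSIX (POSIXTime, posixSecondsToUTCTime)

-- | Hypothetical stand-in for a many-constructor value type.
data Value
  = VUTCTime UTCTime
  | VLocalTime LocalTime
  | VZonedTime ZonedTime
  | VPOSIXTime POSIXTime
  | VString String
  | VNull

-- | Every time-like constructor needs its own case, even when the
-- caller already knows which column type the table was created with.
toUTC :: Value -> Maybe UTCTime
toUTC (VUTCTime t)   = Just t
toUTC (VLocalTime t) = Just (localTimeToUTC utc t)  -- assumption: treat as UTC
toUTC (VZonedTime t) = Just (zonedTimeToUTC t)
toUTC (VPOSIXTime t) = Just (posixSecondsToUTCTime t)
toUTC (VString _)    = Nothing                      -- would need parsing
toUTC VNull          = Nothing
```

With access to the raw bytes and prior knowledge of the column type, a
caller could instead run a single parser and skip the case analysis.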
...
...
I also don't like using the lazy result functions. I'm sure for many
people, they are precisely what is needed. However, in my
applications, I try to avoid it whenever possible. I've had bugs crop
up because I accidentally used the lazy instead of strict version of a
function. I would prefer using an interface that uses enumerators[2].
It would be pretty simple to add an option to the API to force the use of
the strict versions of functions in all cases (or perhaps to generate an
exception if a lazy version is attempted.)  Would that address the concern?
Or perhaps separating them into separate modules?
Again, I *personally* think that would be better, but I'm sure many
other HDBC users would consider this a change for the worse: there are
a lot of Haskellers who have no problem with lazy IO, and I don't want
to adversely affect their programming on a whim.
...
I took a quick look at the enumerators library, but it doesn't seem to have
the necessary support for handling data that comes from arbitrary C API
function calls rather than handles or sockets.
It does support this; for prior art, see yaml[1] or yajl-enumerator[2].
I'd be happy to help you with this, as having some enumerator
experience is a big help here.
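
A minimal sketch of the general idea, using only base: a strict left
fold driven by an arbitrary "fetch next row" action, which could just as
well wrap a C API call as a handle or socket. This is not the enumerator
package's API; FetchRow, foldRows, and mockCursor are hypothetical names,
and mockCursor only simulates a cursor in memory:

```haskell
import Data.IORef (newIORef, atomicModifyIORef')

-- | Hypothetical stand-in for one "fetch next row" call into a C client
-- library. Nothing signals that the result set is exhausted.
type FetchRow row = IO (Maybe row)

-- | Strict left fold over all rows. Each row is fetched and consumed
-- before the next one is requested, so no lazy IO is involved.
foldRows :: FetchRow row -> (acc -> row -> IO acc) -> acc -> IO acc
foldRows fetch step = go
  where
    go acc = do
      next <- fetch
      case next of
        Nothing  -> return acc
        Just row -> do
          acc' <- step acc row
          acc' `seq` go acc'  -- force the accumulator before continuing

-- | In-memory cursor used only for demonstration.
mockCursor :: [row] -> IO (FetchRow row)
mockCursor rows = do
  ref <- newIORef rows
  return $ atomicModifyIORef' ref $ \rs -> case rs of
    []         -> ([], Nothing)
    (r : rest) -> (rest, Just r)
```

For example, summing a column would be
foldRows fetch (\acc r -> return (acc + r)) 0, and the statement can be
finalized as soon as foldRows returns because nothing is left
unevaluated.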
...
...
For none of these do I actually think that HDBC should change. I think
it is a great library with a well-thought-out API. All I'm saying is
that I doubt there will ever be a single high-level API that will suit
everyone's need, and I see a huge amount of value in splitting out the
low-level code into a separate package. That way, *everyone* can share
that code together, *everyone* can find the bugs in it, and *everyone*
can benefit from improvements.
Splitting out the backend code is quite reasonable, and actually that was
one of the goals with the HDBC v2 API.  I would have no objection if people
take, say, HDBC-postgresql and add a bunch of non-HDBC stuff to it, or even
break off the C bindings to a separate package and then make HDBC-postgresql
an interface atop that.
I hope that we can, however, agree upon one low-level database API.  The
Java, Python, and Perl communities, at least, have.  Failing to do so
produces unnecessary incompatibility.
I would also hope that this database API would be good enough that there is
rarely call to bypass it and use a database backend directly.
I agree 100%. The question is whether or not we can all agree on what
is low-level. I think the difference between Haskell and other
languages here is that, while everyone else basically is satisfied
with a cursor approach, in Haskell we have at least two other options:
enumerators and lazy IO. I would be in favor of trying to design
something very low-level (read: more low-level than HDBC) that can
represent any database backend.

On the other hand, I don't think such a system would be used to allow
your code to switch from SQLite to PostgreSQL, since they represent
data so differently. In other words, I would like a SQLite package
that presents a SQLite-specific API and the same for PostgreSQL, but
for them to be designed to be very similar from the beginning to make
the job of implementing higher-level APIs easier.

Michael

[1] http://hackage.haskell.org/package/yaml
[2] http://hackage.haskell.org/package/yajl-enumerator