
On Mon, Jan 17, 2011 at 11:38 PM, John Goerzen wrote:
On 01/17/2011 03:16 PM, Michael Snoyman wrote:
I've raised my problem with the convertible package before: it encourages the use of partial functions. I would prefer two typeclasses, one for guaranteed conversions and one for conversions that may fail. In fact, that is precisely why convertible-text[1] exists.
I would be open to making that change in convertible. The unfortunate reality with databases, however, is that many times we put things into strings for sending to the DB engine, and get things back from it in the form of strings, which must then be parsed into numeric types and the like. We can't, as a matter of type system principles, guarantee that a String can be converted to an Integer. How were you thinking the separation into these typeclasses would be applied in the context of databases?
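A minimal sketch of the two-class split, using multi-parameter type classes. The class names ConvertTotal and ConvertPartial are hypothetical and are not convertible-text's actual API. Applied to databases, encoding a parameter for the engine would use the total class, while decoding a string returned by the engine would use the partial one, so an unparseable value comes back as a Left rather than as a runtime error.

```haskell
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}

import Text.Read (readMaybe)

-- Conversions that always succeed (hypothetical class name).
class ConvertTotal a b where
  convertTotal :: a -> b

-- Conversions that may fail; failure is an ordinary value, not an exception.
class ConvertPartial a b where
  convertPartial :: a -> Either String b

-- Encoding a parameter for the DB engine: an Integer always has a textual form.
instance ConvertTotal Integer String where
  convertTotal = show

-- Decoding a result column: the engine's string might not be a number.
instance ConvertPartial String Integer where
  convertPartial s =
    maybe (Left ("cannot read an Integer from " ++ show s)) Right (readMaybe s)

main :: IO ()
main = do
  putStrLn (convertTotal (42 :: Integer))
  print (convertPartial "42" :: Either String Integer)
  print (convertPartial "4x2" :: Either String Integer)
```

The String-to-Integer direction simply has no instance in the total class, so the type checker rejects any attempt to treat it as infallible.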
As a related issue, there are a large number of data constructors in HDBC for SqlValue. I would not argue with the presence of any of them: for your purposes, every one of them is necessary. But for someone writing a cross-backend package with a more limited set of datatypes, it becomes a problem. I know I can use convertible for this, but see my previous paragraph ;).
How about using an import...hiding statement? Perhaps even your own module that only re-exports the constructors you like?
In Persistent, we already have a very good idea of what the datatype will be (integral, date/time, etc.), and therefore dealing with the raw bytestring would probably be preferable for us. (This is a bit of a simplification, but close enough to reality.) I'm not actually requesting that you change HDBC to return ByteStrings; I'm simply stating that, while a high-level API using Haskell datatypes is correct for *most* uses, there are use cases where a low-level API makes more sense. And the many-constructor issue isn't a matter of convenience or of being overwhelmed by constructors. It's a matter of correctness: if I want to map a SqlValue onto a UTCTime value, I need to check a number of different constructors, as opposed to reading the value directly out of a ByteString. In Persistent, I always know which type of DATE or DATETIME was used when creating tables, so I know which version to anticipate. In theory, this argument applies to HDBC's constructors as well; however, I believe I was previously bitten by a bug involving time zones.
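To illustrate the many-constructor problem, here is a sketch using a hypothetical Cell type standing in for a value type with several time-related constructors (it is not HDBC's SqlValue, and its constructors are assumptions). The generic route must handle every constructor, including one that carries no time zone and therefore forces a guess; the schema-aware route parses raw text in a format assumed to be known from the table definition.

```haskell
import Data.Time

-- Hypothetical stand-in for a many-constructor value type (not HDBC's SqlValue).
data Cell
  = CellUTCTime UTCTime
  | CellLocalTime LocalTime
  | CellString String
  | CellNull
  deriving (Show)

-- Generic route: every constructor that might hold a timestamp must be handled.
cellToUTCTime :: Cell -> Either String UTCTime
cellToUTCTime (CellUTCTime t) = Right t
-- No zone information here, so one must be *assumed* (UTC in this sketch);
-- a wrong assumption silently shifts the value.
cellToUTCTime (CellLocalTime t) = Right (localTimeToUTC utc t)
cellToUTCTime (CellString s) = parseKnownFormat s
cellToUTCTime CellNull = Left "NULL where a timestamp was expected"

-- Schema-aware route: the column is assumed to store "YYYY-MM-DD HH:MM:SS"
-- in UTC, so the raw text is parsed once, directly.
parseKnownFormat :: String -> Either String UTCTime
parseKnownFormat s =
  maybe (Left ("unexpected timestamp format: " ++ show s)) Right
        (parseTimeM False defaultTimeLocale "%Y-%m-%d %H:%M:%S" s)

main :: IO ()
main = do
  print (cellToUTCTime (CellString "2011-01-17 23:38:00"))
  print (cellToUTCTime (CellLocalTime (LocalTime (fromGregorian 2011 1 17) midnight)))
  print (cellToUTCTime CellNull)
```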
I also don't like using the lazy result functions. I'm sure for many people, they are precisely what is needed. However, in my applications, I try to avoid them whenever possible. I've had bugs crop up because I accidentally used the lazy instead of the strict version of a function. I would prefer using an interface that uses enumerators[2].
It would be pretty simple to add an option to the API to force the use of the strict versions of functions in all cases (or perhaps to raise an exception if a lazy version is called). Would that address the concern? Or perhaps splitting them into separate modules?
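A sketch of such a per-connection option, with hypothetical names throughout (LazyPolicy, fetchAllRowsWith and the helpers are not part of HDBC). A row source is modelled as an IO (Maybe row) action; under ForceStrict a lazy request is silently served strictly, and under ForbidLazy it raises an IOError instead.

```haskell
import Control.Exception (IOException, try)
import Data.IORef (atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical per-connection setting (not an existing HDBC option).
data LazyPolicy = AllowLazy | ForceStrict | ForbidLazy
  deriving (Eq, Show)

data FetchMode = Lazy | Strict
  deriving (Eq, Show)

-- Decide how a "lazy" fetch request is actually served under a policy.
resolveMode :: LazyPolicy -> Either String FetchMode
resolveMode AllowLazy = Right Lazy
resolveMode ForceStrict = Right Strict
resolveMode ForbidLazy = Left "lazy fetch requested, but this connection forbids lazy IO"

-- Read every row before returning.
fetchAllStrict :: IO (Maybe r) -> IO [r]
fetchAllStrict next = go []
  where
    go acc = next >>= maybe (return (reverse acc)) (\r -> go (r : acc))

-- Read each row only when that part of the list is demanded (lazy IO).
fetchAllLazy :: IO (Maybe r) -> IO [r]
fetchAllLazy next = unsafeInterleaveIO $ do
  m <- next
  case m of
    Nothing -> return []
    Just r -> (r :) <$> fetchAllLazy next

-- The lazy entry point consults the policy before doing any lazy IO.
fetchAllRowsWith :: LazyPolicy -> IO (Maybe r) -> IO [r]
fetchAllRowsWith policy next =
  case resolveMode policy of
    Left msg -> ioError (userError msg)
    Right Lazy -> fetchAllLazy next
    Right Strict -> fetchAllStrict next

main :: IO ()
main = do
  src <- newIORef [["1", "alice"], ["2", "bob"]]
  let next = atomicModifyIORef' src pop
  rows <- fetchAllRowsWith ForceStrict next
  print rows
  refused <- try (fetchAllRowsWith ForbidLazy next) :: IO (Either IOException [[String]])
  putStrLn (either (const "lazy fetch refused") (const "lazy fetch allowed") refused)
  where
    pop [] = ([], Nothing)
    pop (x : xs) = (xs, Just x)
```

Keeping the decision in the pure resolveMode function makes the policy testable without any database.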
Again, I *personally* think that would be better, but I'm sure many other HDBC users would consider this a change for the worse: there are a lot of Haskellers who have no problem with lazy IO, and I don't want to adversely affect their programming on a whim.
I took a quick look at the enumerators library, but it doesn't seem to have the necessary support for handling data that comes from arbitrary C API function calls rather than handles or sockets.
It does support this; for prior art, see yaml[1] or yajl-enumerator[2]. I'd be happy to help you with this, as having some enumerator experience is a big help here.
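A hand-rolled, iteratee-style sketch of consuming data that comes from an arbitrary function call rather than a handle. The Iter type and runFrom driver are hypothetical and much simpler than the enumerator package's own types; the row source is any IO (Maybe row) action, such as a wrapper around a C API's fetch-next-row call. The consumer, not the driver, decides when to stop, so no lazy IO is involved.

```haskell
import Data.IORef (atomicModifyIORef', modifyIORef', newIORef, readIORef)

-- A consumer either has its result or wants the next element
-- (Nothing signals end of input).
data Iter a b = Done b | More (Maybe a -> Iter a b)

-- Drive a consumer from any "fetch next row" action; rows are pulled one at a
-- time, only while the consumer asks. (Assumes consumers finish after EOF.)
runFrom :: Monad m => m (Maybe a) -> Iter a b -> m b
runFrom _ (Done b) = return b
runFrom next (More k) = do
  mx <- next
  case mx of
    Nothing -> return (finish (k Nothing))
    Just x -> runFrom next (k (Just x))
  where
    finish (Done b) = b
    finish (More k') = finish (k' Nothing)

-- Pure driver over a list, convenient for testing consumers.
runList :: [a] -> Iter a b -> b
runList _ (Done b) = b
runList [] (More k) = runList [] (k Nothing)
runList (x : xs) (More k) = runList xs (k (Just x))

-- Strict left fold over all rows.
foldIter :: (b -> a -> b) -> b -> Iter a b
foldIter f = go
  where
    go acc = acc `seq` More (maybe (Done acc) (go . f acc))

-- Take at most n rows, then stop without pulling any more.
takeIter :: Int -> Iter a [a]
takeIter n0 = go n0 []
  where
    go n acc
      | n <= 0 = Done (reverse acc)
      | otherwise = More (maybe (Done (reverse acc)) (\x -> go (n - 1) (x : acc)))

main :: IO ()
main = do
  rowsLeft <- newIORef [1 .. 10 :: Int]
  pulls <- newIORef (0 :: Int)
  let next = do
        modifyIORef' pulls (+ 1)
        atomicModifyIORef' rowsLeft pop
  firstTwo <- runFrom next (takeIter 2)
  pulled <- readIORef pulls
  print (firstTwo, pulled) -- prints ([1,2],2)
  where
    pop [] = ([], Nothing)
    pop (x : xs) = (xs, Just x)
```

Only two of the ten simulated rows are ever fetched, because takeIter stops asking after its second element.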
In none of these cases do I actually think HDBC should change. I think it is a great library with a well-thought-out API. All I'm saying is that I doubt there will ever be a single high-level API that suits everyone's needs, and I see a huge amount of value in splitting out the low-level code into a separate package. That way, *everyone* can share that code, *everyone* can find the bugs in it, and *everyone* can benefit from improvements.
Splitting out the backend code is quite reasonable, and actually that was one of the goals with the HDBC v2 API. I would have no objection if people take, say, HDBC-postgresql and add a bunch of non-HDBC stuff to it, or even break off the C bindings to a separate package and then make HDBC-postgresql an interface atop that.
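One way such a split could look, sketched with hypothetical names: a low-level record of operations that passes raw column bytes (NULL as Nothing) and a separate typed layer built on top of it. The mock backend exists only to make the sketch runnable; a real binding package would fill RawBackend from its C calls.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef (atomicModifyIORef', newIORef)

-- Hypothetical low-level interface a backend binding package might expose:
-- SQL text and raw column bytes in and out, NULL as Nothing, no type mapping.
data RawBackend = RawBackend
  { rawExecute :: String -> [Maybe B.ByteString] -> IO ()
  , rawFetchRow :: IO (Maybe [Maybe B.ByteString])
  }

-- Higher-level layer on top: the caller picks the Haskell type per column.
class FromColumn a where
  fromColumn :: Maybe B.ByteString -> Either String a

instance FromColumn Integer where
  fromColumn Nothing = Left "unexpected NULL"
  fromColumn (Just bs) = case B.readInteger bs of
    Just (n, rest) | B.null rest -> Right n
    _ -> Left ("not an integer: " ++ B.unpack bs)

instance FromColumn B.ByteString where
  fromColumn = maybe (Left "unexpected NULL") Right

-- NULL-able columns.
instance FromColumn a => FromColumn (Maybe a) where
  fromColumn Nothing = Right Nothing
  fromColumn v = Just <$> fromColumn v

-- Fetch one two-column row and decode it at the requested types.
fetchPair :: (FromColumn a, FromColumn b) => RawBackend -> IO (Maybe (Either String (a, b)))
fetchPair be = fmap decode <$> rawFetchRow be
  where
    decode [x, y] = (,) <$> fromColumn x <*> fromColumn y
    decode cols = Left ("expected 2 columns, got " ++ show (length cols))

-- In-memory backend used only to make the sketch runnable.
mockBackend :: [[Maybe B.ByteString]] -> IO RawBackend
mockBackend rows = do
  ref <- newIORef rows
  return RawBackend
    { rawExecute = \_ _ -> return ()
    , rawFetchRow = atomicModifyIORef' ref pop
    }
  where
    pop [] = ([], Nothing)
    pop (r : rs) = (rs, Just r)

main :: IO ()
main = do
  be <- mockBackend
    [ [Just (B.pack "1"), Just (B.pack "alice")]
    , [Just (B.pack "2"), Nothing]
    ]
  r1 <- fetchPair be :: IO (Maybe (Either String (Integer, B.ByteString)))
  print r1
  r2 <- fetchPair be :: IO (Maybe (Either String (Integer, Maybe B.ByteString)))
  print r2
```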
I hope that we can, however, agree upon one low-level database API. The Java, Python, and Perl communities, at least, have. Failing to do so produces unnecessary incompatibility.
I would also hope that this database API would be good enough that there is rarely call to bypass it and use a database backend directly.
I agree 100%. The question is whether we can all agree on what counts as low-level. I think the difference between Haskell and other languages here is that, while everyone else is basically satisfied with a cursor approach, in Haskell we have at least two other options: enumerators and lazy IO. I would be in favor of trying to design something very low-level (read: lower-level than HDBC) that can represent any database backend. On the other hand, I don't think such a system would let your code switch from SQLite to PostgreSQL, since the two represent data so differently. In other words, I would like a SQLite package that presents a SQLite-specific API, and likewise for PostgreSQL, but with both designed from the beginning to be very similar, to make the job of implementing higher-level APIs easier.

Michael

[1] http://hackage.haskell.org/package/yaml
[2] http://hackage.haskell.org/package/yajl-enumerator