Ok, I stubbed out an approach to implementing an async-based binding to libpq, sans implementation of course:

https://github.com/lpsmith/postgresql-libpq/commit/96c82197b558019de44cc4b5ef728b95de99f0ab

What do you think?
 
The libpq source may be helpful.  Take a look at PQexecFinish in fe-exec.c, which reveals some interesting considerations:

 * When PQexec is given multiple SQL commands, it discards all but the last PGresult, but it combines error messages from these results.

 * When PQexec encounters a result status of PGRES_COPY_*, it stops returning results, to allow the application to perform the transfer.

One thing to bear in mind is that threadWait* can be interrupted on Unix (that's one motivation for using the async API in the first place).  You'll have to figure out what 'exec' should do when interrupted while waiting for a result.  Perhaps it should use PQcancel to cancel the request, then continue waiting for results.  If interrupted yet again, a subsequent 'exec' will need to call PQgetResult to discard results from the interrupted query before calling PQsendQuery again.
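For concreteness, here is a rough, untested sketch of one variant of that policy (cancel, then drain eagerly rather than at the next 'exec') against the Database.PostgreSQL.LibPQ API; the overall structure and helper names are invented for illustration:

    import Control.Concurrent (threadWaitRead)
    import Control.Exception (mask, onException)
    import Control.Monad (void)
    import qualified Data.ByteString as B
    import qualified Database.PostgreSQL.LibPQ as PQ

    -- Sketch: an 'exec' that reacts to an async exception by cancelling
    -- the in-flight query and draining its results before re-throwing.
    execCancelable :: PQ.Connection -> B.ByteString -> IO [PQ.Result]
    execCancelable conn q = mask $ \restore -> do
        True <- PQ.sendQuery conn q
        restore (collect []) `onException` cancelAndDrain
      where
        collect acc = do
            mres <- waitResult
            case mres of
                Nothing  -> return (reverse acc)
                Just res -> collect (res : acc)

        -- Block (interruptibly) until the next PGresult is available.
        waitResult = do
            busy <- PQ.isBusy conn
            if busy
                then do
                    Just fd <- PQ.socket conn
                    threadWaitRead fd
                    _ <- PQ.consumeInput conn
                    waitResult
                else PQ.getResult conn

        -- On interruption: request cancellation, then discard whatever
        -- results the interrupted query still produces.  (If this drain
        -- is itself interrupted, the next 'exec' would have to finish it.)
        cancelAndDrain = do
            mc <- PQ.getCancel conn
            maybe (return ()) (void . PQ.cancel) mc
            let drain = waitResult >>= maybe (return ()) (const drain)
            drain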

Yeah, I did plan to read the libpq source. In any case, thanks for the comments; they're helpful.

Async exceptions do present a thorny semantic problem, but at least my use cases would basically assume the connection is dead. I mean, didn't we have this conversation with respect to (timeout (readMVar foo)) and (timeout (readChan foo))? The conclusion was that you should then treat the resource as dead, because there is some small window in which you can be interrupted after you've taken the resource but before you've actually returned it.
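(To spell out the race I mean, here is an untested sketch using takeMVar directly; the names are made up for illustration:)

    import Control.Concurrent.MVar (MVar, takeMVar, putMVar)
    import System.Timeout (timeout)

    -- If the timeout's async exception arrives after takeMVar has
    -- succeeded but before 'timeout' returns, the value is thrown away
    -- and the resource is simply gone: every later taker blocks forever.
    withTimeout :: MVar r -> (r -> IO a) -> IO (Maybe a)
    withTimeout var use = do
        mr <- timeout 1000000 (takeMVar var)   -- one second
        case mr of
            Nothing -> return Nothing          -- timed out... or lost r?
            Just r  -> do
                a <- use r
                putMVar var r                  -- hand the resource back
                return (Just a)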

Calling PQcancel and waiting it out also seems problematic from the point of view of my use cases, namely detecting a flaky network and detecting, in a timely fashion, that the backend was restarted. (Although, come to think of it, the latter case should be quickly detectable without timeouts, so perhaps I should dig deeper into why my clients take so long to reconnect to a restarted server.)

However, perhaps there are other good justifications for attempting more complex behavior along the lines you suggest.

Best,
Leon


On Fri, Jun 28, 2013 at 5:12 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
On Thu, Jun 27, 2013 at 11:40 PM, Leon Smith <leon.p.smith@gmail.com> wrote:
Yeah, eventually adding some level of understanding of the formats would be nice, but honestly my immediate need is a higher-performance (and somewhat specialized) replication solution; I only need a tiny bit of understanding of the data I'm shipping from server to server. At the same time, a complete solution would need to deal with all three formats and their numerous flavors, not just CSV. I don't doubt that the current FromField/ToField interface is poorly suited to dealing with copy data, though. =)

In response to a proposal to add COPY escaping functionality to libpq, Tom Lane pointed out that "one of the key requirements for anything dealing with COPY is that it be fast" [1].  postgresql-copy-escape as-is provides a faster way to insert data than multi-row INSERT.  We don't need to support all of the formats for this.  I didn't even implement COPY TO support since PostgreSQL returns result sets pretty efficiently through the normal interface.
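For instance, the shape of a COPY-based insert is roughly the following (an untested sketch; it assumes putCopyData/putCopyEnd bindings over PQputCopyData/PQputCopyEnd, and a made-up table):

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.ByteString as B
    import qualified Database.PostgreSQL.LibPQ as PQ

    -- Each row is a pre-escaped, tab-separated, newline-terminated
    -- line in the text COPY format, e.g. "1\tfoo\n".
    bulkInsert :: PQ.Connection -> [B.ByteString] -> IO ()
    bulkInsert conn rows = do
        _ <- PQ.exec conn "COPY tbl (a, b) FROM STDIN"  -- enter COPY IN state
        mapM_ (PQ.putCopyData conn) rows
        _ <- PQ.putCopyEnd conn Nothing                 -- Nothing = no error
        _ <- PQ.getResult conn                          -- final command status
        return ()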

Your case is even simpler: send COPY data from one server to another.  This shouldn't require packing/unpacking fields.
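Roughly like this, say (again untested, with hypothetical getCopyData/putCopyData binding names and result constructors, and a made-up table):

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Database.PostgreSQL.LibPQ as PQ

    -- Stream COPY rows from one server to another without looking
    -- inside them.
    relay :: PQ.Connection -> PQ.Connection -> IO ()
    relay src dst = do
        _ <- PQ.exec src "COPY mytable TO STDOUT"
        _ <- PQ.exec dst "COPY mytable FROM STDIN"
        let loop = do
                r <- PQ.getCopyData src False     -- False = blocking read
                case r of
                    PQ.CopyOutRow row -> do
                        _ <- PQ.putCopyData dst row
                        loop
                    PQ.CopyOutDone    -> return ()
                    _                 -> fail "unexpected COPY result"
        loop
        _ <- PQ.putCopyEnd dst Nothing
        _ <- PQ.getResult dst                     -- final status of the COPY
        return ()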


As for threadWaitWrite, I understand that, but the code is actually dead at the moment, because postgresql-simple doesn't use libpq connections in asynchronous mode; it's just there for when postgresql-simple starts using async libpq calls. That is something I would like to do, but it would also be a disaster on Windows right now. I'm thinking the most sensible approach would be to reimplement the blocking libpq calls in Haskell on top of the nonblocking libpq calls, and then only use this module on Unix systems. Hopefully that would cut down on the CPP hackery.
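Concretely, I'm imagining something like this untested sketch, built from the nonblocking calls the binding already exposes:

    import Control.Concurrent (threadWaitRead)
    import qualified Data.ByteString as B
    import qualified Database.PostgreSQL.LibPQ as PQ

    -- A blocking exec built from the async primitives, so the RTS can
    -- schedule other Haskell threads while we wait (Unix only).
    execBlocking :: PQ.Connection -> B.ByteString -> IO [PQ.Result]
    execBlocking conn q = do
        True <- PQ.sendQuery conn q
        let loop acc = do
                busy <- PQ.isBusy conn
                if busy
                    then do
                        Just fd <- PQ.socket conn
                        threadWaitRead fd   -- yields instead of blocking in C
                        _ <- PQ.consumeInput conn
                        loop acc
                    else do
                        mres <- PQ.getResult conn
                        case mres of
                            Nothing  -> return (reverse acc)
                            Just res -> loop (res : acc)
        loop []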
