
Takusen permits on-demand processing on three different levels. It is specifically designed for database processing in bounded memory, with predictable resource utilization and no resource leaks.

But first, about getContents. It was suggested a while ago that getContents should be renamed to unsafeGetContents. I strongly support that suggestion. I believe getContents should be used sparingly (I personally have never used it); it cannot give precise resource guarantees and is the wrong model for database interfaces. I will not dwell on the fact that getContents permits I/O to occur while evaluating pure code -- which is just wrong. There is a practical consequence of this supposedly theoretical impurity: error handling. As the manual states, ``A semi-closed handle becomes closed: ... if an I/O error occurs when reading an item from the handle; or once the entire contents of the handle has been read.'' That is, it is not possible to tell whether all the data from the channel have been read or an I/O error has interfered, nor is it possible to find out any details about that I/O error. That alone disqualifies getContents from any serious use.

Even more egregious is the resource handling, that business with semi-closed handles, which is a resource leak. Interfacing with a database requires managing lots of resources: database connection, prepared statement handle, statement handle, result set, database cursor, transaction, input buffers. Takusen was specifically designed to be able to tell exactly when a resource is no longer needed and can be _safely_ disposed of. That guarantee is not available with getContents: the resources associated with the handle are disposed of only when the consumer of getContents is finished with it. Since the consumer may be pure code, it is impossible to tell when the evaluation finishes; it may happen in a totally different part of the code. To get more predictability, we have to add seq and deepSeq -- thus defeating the laziness we supposedly gained with getContents, and hoping that two wrongs somehow make a right.

Regarding Takusen: it is designed for incremental processing of database data, on three levels:

 -- Unless the programmer has said that the query will yield a small amount of data, we do not ask the database for the whole result set at once. We ask it to deliver data in increments of 10 or 100 rows (the programmer may tune the amount). The retrieved chunk is placed into pre-allocated buffers.

 -- The retrieved chunk is given to an iteratee one row at a time. The iteratee may at any point declare that it has had enough; processing immediately stops, no further chunks are retrieved, and all resources of the query are disposed of. (The sketch after this list illustrates the pattern.)

 -- Alternatively, Takusen offers a cursor-based interface, with getNext and getCurrent methods. The rows are retrieved on demand, in chunks. The interface is designed to restrict operations on a cursor to a region of code. Once the region is exited (normally or by exception), all associated resources are disposed of, because they are statically guaranteed to be unavailable outside the region.

Because the moments of resource allocation and deallocation are so precisely known, Takusen can take care of all of them. The programmer never has to worry about resource leaks, deallocations, etc.
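To make the iteratee level concrete, here is a minimal, self-contained sketch of the pattern: a left fold whose step function can demand early termination. The names (enumRows, IterAct, takeFive) are hypothetical, invented for this illustration; Takusen's actual interface differs in detail, and its enumerator feeds rows from the pre-allocated database buffers rather than from a list.

    -- The iteratee returns Left to stop early, Right to continue.
    type IterAct m seed = seed -> m (Either seed seed)

    -- A toy enumerator: it feeds rows to the iteratee one at a time and
    -- stops as soon as the iteratee answers Left.  In Takusen the rows
    -- come from the database in chunks, and stopping is also the point
    -- at which the statement handle, buffers, etc. are released.
    enumRows :: Monad m => [row] -> (row -> IterAct m seed) -> seed -> m seed
    enumRows []     _    seed = return seed
    enumRows (r:rs) iter seed = do
      step <- iter r seed
      case step of
        Left  done  -> return done          -- iteratee has had enough
        Right seed' -> enumRows rs iter seed'

    -- Example iteratee: accumulate rows until we have five of them.
    takeFive :: Monad m => Int -> [Int] -> m (Either [Int] [Int])
    takeFive row acc
      | length acc' >= 5 = return (Left acc')   -- stop; fetch no more rows
      | otherwise        = return (Right acc')
      where acc' = acc ++ [row]

    main :: IO ()
    main = enumRows [1..1000000] takeFive [] >>= print
    -- prints [1,2,3,4,5]; the remaining rows are never even produced

The essential point is that the producer, not the consumer, holds the resources: when the iteratee answers Left, the enumerator regains control immediately and can dispose of everything before returning the final seed.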
A bit of experience: I have implemented a Web application server in Haskell, using Takusen as a back end. The server runs as a FastCGI dynamic server, retrieving a chunk of rows from the database, formatting the rows (e.g., in XML), sending them up the FastCGI interface and ultimately to the client, and coming back for the next chunk. The advantages of this stream-wise processing are low latency, low memory consumption, and the client's consumption rate limiting the database retrieval rate. Typical requests routinely ask for thousands of database rows; the server runs continuously, serving hundreds of requests in constant memory. The executable is 2.6 MB in size (GHC 6.4.2); the running process has a VmSize of 6608 kB, including a VmRSS of 3596 kB and a VmData of 1412 kB. The code contains not a single unsafePerformIO and, aside from some S-expression parsing code I inherited, not a single strictness annotation. The line count (including comments) is 7500 lines in 30 files.
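For illustration, a toy version of that formatting loop, again with hypothetical names and with stdout standing in for the FastCGI output stream: each row is written out as soon as it is formatted, so nothing accumulates no matter how large the result set.

    import System.IO (hFlush, stdout)

    -- Emit one XML-formatted row at a time; no row is retained after
    -- emission, so memory stays flat however many rows the query yields.
    streamRows :: [(Int, String)] -> IO ()
    streamRows = mapM_ emit
      where
        emit (key, val) = do
          putStrLn ("<row key=\"" ++ show key ++ "\">" ++ val ++ "</row>")
          hFlush stdout   -- in the real server: flush to the FastCGI stream

    main :: IO ()
    main = streamRows [(1, "alpha"), (2, "beta")]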