
Disclaimer: I don't know Takusen very well, and I'm talking about the Sqlite backend. I noticed that in Takusen there are just two instances to implement to make any Haskell type db-serializable: DBBind / SqliteBind for serialization and DBType for deserialization. I wanted to implement blob serialization for PackedStrings, but I noticed that neither the DBBind nor the DBType class is public. Defining custom serialization (not just Show / Read string serialization) is very useful in application code, not just in Takusen backend code. Is there any way to do it without modifying the Takusen Sqlite backend?

Salvatore Insalaco

I noticed that in Takusen there are just two instances to implement to make any Haskell type db-serializable: DBBind / SqliteBind for serialization and DBType for deserialization.
I wanted to implement blob serialization for PackedStrings, but I noticed that neither the DBBind nor the DBType class is public. Defining custom serialization (not just Show / Read string serialization) is very useful in application code, not just in Takusen backend code.
Is there any way to do it without modifying the Takusen Sqlite backend?
Salvatore Insalaco
Sorry, not at present. It shouldn't be hard to modify the code, though, and PackedString support is something we should really be thinking about anyway (as well as Blobs, which I think should have a different type).

Although DBType is a class in InternalEnumerator (so you'd think it might be easy to extend with new types), in practice we do not expose the internal Query and ColumnBuffer objects that you also need to create new instances of the class, so you're stuck with what we provide. This is intentional; you usually need access to the low-level buffers and other resources to write an instance of DBType, and we don't want to expose these details to users. By hiding this stuff, we can ensure that resources are properly cleaned up when queries and commands complete.

So, please take this as an invitation to modify the Sqlite implementation to handle PackedStrings (and Blobs, if you want to).

BTW, do you really need to marshall PackedStrings to blobs? The Sqlite library uses CStrings, and I assume that CString to PackedString marshaling is fairly efficient, so that would seem to be a better choice. (I have no experience of PackedStrings, so there might be good reasons to prefer blobs, and I'd like to know what they are.)

Alistair
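[Editor's note: to illustrate the split being discussed, here is a minimal, hypothetical sketch of a serialization/deserialization class pair of this shape. The class and type names (SqliteBind, DBType, Query, ColumnBuffer) are taken from the thread, but the method signatures are simplified illustrations and are NOT Takusen's actual API.]

{-# LANGUAGE EmptyDataDecls #-}
-- Hypothetical, simplified sketch; not Takusen's real class signatures.
-- Query and ColumnBuffer stand in for the hidden internal types.
data Query
data ColumnBuffer

-- Serialization: bind a Haskell value as parameter n of a prepared query.
class SqliteBind a where
  bindP :: Query -> Int -> a -> IO ()

-- Deserialization: fetch column n of the current row from a column buffer.
class DBType a where
  fetchCol :: Query -> ColumnBuffer -> Int -> IO a

The point above is that Query and ColumnBuffer (whatever their real shape) are not exported, so user code cannot write new instances even though the classes themselves exist.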

2007/7/27, Bayley, Alistair
BTW, do you really need to marshall PackedStrings to blobs? The Sqlite library uses CStrings, and I assume that CString to PackedString marshaling is fairly efficient, so that would seem to be a better choice. (I have no experience of PackedStrings, so there might be good reasons to prefer blobs, and I'd like to know what they are.)
The main reason is that I'm working on a Sqlite back-end for Darcs that will be used to store file contents. I chose Takusen as the back-end library mainly because it already has the Sqlite blob functions mapped.

In Darcs, PackedStrings are used to store file contents in memory. I don't think it would be very efficient to store files of megabytes in a text column (there could be encoding problems too). Also, Sqlite 3.4 introduced new functions for incremental reading / writing of blobs. I could use them in the future.

Thanks a lot for helping! I'll surely send you the patches, even though the PackedString support will be a little Darcs-specific (I don't think that requiring it to compile Takusen is a good idea).

Salvatore Insalaco

From: Salvatore Insalaco [mailto:kirby81@gmail.com]
The main reason is that I'm working on a Sqlite back-end for Darcs that will be used to store file contents. I chose Takusen as the back-end library mainly because it already has the Sqlite blob functions mapped.
Umm, yes... I wrote the blob FFI imports for Sqlite, as I planned to prototype Takusen's blob support in Sqlite first, but never got any further with it. It is completely untested.
In Darcs PackedStrings are used to store file contents in memory. I don't think that it would be very efficient to store files of megabytes in a text column (there could be encoding problems too).
Also, in Sqlite 3.4, they introduced new functions for incremental reading / writing of Blobs. I could use them in the future.
Seems reasonable. I recall Oleg saying something privately a while ago about an API for large objects. He may have some ideas for this.

Alistair

2007/7/27, Bayley, Alistair
Also, in Sqlite 3.4, they introduced new functions for incremental reading / writing of Blobs. I could use them in the future.
Seems reasonable. I recall Oleg saying something privately a while ago about an API for large objects. He may have some ideas for this.
A good idea could be to do the marshalling / unmarshalling of blobs as CStringLen. It can then be used to construct PackedStrings and ByteStrings in O(1), or to do some custom computation with it. A CStringLen, even though it contains a Ptr CChar, can easily be converted to any pointer type.

Another idea could be to use a custom bind function for blobs, but the first solution is surely easier.

Salvatore
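[Editor's note: as a concrete illustration of the O(1) construction mentioned above, here is a small sketch (not Takusen code) using functions from the bytestring package. Whether the zero-copy variant is safe depends on who owns the underlying buffer and for how long; the function names below are the package's real ones, the wrappers are made up.]

import Foreign.C.String (CStringLen)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU

-- Zero-copy: wraps the existing buffer; the resulting ByteString is only
-- valid for as long as the underlying blob buffer stays alive.
blobToByteStringZeroCopy :: CStringLen -> IO B.ByteString
blobToByteStringZeroCopy = BU.unsafePackCStringLen

-- Copying version: safe to keep after the buffer is freed, at the cost
-- of one memcpy of the blob contents.
blobToByteStringCopy :: CStringLen -> IO B.ByteString
blobToByteStringCopy = B.packCStringLen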

From: Salvatore Insalaco [mailto:kirby81@gmail.com]
2007/7/27, Bayley, Alistair
Also, in Sqlite 3.4, they introduced new functions for incremental reading / writing of Blobs. I could use them in the future.
Seems reasonable. I recall Oleg saying something privately a while ago about an API for large objects. He may have some ideas for this.
A good idea could be to do the marshalling / unmarshalling of blobs as CStringLen. It can then be used to construct PackedStrings and ByteStrings in O(1), or to do some custom computation with it. A CStringLen, even though it contains a Ptr CChar, can easily be converted to any pointer type.
Another idea could be to use a custom bind function for blobs, but the first solution is surely easier.
It was my intention to do it the other way around: marshall the blob to a Ptr (), which you can then cast to a Ptr CChar. Obviously you'd need to retain the size information, so a blob basically becomes a (Ptr (), Int) pair, just like a CStringLen... At least this way you've got a type which says explicitly "this thing is a blob", and then if you know better, i.e. it's really a CString, you can cast it.

Actually, it wouldn't cost much to have both marshalling functions in the low-level API, even if they both call the same Sqlite blob retrieval function.

Alistair
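[Editor's note: to make this proposal concrete, here is a minimal sketch assuming a hypothetical Blob type of exactly the (Ptr (), Int) shape described above. The Blob type and the helper name are made up for illustration; they are not Takusen's API.]

import Foreign.Ptr (Ptr, castPtr)
import Foreign.C.String (CStringLen)

-- Hypothetical representation of the design discussed above:
-- an untyped pointer plus its length, wrapped so the type says "blob".
newtype Blob = Blob (Ptr (), Int)

-- If you know the blob really holds character data, casting the pointer
-- gives you a CStringLen without copying anything.
blobToCStringLen :: Blob -> CStringLen
blobToCStringLen (Blob (p, len)) = (castPtr p, len)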

2007/7/27, Bayley, Alistair
It was my intention to do it the other way around: marshall blob to Ptr (), and then you can cast this to a Ptr CChar. Obviously you'd need to retain the size information, so a blob basically becomes a (Ptr (), Int) pair, just like a CStringLen...
At least this way you've got a type which says explicitly "this thing is a blob", and then if you know better, i.e. it's really a CString, you can cast it.
You're right, I'll work in this direction. Eventually I will also add the CStringLen marshalling, as it is a pretty common case (BS, FPS, PS) and it could be handy.

Salvatore

I have been using Takusen with PostgreSQL to store and retrieve hundreds of multi-megabyte binary objects. A client may request literally hundreds of such objects in one request; the Haskell (FastCGI) application server will send these objects in one multi-part message. The handling of the entire request is done in *constant* and small memory, at low latency and at a rate limited only by the client's network connection. The server handles hundreds of such requests without allocating memory: the Haskell server uses only one 16KB buffer for all of its I/O. Incidentally, with the exception of occasional existentials and extended pattern guards, all the server code is in Haskell98.

I have been using the LO objects of PostgreSQL. That is not the only design choice: an alternative is to create a table where each row holds a chunk (e.g., 16K) of data as a byte array. The row will have two more columns: the blob id and the chunk ordinal counter. This design lets one incrementally write binary data (using the COPYIN technique) and read data (using COPYOUT or regular SQL queries). The COPYIN feature lets us write to the database in user-defined chunks. Alas, the converse, COPYOUT, can read only whole rows, which precludes storing all of the data in one row. If we segment the data into chunks spread across several rows, we regain incrementality. I have a hunch this method may be preferable, although I have not tried it. The drawback of LO objects is the need for frequent vacuuming, which may really take a while if many large objects are being created and deleted.

The blob interface is designed to permit incremental reading and writing of blobs. In fact, the server never stores the whole blob in memory. Enclosed are the implementations of the store_lo and consume_lo functions. They rely on the notions of `generalized' input and output ports and the generalized copier. I've been meaning to describe them properly but don't seem to get around to it. I could refer to the comments in the file http://okmij.org/ftp/Haskell/NewerCGI.hs
The frequently mentioned EMonadIO is a class of monads that permits both I/O operations and throwing _and_ catching of arbitrary errors. Most of the transformations of IO are in that class. EMonadIO lets me write gthrow, gcatch, ghandle, gbracket, etc. without even thinking about which monad I am currently in.

-- Read data from a LO (a kind of PostgreSQL blob). A blob is identified
-- by its Oid. We determine the size of the blob, create a generalized
-- input port for reading from the blob, and pass the size and the
-- port to the user function. The function will probably use BCopy
-- to copy data from the blob to somewhere else. The function should
-- not store the generalized input port anywhere as the port can't be
-- used after the function returns. We could have enforced that with a
-- marker and Typeable, as we do in Takusen.
-- This function must be invoked in a transaction (it would cause a
-- database error otherwise).
-- We don't bracket the call to the user function as any exceptions
-- are fatal anyway.
consume_lo :: EMonadIO m => Connection -> Oid -> (Int -> Input -> m a) -> m a
consume_lo (Connection db) oid f = do
    lofd <- liftIO $ check_pos "lo_open" $ flo_open db oid eINV_READ
    -- get the size of LO by seeking to the end, and coming back
    size <- liftIO $ check_pos "lo_lseek" $ flo_lseek db lofd 0 eSEEK_END
    liftIO $ check_pos "lo_lseek" $ flo_lseek db lofd 0 eSEEK_SET
    let inp = Input (\ptr len -> liftIO . liftM fromIntegral .
                      check_pos "lo_read" $ flo_read db lofd ptr (fromIntegral len))
    r <- f (fromIntegral size) inp
    liftIO $ check_pos "lo_close" $ flo_close db lofd
    return r
  `gcatch` \e -> print_exc e >> liftIO (closeDb db) >> gthrow e
  where
    check_pos str a = a >>= \r -> if r >= 0 then return r else throwPG r str

-- Write data to a LO (a kind of PostgreSQL blob). A blob is identified
-- by its Oid. We create a generalized output port for writing to the blob,
-- and pass it to the user function. The function will probably use BCopy
-- to copy data to the blob from somewhere else. The function should
-- not store the generalized output port anywhere as the port can't be
-- used after the function returns. We could have enforced that with a
-- marker and Typeable, as we do in Takusen.
-- This function must be invoked in a transaction (it would cause a
-- database error otherwise).
-- We don't bracket the call to the user function as any exceptions
-- are fatal anyway.
store_lo :: EMonadIO m => Connection -> Oid -> (Output -> m a) -> m a
store_lo (Connection db) oid f = do
    lofd <- liftIO $ check_pos "lo_open" $ flo_open db oid eINV_WRITE
    let outp = Output (\ptr len -> liftIO (check_size len =<<
                        check_pos "lo_write" (flo_write db lofd ptr (fromIntegral len))))
    r <- f outp
    liftIO $ check_pos "lo_close" $ flo_close db lofd
    return r
  `gcatch` \e -> print_exc e >> liftIO (closeDb db) >> gthrow e
  where
    check_pos str a = a >>= \r -> if r >= 0 then return r else throwPG r str
    check_size len lwritten =
        if len == fromIntegral lwritten
           then return ()
           else throwPG (-1) "lo_write wrote less than expected"

Salvatore Insalaco wrote:
I noticed that in Takusen there are just two instances to implement to make any Haskell type db-serializable: DBBind / SqliteBind for serialization and DBType for deserialization.
FWIW, I have two patches lying around (attached) that I wanted to send to the Takusen maintainers anyway. They (the patches) implement (only) instance DBType Data.ByteString for the Oracle and Sqlite backends. They are rudimentarily tested ("hey, seems to work!"); anyway, a review might be in order because I am not sure I understand the internals well enough -- for all I know I might have introduced space leaks or whatnot.

Cheers
Ben
participants (4)
- Bayley, Alistair
- Benjamin Franksen
- oleg@pobox.com
- Salvatore Insalaco