Database interface - would like advice on oracle library binding

(2nd attempt; mailman thinks I'm not a list member, but it still keeps
sending me mail.)

Still making slow progress on an Oracle database binding... now I'm trying
to fit the API I have into some sort of abstract interface (like the one(s)
discussed previously:
http://haskell.org/pipermail/haskell-cafe/2003-August/004957.html ).

1. Is the left-fold the best/only interface to expose? I think yes, but
   that doesn't allow fine-grained control over cursors i.e. being able to
   open many cursors at once and interleave fetches from them. Or does it?

2. I'm finding it hard to write a doQuery function that takes an extraction
   function that isn't a pig to write. Some advice would be useful here...
   (and a long-ish explanation follows):

The Oracle Call Interface (OCI) requires that I allocate buffers for the
result of a single-row fetch before the first row is fetched. So a query
involves:

 - prepare statement etc.
 - allocate buffers (one for each column - call the OCI C function
   "DefineByPos")
 - fetch row
 - extract/marshal data from the buffer into Haskell types (which are then
   processed by the fold function)
 - fetch, marshal (repeat until no more rows)
 - free buffers

i.e. the same buffers are re-used for each row.

The problem for me is how to specify the left-fold function in terms of the
low-level API. If I want to supply extraction functions (to extract Haskell
values from the result buffer), how do I manage the buffer allocation in
the doQuery function? The buffer allocate/free code also needs to know the
column positions and types, in the same manner as the extract functions.

I want to be able to write code like this:

    results <- doQuery dbconn sqltext [] $ \row results -> do
        name <- stringv row 1
        address <- stringv row 2
        return ((name,address):results)

... where the stringv function extracts/marshals a Haskell String from the
result buffer.

The intermediate approach I currently have means I have to pass an IO
action into the doQuery function that, when evaluated, allocates the
buffers and returns two more actions:

 - an action that extracts the row as a tuple
 - another action that frees the buffers

The doQuery function evaluates the initial action (to allocate the
buffers), uses the extract action to build the result (at present a list),
and when there are no more rows, uses the free action to free the buffers.

This approach is quite awkward (especially w.r.t. writing extract
functions), and it's hard for me to see how to build a better interface.
Hard, because of the memory management requirements.

--------------------

Here's a chunk of the code. A lot of it is OCI plumbing, but I hope you can
see how awkward it is to create an extract function (see ex3 at the
bottom).

Given pointers to the buffer, extract a string of variable length (you have
to terminate it yourself):
fetchStringVal :: OCIColInfo -> IO String
fetchStringVal (_, bufptr, nullindptr, retsizeptr) = do
  retsize <- liftM cShort2Int (peek retsizeptr)
  nullind <- liftM cShort2Int (peek nullindptr)  -- unused
  pokeByteOff (castPtr bufptr) retsize nullByte
  val <- peekCString (castPtr bufptr)
  return val
Free a single column's buffer.
freeColBuffer :: OCIColInfo -> IO ()
freeColBuffer (_, bufptr, nullindptr, retsizeptr) = do
  free bufptr
  free retsizeptr
  free nullindptr
Create a buffer for a string column, and return the extract and free IO actions.
getExtractFnString :: Int -> ErrorHandle -> StmtHandle -> IO (IO String, IO ())
getExtractFnString posn err stmt = do
  c <- defineCol err stmt posn 2000 oci_SQLT_CHR
  return (fetchStringVal c, freeColBuffer c)
doQuery uses the extractFns action to create the result buffers and to obtain the two actions (extract and free) that are passed to doQuery2.
doQuery2 :: ErrorHandle -> StmtHandle -> IO a -> IO () -> [a] -> IO [a]
doQuery2 err stmt extractData freeMem results = do
  rc <- fetch err stmt
  if rc == oci_NO_DATA
    then do
      freeMem
      return results
    else do
      v <- extractData
      doQuery2 err stmt extractData freeMem (v:results)
doQuery :: Session -> String -> (ErrorHandle -> StmtHandle -> IO (IO a, IO ())) -> IO [a]
doQuery (Sess env err con) qry extractFns = do
  stmt <- getStmt env err
  prepare err stmt qry
  execute err con stmt
  (extractData, freeMem) <- extractFns err stmt
  doQuery2 err stmt extractData freeMem []
The interface provided by doQuery means I have to write extract functions like this. Here's one for a select that returns three String columns. It's quite awkward...
ex3 :: ErrorHandle -> StmtHandle -> IO (IO (String, String, String), IO ())
ex3 err stmt = do
  (fetchcol1, freecol1) <- getExtractFnString 1 err stmt
  (fetchcol2, freecol2) <- getExtractFnString 2 err stmt
  (fetchcol3, freecol3) <- getExtractFnString 3 err stmt
  return
    ( do { s1 <- fetchcol1; s2 <- fetchcol2; s3 <- fetchcol3; return (s1, s2, s3) }
    , do { freecol1; freecol2; freecol3 }
    )
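For concreteness, here is a hypothetical use of the interface above; the SQL text and the name of the wrapper function are made up purely for illustration:

    -- ex3 is passed as the extractFns argument, so doQuery allocates the
    -- three string buffers, folds over the fetched rows, and frees the
    -- buffers once fetch reports oci_NO_DATA.
    selectCustomers :: Session -> IO [(String, String, String)]
    selectCustomers sess =
        doQuery sess "select name, addr, city from customers" ex3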

Bayley, Alistair writes:
 :
 | Still making slow progress on an Oracle database binding... now I'm
 | trying to fit the API I have into some sort of abstract interface
 | (like the one(s) discussed previously:
 | http://haskell.org/pipermail/haskell-cafe/2003-August/004957.html ).
 |
 | 1. Is the left-fold the best/only interface to expose? I think yes,
 | but that doesn't allow fine-grained control over cursors i.e. being
 | able to open many cursors at once and interleave fetches from
 | them. Or does it?

It looks like the interleaving would be limited to a nested loop
structure: a cursor could be processed in full during one extraction step
for another cursor. Application-side nested loop structures are often a
poor substitute for server-side joins.

 | 2. I'm finding it hard to write a doQuery function that takes an
 | extraction function that isn't a pig to write. Some advice would be
 | useful here... (and a long-ish explanation follows):
 :

Here's my attempt to summarise the piggishness you describe:

    The interface to Oracle requires that you initialise a cursor by
    allocating a suitably typed buffer for each column prior to fetching
    the first row, and finalise a cursor by freeing those buffers after
    fetching the last row. This means that we must iterate over the
    columns 3 times. We would prefer to express this iteration only once,
    and have the other 2 happen automatically within the library.

(As opposed to what ex3 does, which is to iterate for getExtractFnString,
iterate for fetchcolN, and iterate for freecolN.)

Here's one approach: find the OCI equivalent of JDBC's ResultSetMetaData,
and use it to drive the allocation and freeing of buffers.

Here's another:

 - Add a mode attribute to the abstract data type which encompasses
   ErrorHandle and StmtHandle. (I'll persist in referring to that ADT as
   Cursor.)

 - Expect extraction functions to be written along these lines:

       \cursor result -> do
           field1 <- getInt cursor
           field2 <- getString cursor
           field3 <- getString cursor
           return ((field1, field2, field3):result, True)

 - Make getInt (and friends) behave differently depending on the mode of
   the cursor they're passed: either allocate a buffer and return _|_,
   decode and return the current column of the current row, or free a
   buffer and return _|_.

 - doQuery could then apply the extraction function once in Allocate mode
   after opening the cursor, once per fetched row in Decode mode, and once
   in Free mode at the end.

There's nothing to stop an extraction function from varying the number of
get___ functions it applies, or trying to match their results when not in
Decode mode. These weaknesses could be mitigated by:

 - Pointing out that some database connection standards (JDBC, and for all
   I know also ODBC) don't guarantee that you can still get at a row's 1st
   column after you've looked at its 2nd column, i.e. there's a precedent
   for such fragility.

 - Complicating the extraction functions by giving them the type
   (Cursor -> b -> IO (IO (b, Bool))), expecting that all the get___
   functions are applied in the outer IO layer, and undertaking that the
   inner IO layer will only be used in Decode mode.

Regards,
Tom
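A rough sketch of the mode-based Cursor idea above, reusing OCIColInfo, defineCol, fetchStringVal and freeColBuffer from the earlier post; every other name here (Mode, Cursor, getString, nextPosn) is an assumption made for illustration, not part of any actual binding:

    import Data.IORef (IORef, readIORef, writeIORef, modifyIORef)

    -- The three modes in which doQuery would run the extraction function.
    data Mode = AllocateMode | DecodeMode | FreeMode

    -- A Cursor bundles the handles with the current mode and a little
    -- mutable state: the next column position and the buffers so far.
    data Cursor = Cursor
      { curMode :: Mode
      , curErr  :: ErrorHandle
      , curStmt :: StmtHandle
      , curPosn :: IORef Int          -- next column position (1-based)
      , curBufs :: IORef [OCIColInfo] -- column buffers, in position order
      }

    -- Bump and return the current column position. doQuery would reset
    -- this to 1 before each application of the extraction function.
    nextPosn :: Cursor -> IO Int
    nextPosn cur = do
      p <- readIORef (curPosn cur)
      writeIORef (curPosn cur) (p + 1)
      return p

    -- getString dispatches on the cursor's mode: allocate a buffer and
    -- return bottom, decode the current column of the current row, or
    -- free the buffer and return bottom.
    getString :: Cursor -> IO String
    getString cur = do
      p <- nextPosn cur
      case curMode cur of
        AllocateMode -> do
          c <- defineCol (curErr cur) (curStmt cur) p 2000 oci_SQLT_CHR
          modifyIORef (curBufs cur) (++ [c])
          return (error "getString: no value outside Decode mode")
        DecodeMode -> do
          bufs <- readIORef (curBufs cur)
          fetchStringVal (bufs !! (p - 1))
        FreeMode -> do
          bufs <- readIORef (curBufs cur)
          freeColBuffer (bufs !! (p - 1))
          return (error "getString: no value outside Decode mode")

doQuery would then build a Cursor in AllocateMode, apply the extraction function once, switch the mode to DecodeMode for the fetch loop, and finish with one pass in FreeMode, as described above; getInt and friends would follow the same pattern with their own buffer types.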

> 1. Is the left-fold the best/only interface to expose? I think yes, but
> that doesn't allow fine-grained control over cursors i.e. being able to
> open many cursors at once and interleave fetches from them. Or does it?
I'd like to remark first that many real databases let us avoid opening many
cursors at once. It seems to be more efficient to do as much as possible
within the engine -- using the full power of SQL or stored procedures --
and relay only the final results through the boundary. We can use
SELECT ... UNION, correlated subqueries, SELECT rowid, self (outer) joins,
etc.

There are cases where SQL is powerless -- a transitive closure comes to
mind. Stored procedures may sometimes help. A stored procedure may return a
'row' and thus can be used as a "generator" in Icon or Ruby's sense of the
word.

Still, opening several cursors may be unavoidable. The left-fold approach
may still help -- we _can_ mechanically invert a left-fold combinator (to
be precise, a half-baked version of it) into a lazy list. Please see the
separate message "how to invert the left fold".
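To make concrete why the lazy-list view answers the interleaving question: once each open cursor is exposed as a lazy list of rows, interleaving two result sets is ordinary list programming. The mergeBy function below is a hypothetical example of such a consumer, not something from the binding or from the separate message:

    -- Merge two row streams (e.g. two inverted cursors), assuming each is
    -- already sorted according to cmp. Laziness means rows are only
    -- demanded, and hence only fetched, as the merge proceeds.
    mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
    mergeBy _   xs     []     = xs
    mergeBy _   []     ys     = ys
    mergeBy cmp (x:xs) (y:ys)
      | cmp x y == GT = y : mergeBy cmp (x:xs) ys
      | otherwise     = x : mergeBy cmp xs (y:ys)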
> This approach is quite awkward (especially w.r.t. writing extract
> functions), and it's hard for me to see how to build a better interface.
> Hard, because of the memory management requirements.
I believe the extract functions can be constructed automatically -- similar
to the way QuickCheck constructs test cases. Instead of
    results <- doQuery dbconn sqltext [] $ \row results -> do
        name <- stringv row 1
        address <- stringv row 2
        return ((name,address):results)
we can do

    results <- doQuery dbconn sqltext [] process
      where
        process :: String -> String -> [(String,String)] -> [(String,String)]
        process name address results = (name,address):results

doQuery should be able to figure out that the function 'process' takes two
arguments of type String. Therefore, doQuery needs to extract two columns
from the current row, both of type String. Note that 'process' is a pure
function. There is no reason for it to be monadic (if it doesn't have to
be).

Because doQuery is a left-fold iterator, it can keep reusing buffers until
the iteration finishes. We can keep the pointers to column buffers in a
polytypic list, and we can keep the corresponding deallocation actions in a
list of IO (). When we prepare a statement, we create both lists. When we
finish doQuery, we scan the list of destructor actions and execute them to
free all the buffers.

It's hard for me to write the corresponding code because I don't have
Oracle handy (besides, I like Informix better). Is it possible to come up
with a "stub" that uses flat files? We are only interested in fetching
rows. It doesn't matter if these rows come from a file or from a database.
That would make prototyping the interface much easier.
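Along the lines of the flat-file stub suggested above, here is a minimal sketch for prototyping the left-fold loop without a database. The file format (one row per line, tab-separated columns) and all the names (doQueryStub, parseRow, collectPairs) are assumptions made for illustration, and it side-steps the type-class machinery that would let doQuery infer the column types from 'process':

    import System.IO (IOMode (ReadMode), hGetLine, hIsEOF, withFile)

    type Row = [String]

    -- Split a line on tab characters into its columns.
    parseRow :: String -> Row
    parseRow s = case break (== '\t') s of
      (col, [])       -> [col]
      (col, _ : rest) -> col : parseRow rest

    -- Left-fold a pure per-row function over the rows of a flat file.
    -- The caller never sees the "cursor" (here, the file handle); buffers,
    -- were there any, could be allocated before the loop and freed after
    -- it, just as with OCI.
    doQueryStub :: FilePath -> (Row -> acc -> acc) -> acc -> IO acc
    doQueryStub path process seed = withFile path ReadMode (loop seed)
      where
        loop acc h = do
          eof <- hIsEOF h
          if eof
            then return acc
            else do
              line <- hGetLine h
              loop (process (parseRow line) acc) h

    -- Example in the spirit of 'process' above: accumulate (name, address)
    -- pairs from the first two columns of each row.
    collectPairs :: FilePath -> IO [(String, String)]
    collectPairs path = doQueryStub path step []
      where
        step (name : address : _) results = (name, address) : results
        step _                    results = results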
participants (3)

- Bayley, Alistair
- oleg@pobox.com
- Tom Pledger