[Haskell-cafe] Re: Strictness, order of IO operations: NewCGI & HDBC

newer
getting a DB library working with...

Tim Smith

9 Oct 2006 9 Oct '06

10:01 p.m.

Hello, Haskell Cafe. I posted a question a while ago about this, but didn't receive any responses. I'd like to try again. I've got a test case which uses John Goerzen's HDBC.ODBC. The problem I have is that it appears too lazy - using the results of a query after disconnecting causes an "unknown exception". If I use the results before disconnecting, it works fine. module Main where import Data.List (intersperse) import qualified Database.HDBC as DB import Database.HDBC.ODBC (connectODBC) main :: IO () main = do dbh <- connectODBC "DSN=test" res <- DB.getTables dbh -- print (show ((concat . intersperse ", ") res)) DB.disconnect dbh print (show ((concat . intersperse ", ") res)) Compiling and running this will show: $ ./db-discon db-discon: unknown exception If I uncomment the first 'print' line, then it works as expected: $ ./db-discon "\"d1, foo, odbctest\"" "\"d1, foo, odbctest\"" Am I just expecting the wrong thing from Haskell? Is there a technical reason why HDBC can't synchronize the IO so that everything is resolved before the disconnect? Or is this a bug in HDBC? Thanks, Timothy -- If you're not part of the solution, you're part of the precipitate.

Show replies by date

John Goerzen

9 Oct 9 Oct

10:13 p.m.

On Mon, Oct 09, 2006 at 04:01:02PM -0600, Tim Smith wrote:

...

main = do dbh <- connectODBC "DSN=test" res <- DB.getTables dbh -- print (show ((concat . intersperse ", ") res)) DB.disconnect dbh print (show ((concat . intersperse ", ") res))

Am I just expecting the wrong thing from Haskell? Is there a

Yes. Remember the Haskell mantra: no computation is performed before its result is demanded. Since you are not demanding the list of tables until the print statement, the code to get the list of tables is not executed until then. Actually, follow that logic through. Think about when the connection to the database is established. If I understand things properly, it won't happen until your call to disconnect, since nothing demands the handle until then. Note that this is normally not the case, since the first use of it will demand that the connection happens. I think the easiest way around this is to add this line after the call to getTables: return $ seq res res though you may also be able to say: evaluate res (provided you have imported Control.Exception) But the very best way is to simply not disconnect until after you've printed.

...

technical reason why HDBC can't synchronize the IO so that everything is resolved before the disconnect? Or is this a bug in HDBC?

It's a Feature of Haskell, not a bug. This is the same feature that lets you process infinite lists, treat multi-GB files as strings, and, in fact, treat multi-GB SQL result sets as simple lists. Haskell only loads each line of the file, or row of result, into RAM when it is demanded. (Note that some databases are less lazy than Haskell in this respect, so this only works if your database API can return partial results!) I have tried to put warnings into the HDBC docs where I think people are particularly likely to run afoul of this -- quickQuery springs to mind. Note that the API docs for getTables, at http://darcs.complete.org/hdbc/doc/Database-HDBC.html#v%3AgetTables mention that the data is returned in the same manner as fetchAllRows. Click on the link to fetchAllRows and you see: Lazily fetch all rows from an executed Statement. You can think of this as hGetContents applied to a database result set. The result of this is a lazy list, and each new row will be read, lazily, from the database as the list is processed. When you have exhausted the list, the Statement will be finished. Please note that the careless use of this function can lead to some unpleasant behavior. In particular, if you have not consumed the entire list, then attempt to finish or re-execute the statement, and then attempt to consume more elements from the list, the result will almost certainly not be what you want. But then, similar caveats apply with hGetContents. Bottom line: this is a very convenient abstraction; use it wisely. -- John

Greg Fitzgerald

20 Oct 20 Oct

7:07 p.m.

New subject: Strictness, order of IO operations: NewCGI & HDBC

Does DB.getTables use 'unsafeInterleaveIO'? I would think that if an unsafe operation was *not* used, DB.disconnectcould *not* execute before DB.getTables has returned every row. Either way, by the Principle of Least Surprisehttp://en.wikipedia.org/wiki/Principle_of_least_astonishment, I think Tim's original code ought to be made to work, despite not leveraging laziness. If you are going to tuck away an unsafeInterleaveIO, it seems reasonable that an explicit disconnect should force those deferred operations to be evaluated. Maybe the same should be done for hGetContents/hClose too? Thanks, Greg On 10/9/06, John Goerzen wrote:

...

On Mon, Oct 09, 2006 at 04:01:02PM -0600, Tim Smith wrote:

...
main = do dbh <- connectODBC "DSN=test" res <- DB.getTables dbh -- print (show ((concat . intersperse ", ") res)) DB.disconnect dbh print (show ((concat . intersperse ", ") res))

Am I just expecting the wrong thing from Haskell? Is there a

Yes. Remember the Haskell mantra: no computation is performed before its result is demanded. Since you are not demanding the list of tables until the print statement, the code to get the list of tables is not executed until then.

Actually, follow that logic through. Think about when the connection to the database is established. If I understand things properly, it won't happen until your call to disconnect, since nothing demands the handle until then. Note that this is normally not the case, since the first use of it will demand that the connection happens.

I think the easiest way around this is to add this line after the call to getTables:

return $ seq res res

though you may also be able to say:

evaluate res

(provided you have imported Control.Exception)

But the very best way is to simply not disconnect until after you've printed.

...
technical reason why HDBC can't synchronize the IO so that everything is resolved before the disconnect? Or is this a bug in HDBC?

It's a Feature of Haskell, not a bug.

This is the same feature that lets you process infinite lists, treat multi-GB files as strings, and, in fact, treat multi-GB SQL result sets as simple lists. Haskell only loads each line of the file, or row of result, into RAM when it is demanded. (Note that some databases are less lazy than Haskell in this respect, so this only works if your database API can return partial results!)

I have tried to put warnings into the HDBC docs where I think people are particularly likely to run afoul of this -- quickQuery springs to mind.

Note that the API docs for getTables, at http://darcs.complete.org/hdbc/doc/Database-HDBC.html#v%3AgetTables mention that the data is returned in the same manner as fetchAllRows. Click on the link to fetchAllRows and you see:

Lazily fetch all rows from an executed Statement.

You can think of this as hGetContents applied to a database result set.

The result of this is a lazy list, and each new row will be read, lazily, from the database as the list is processed.

When you have exhausted the list, the Statement will be finished.

Please note that the careless use of this function can lead to some unpleasant behavior. In particular, if you have not consumed the entire list, then attempt to finish or re-execute the statement, and then attempt to consume more elements from the list, the result will almost certainly not be what you want.

But then, similar caveats apply with hGetContents.

Bottom line: this is a very convenient abstraction; use it wisely.

-- John

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Brandon Moore

7:33 p.m.

New subject: Strictness, order of IO operations: NewCGI & HDBC

...

Does DB.getTables use 'unsafeInterleaveIO'?

I would think that if an unsafe operation was *not* used, DB.disconnect could *not* execute before DB.getTables has returned every row.

Either way, by the Principle of Least Surprise http://en.wikipedia.org/wiki/Principle_of_least_astonishment, I think Tim's original code ought to be made to work, despite not leveraging laziness.

If you are going to tuck away an unsafeInterleaveIO, it seems reasonable that an explicit disconnect should force those deferred operations to be evaluated. Maybe the same should be done for hGetContents/hClose too? I wonder how to arrange this. An ugly solution is to explicitly keep a

Greg Fitzgerald wrote: pointer to the next unevaluated entry, advancing it in the interleaved IO operation. A leaky solution is to keep a reference to the list, and force it all. This reminds me a lot of the selector thunk optimization, where the GC notices thunks that are just a pattern match and projection from a tuple (or other single-constructor type), and does the pattern match itself if it sees that the tuple is already evaluated. It seems a similar thing is safe with some other functions, like taking last of some list. For as much as the spine has been evaluated, the GC can reduce references to the early part of the list by unfolding the function a few steps. If a deepSeq could be treated the same way (and seq is like a degenerate case of waiting till something is evaluated but not matching on it), then it would be nice and elegant to cache a thunk of (deepSeq results) in the Handle, and force it when the handle is closed. On the other hand, it's more complicated if you also want the option of telling the DB to stop sending results if you see that the list has become garbage. I suppose you could probably sort it all out by keeping a weak reference to the deepSeq, and putting a finalizer on something else. Brandon

Greg Fitzgerald

8:30 p.m.

New subject: Strictness, order of IO operations: NewCGI & HDBC

...

An ugly solution is to explicitly keep a pointer to the next unevaluated entry, advancing it in the interleaved IO operation. A leaky solution is to keep a reference to the list, and force it all.

Another way to attack this is asking: Why doesn't the simple solution work? That is, not using unsafeInterleaveIO. It seems like any problem caused by not using unsafeInterleaveIO is fundamentally because the IO monad encapsulates orthogonal regions. Why can't I interleave fetching rows from the DB with writing each row to the console? Sure, it is *possible* that writing to the console is piped back into the same database, but if not, you're serializing things that don't need to be serialized. Thanks, Greg

John Goerzen

9 Oct 9 Oct

10:57 p.m.

I forgot to add a couple of things. First, you mentioned the unknown exception. If you were to replace the "main =" line with "main = handleSqlError $ ", then you would find a more useful error message. That will cause HDBC to automatically transform any uncaught HDBC exception into a standard Haskell I/O error. You can't do as much with it programmatically that way, but it will autoamtically be displayed in a useful fashion. Second, you could make sure you always get it right with regard to when to disconnect. For instance: main = handleSqlError $ bracket (DB.connectODBC "DSN=test") (DB.disconnect) procdb procdb dbh = do res <- DB.getTables dbh print (show ((concat . intersperse ", ") res)) bracket is defined in Control.Exception. While HDBC will automatically clean up connections when a HDBC Connection is garbage-collected, this is probably the most correct way to use it anyway, and is analogous to the example in the GHC library reference: bracket (openFile "filename" ReadMode) (hClose) (\handle -> do { ... }) It will ensure that DB.disconnect is always called, either after your code completes, or after an uncaught exception. But moreover, you isolate all of your code that requires the Connection within a single function (or functions that it calls). The Connection does not escape the bracket, so there is no danger you might use it after it has been closed. It may be a useful idiom for you if you are struggling with synchronization issues. You can also consider it for use with HDBC Statements, regular disk files, etc. Note that the following is NOT correct: bracket (connectAndReadTables) (printTables) (disconnect) -- John On Mon, Oct 09, 2006 at 04:01:02PM -0600, Tim Smith wrote:

...

"unknown exception". If I use the results before disconnecting, it works fine.

module Main where

import Data.List (intersperse) import qualified Database.HDBC as DB import Database.HDBC.ODBC (connectODBC)

main :: IO () main = do dbh <- connectODBC "DSN=test" res <- DB.getTables dbh -- print (show ((concat . intersperse ", ") res)) DB.disconnect dbh print (show ((concat . intersperse ", ") res))

Compiling and running this will show:

$ ./db-discon db-discon: unknown exception

If I uncomment the first 'print' line, then it works as expected:

$ ./db-discon "\"d1, foo, odbctest\"" "\"d1, foo, odbctest\""

Am I just expecting the wrong thing from Haskell? Is there a technical reason why HDBC can't synchronize the IO so that everything is resolved before the disconnect? Or is this a bug in HDBC?

Thanks,

Timothy -- If you're not part of the solution, you're part of the precipitate.

-- John Goerzen Author, Foundations of Python Network Programming http://www.amazon.com/exec/obidos/tg/detail/-/1590593715

Tim Smith

20 Oct 20 Oct

9:42 p.m.

John, Café patrons, Thanks a lot for your detailed reply. Sorry it's taken me so long to respond, but I wanted to test some things out first. I've spent many a dark hour thinking about this, and now come crawling back to the café for some help. The folks on #haskell helped some, but eventually gave up and asked me to try here instead. I can make this work with bracket (as you suggested in your second reply), if I want to use the database handle in the IO monad. I.e., if I want the meat of my program to be something like: -- See working program at: http://paste.lisp.org/display/28273 proc :: Connection -> IO a proc dbh = ... But I want to use the database handle inside the CGI monad, so I can query the database, process the CGI results, and perhaps return HTML output or redirect to a different URL, etc. I can do that if I avoid bracket, and use deepSeq to force the results (as you suggested in your first reply - although 'evaluate' and 'seq' weren't strong enough): -- See working program at: http://paste.lisp.org/display/28273#4 ... dbResult <- DB.handleSqlError action dbResult `deepSeq` return dbResult But I can't combine those two techniques. I tried breaking it down by writing: -- See code & compiler errors: http://paste.lisp.org/display/28273#4 -- Copied from GHC's definition of bracket myBracket :: IO a -> (a -> IO b) -> (a -> CGI.CGI c) -> CGI.CGI c myBracket acquire destroy use = I can't manage to resolve the type conflicts. It was suggested that I should just forget about calling disconnect. If the HDBC semantics are lazy IO, then HDBC should be responsible for closing the connection when it's no longer needed, they say. Practically, I guess it will work. But theoretically it bugs the hell out of me that I can't figure out how to combine these monads properly. I've been reading through All About Monads and other resources, and have the School of Expression book as well, but haven't gotten it sorted. It was also suggested that I might use MonadError instead of bracket. I haven't explored that yet. Has anyone found out how to lift bracket into another monad? In order to hopefully make it easier for someone to respond, I've made a test program which has the same structure, but uses StateT instead of CGI, and simple IO instead of HDBC. A solution which applies to this test program should apply directly to my CGI program as well. http://www.magnesium.net/~thim/tmp/transbracket.hs Thank you, Tim -- If you're not part of the solution, you're part of the precipitate.

6847

Age (days ago)

6858

Last active (days ago)

List overview

Download

6 comments

4 participants

participants (4)

Brandon Moore
Greg Fitzgerald
John Goerzen
Tim Smith