
Hello Cafe! I'd greatly appreciate any ideas/comments on the design of the interface to the Network.HTTP library with a LazyByteString (LBS) backend. As has been discussed previously on this list [1] lazy evaluation can complicate resource management, which is especially critical if resources are seriously limited--in this case network sockets and/or file handles. In [2] Oleg shows how we can use left folds as a general mechanism to traverse collections and shows that this is superior to cursors, generators or streams. I agree with his arguments, however, I still don't see that this can be a good interface for the http library. For a start, what should the collection type be? Word8, Char8, (strict) ByteString (BS)? Unless we get some kind of fusion for the first two we're pretty much stuck with the latter. The interface could look like this (untested): import Data.ByteString (ByteString) -- i.e., strict BSs import qualified Data.ByteString as BS handleRequest :: MonadIO m => Request -> a -> (ByteString -> a -> m (Either a a)) -> m a Ignoring error handling for now, this might be implemented as: handleRequest req seed processor = do a <- iter seed bodySize closeConnection req return a where iter seed 0 = return seed iter seed n = do let c = min chunkSize n bs <- liftIO $ readBlock req c -- calls BS.hGet handle c next <- processor bs seed case next of Left rslt -> return rslt Right seed' -> iter seed' (n-c) -- insert catchError stuff here (requires signature change, of course) This should work fine and if the 'processor' function does not hold on to the chunk would run in constant space. Unfortunately, this has a big disadvantage. Most operations on the returned data will probably be stream-like functions, such as parsing the data into some kind of tree. [2] shows a method how to convert enumerators to streams, but the used stream type data MyStream m a = MyNil (Maybe a) | MyCons a (m (MyStream m a)) is incompatible with [a] which is used by lazy ByteStrings due to the embedding of m. I also don't know if fusion can work on monads. My current suggestion would therefore be a less "save"[*] solution (again, untested and modulo error handling). -- | Execute the request and call @f@ with the returned response body. -- The socket will be closed immediately after @f@ terminates. You must therefore -- make sure that any data you might want to returned has to be forced, e.g. using -- (length . take) lbs withRequest :: MonadIO => Request -> a -> (LazyByteString -> a -> m a) -> m a The implementation would lazily read the contents (implemented as described in [3]) and forcing it would be left to the function parameter. E.g. getHTML :: String -> IO HTMLParseTree getHTML addr = do r <- mkRequest addr tree <- withRequest r emptyTree parseHTML seq tree $ return tree -- ! (I'm afraid this is necessary) tricky :: LazyByteString -> String -> IO LazyByteString -- result will not really be lazy trickyt str addr = do r <- mkRequest addr withRequest r L.empty dropNTake -- we might have to force the result here where dropNTake s _ = L.take 100000 . L.drop 100000 If these 'seq's are really necessary, then this would be a pretty hard to use interface. So, any ideas / suggestions ? / Thomas [*] .. "save" in the sense that it does not enforce certain behavior by means of the type signature or API design. [1] .. http://thread.gmane.org/gmane.comp.lang.haskell.cafe/20528/focus=20635 [2] .. http://okmij.org/ftp/papers/LL3-collections-enumerators.txt [3] .. http://nominolo.blogspot.com/2007/05/networkhttp-bytestrings.html -- "Remember! Everytime you say 'Web 2.0' God kills a startup!" - userfriendly.org, Jul 31, 2006