
Hello list, I am quite new to Haskell and I love the language and community, but I am frustrated by a space leak in a WAI 3.0 Application that for now just echoes the request entity back in the response. Specifically, I am having trouble understanding *why* I have the space leak. I intend to pass some or all of the request entity on to another web service whose response will influence the HTTP status code and headers of my service's response. At the moment, I am preparing the request entity as a lazy bytestring using lazy I/O just like Data.ByteString.Lazy.hGetContents does, at least until I can get around to learning pipes or conduit. When I use this technique to echo the request entity back in the response, it looks like two copies of the entire request entity are being accumulated in memory, presumably the original from the request and a copy for the response. The heap profile says it is all in "PINNED", which I am assuming are the bytestring buffers. However, the efficacy of this technique turns out to be irrelevant as I have been able to distill the problem down to a much simpler example: a WAI Application that responds with 100 MB of zeros read from /dev/zero using Data.ByteString.Lazy.hGetContents. I have prepared two variations that differ only in the composition of the operations. Both applications create identical responses, but version A accumulates the entire 100 MB entity in memory (heap profile shows a huge "PINNED" cost just over 100 MB in size) whereas version B streams the entity in constant space. Source code and a cabal file follow. I am using GHC 7.8.3. It is not necessary to make heap profiles - the symptoms are evident with "+RTS -s". ==== BEGIN Zeros.hs ==== {-# LANGUAGE OverloadedStrings #-} import Blaze.ByteString.Builder import Control.Concurrent import qualified Data.ByteString.Lazy as LBS import Data.Int import Network.HTTP.Types import Network.Wai import Network.Wai.Handler.Warp import System.IO -- | This version binds the large LBS of zeros /outside/ of the -- 'responseStream' body lambda. -- -- This version has the space leak. -- -- @ -- curl -v -o \/dev\/null localhost:3000\/zeros\/a -- @ zerosAppA :: Application zerosAppA _req respond = withZeros 100000000 $ \ largeLBS -> respond $ responseStream status200 [] $ \ write _flush -> write $ fromLazyByteString largeLBS -- | This version binds the large LBS of zeros /inside/ of the -- 'responseStream' body lambda. -- -- This version streams the response entity in constant space. -- -- @ -- curl -v -o \/dev\/null localhost:3000\/zeros\/b -- @ zerosAppB :: Application zerosAppB _req respond = respond $ responseStream status200 [] $ \ write _flush -> withZeros 100000000 $ \ largeLBS -> write $ fromLazyByteString largeLBS -- | Do something with /n/ bytes read lazily from @\/dev\/zero@. -- -- This part is common to both 'zerosAppA' and 'zerosAppB'. withZeros :: Int64 -> (LBS.ByteString -> IO a) -> IO a withZeros n f = withBinaryFile "/dev/zero" ReadMode $ \ h -> do zeros <- LBS.hGetContents h let largeLBS = LBS.take n zeros f largeLBS main :: IO () main = do _ <- forkIO $ run 3000 app putStrLn "Using port 3000. Press ENTER to exit..." _ <- getLine putStrLn "Exit." app :: Application app req respond = case pathInfo req of ["zeros", "a"] -> zerosAppA req respond ["zeros", "b"] -> zerosAppB req respond _ -> respond $ responseLBS status404 [] "Not found." ==== END Zeros.hs ==== ==== BEGIN zeros.cabal ==== name: zeros version: 0.1.0.0 build-type: Simple cabal-version: >=1.10 executable zeros main-is: Zeros.hs build-depends: base >=4.7 && <4.8, blaze-builder ==0.3.3.4, bytestring ==0.10.4.0, http-types ==0.8.5, wai ==3.0.2, warp ==3.0.2.3 default-language: Haskell2010 ghc-options: -Wall -rtsopts ==== END zeros.cabal ==== Why does version A not process the LBS in constant space? What in version A is preventing the GC from collecting the LBS chunks after they have been fed to Warp? What is it about version B that permits the LBS chunks to be collected? Although I believe the issue is not actually specific to WAI or Warp, I am unable to reproduce the space leak without them. But because I am new to Haskell, I suspect I have missed something obvious about lambda bindings, laziness (or strictness) of IO, and GC. Thanks. -- Thomas Koster