Space leak in WAI 3.0 application

19 Nov 2014

      Hello list,

I am quite new to Haskell and I love the language and community, but I
am frustrated by a space leak in a WAI 3.0 Application that for now
just echoes the request entity back in the response.
Specifically, I am having trouble understanding *why* I have the space
leak.

I intend to pass some or all of the request entity on to another web
service whose response will influence the HTTP status code and headers
of my service's response. At the moment, I am preparing the request
entity as a lazy bytestring using lazy I/O just like
Data.ByteString.Lazy.hGetContents does, at least until I can get around
to learning pipes or conduit.

When I use this technique to echo the request entity back in the
response, it looks like two copies of the entire request entity are
being accumulated in memory, presumably the original from the request
and a copy for the response. The heap profile says it is all in
"PINNED", which I am assuming are the bytestring buffers.

However, the efficacy of this technique turns out to be irrelevant as I
have been able to distill the problem down to a much simpler example: a
WAI Application that responds with 100 MB of zeros read from /dev/zero
using Data.ByteString.Lazy.hGetContents.

I have prepared two variations that differ only in the composition of
the operations.
Both applications create identical responses, but version A accumulates
the entire 100 MB entity in memory (heap profile shows a huge "PINNED"
cost just over 100 MB in size) whereas version B streams the entity in
constant space.

Source code and a cabal file follow. I am using GHC 7.8.3. It is not
necessary to make heap profiles - the symptoms are evident with
"+RTS -s".

==== BEGIN Zeros.hs ====

{-# LANGUAGE OverloadedStrings #-}

import Blaze.ByteString.Builder
import Control.Concurrent
import qualified Data.ByteString.Lazy as LBS
import Data.Int
import Network.HTTP.Types
import Network.Wai
import Network.Wai.Handler.Warp
import System.IO

-- | This version binds the large LBS of zeros /outside/ of the
-- 'responseStream' body lambda.
--
-- This version has the space leak.
--
-- @
-- curl -v -o \/dev\/null localhost:3000\/zeros\/a
-- @
zerosAppA :: Application
zerosAppA _req respond =
  withZeros 100000000 $ \ largeLBS ->
    respond $ responseStream status200 [] $ \ write _flush ->
      write $ fromLazyByteString largeLBS

-- | This version binds the large LBS of zeros /inside/ of the
-- 'responseStream' body lambda.
--
-- This version streams the response entity in constant space.
--
-- @
-- curl -v -o \/dev\/null localhost:3000\/zeros\/b
-- @
zerosAppB :: Application
zerosAppB _req respond =
  respond $ responseStream status200 [] $ \ write _flush ->
    withZeros 100000000 $ \ largeLBS ->
      write $ fromLazyByteString largeLBS

-- | Do something with /n/ bytes read lazily from @\/dev\/zero@.
--
-- This part is common to both 'zerosAppA' and 'zerosAppB'.
withZeros :: Int64 -> (LBS.ByteString -> IO a) -> IO a
withZeros n f =
  withBinaryFile "/dev/zero" ReadMode $ \ h -> do
    zeros <- LBS.hGetContents h
    let largeLBS = LBS.take n zeros
    f largeLBS

main :: IO ()
main = do
  _ <- forkIO $ run 3000 app
  putStrLn "Using port 3000. Press ENTER to exit..."
  _ <- getLine
  putStrLn "Exit."

app :: Application
app req respond =
  case pathInfo req of
    ["zeros", "a"] -> zerosAppA req respond
    ["zeros", "b"] -> zerosAppB req respond
    _              -> respond $ responseLBS status404 [] "Not found."

==== END Zeros.hs ====

==== BEGIN zeros.cabal ====

name:                 zeros
version:              0.1.0.0
build-type:           Simple
cabal-version:        >=1.10

executable zeros
  main-is:            Zeros.hs
  build-depends:      base          >=4.7 && <4.8,
                      blaze-builder ==0.3.3.4,
                      bytestring    ==0.10.4.0,
                      http-types    ==0.8.5,
                      wai           ==3.0.2,
                      warp          ==3.0.2.3
  default-language:   Haskell2010
  ghc-options:        -Wall -rtsopts

==== END zeros.cabal ====

Why does version A not process the LBS in constant space?

What in version A is preventing the GC from collecting the LBS chunks
after they have been fed to Warp?

What is it about version B that permits the LBS chunks to be collected?

Although I believe the issue is not actually specific to WAI or Warp,
I am unable to reproduce the space leak without them. But because I am
new to Haskell, I suspect I have missed something obvious about lambda
bindings, laziness (or strictness) of IO, and GC.

Thanks.

--
Thomas Koster

Thomas Koster

Gregory Collins

Thomas Koster

Bardur Arantsson

Thomas Koster

Bardur Arantsson

tags

participants (3)