
Oh, what a relief! Thank you for your clear explanation!

--- Daniel Fischer wrote:
On Monday, 15 February 2010 16:44:51, Uchida Yasuo wrote:
Hello,
I came across the following space leak problem today. How can I fix this? (Tested on Mac OS X 10.5.8, GHC 6.10.3)
-- test.hs
module Main where

import System
import qualified Data.ByteString.Lazy.Char8 as L

main = do
    args <- getArgs
    let n = read $ args !! 0
    cs <- L.getContents
    let !a = L.take n cs
The problem is this: the bang pattern does less than you probably think. The definition of lazy ByteStrings is
data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
, so when you write
let !a = L.take n cs
, you force only the outermost constructor (null cs ? Empty : Chunk start rest). Since cs is not empty, it is a Chunk, and that forces the first chunk of the ByteString, which will be as long as the prefix that stdin delivers immediately, but at most the default chunk size (32K or 64K, normally [minus two words for bookkeeping]).
If n is larger than a) the default chunk size or b) what L.getContents got immediately[*], the unevaluated rest of a still refers to cs, so a holds on to (almost) the entire input and you have a bad memory leak.
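To see how little the bang pattern evaluates, here is a small sketch. It is not part of the original program: the names bomb, survives and report are made up for illustration, and the input is built by hand so that everything after the first chunk throws if it is ever touched. Forcing the result of L.take to its outermost constructor should leave that tail alone, while computing the length has to walk into it.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical test input: the first chunk is real data, the rest of the
-- chunk list is an error that fires as soon as anything evaluates it.
bomb :: L.ByteString
bomb = L.fromChunks (S.pack "hello" : error "rest of the input was forced")

-- True if evaluating x to weak head normal form raises no exception.
survives :: a -> IO Bool
survives x = fmap ok (try (evaluate x))
  where
    ok :: Either SomeException b -> Bool
    ok = either (const False) (const True)

-- First component: forcing only the outermost constructor of the prefix
-- (what `let !a = L.take n cs` does). Second component: forcing its length.
report :: IO (Bool, Bool)
report = do
  let a = L.take 10 bomb
  whnfOk   <- survives a
  lengthOk <- survives (L.length a)
  return (whnfOk, lengthOk)
```

Running report in GHCi should give (True,False): the bang-style evaluation stops at the first Chunk, and only the length computation reaches the rest of the input.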
Fix: force a to be completely evaluated, e.g.
let !a = L.take n cs
    !l = L.length a
Once the length has been evaluated, a no longer keeps references to cs, and everything else can be garbage collected.
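If the same trick is needed in several places, it could be wrapped in a helper. The following is only a sketch: forcedPrefix is a made-up name, not something from the original program or from the bytestring library.

```haskell
import Data.Int (Int64)
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper, not part of the original program.
-- L.length has to visit every chunk of a, so once it has run, a no longer
-- contains an unevaluated tail that points back into cs.
forcedPrefix :: Int64 -> L.ByteString -> L.ByteString
forcedPrefix n cs = L.length a `seq` a
  where
    a = L.take n cs
```

With it the binding would read let !a = forcedPrefix n cs. The bang is still needed there, so that the seq runs at that point rather than whenever a is first used.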
[*] How long the first chunk is depends, in this pipeline, on scheduling, the number of available cores/CPUs, and the OS buffer size.
Regards,
Yasuo Uchida