
On Monday 15 February 2010 16:44:51, Uchida Yasuo wrote:
Hello,
I came across the following space leak problem today. How can I fix this? (Tested on Mac OS X 10.5.8, GHC 6.10.3)
-- test.hs
module Main where

import System
import qualified Data.ByteString.Lazy.Char8 as L

main = do
  args <- getArgs
  let n = read $ args !! 0
  cs <- L.getContents
  let !a = L.take n cs
The problem is this. The bang pattern does less than you probably think. The definition of lazy ByteStrings is

    data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString

so when you write

    let !a = L.take n cs

you force only the outermost constructor (null cs ? Empty : Chunk start rest). Since cs is not empty, it's a Chunk, and that forces the first part of the ByteString, which will be as long as the prefix which stdin immediately delivers, but at most the default chunk size (32K or 64K, normally [minus two words for bookkeeping]).

If n is larger than a) the default chunk size or b) what L.getContents got immediately[*], the tail of a is an unevaluated thunk that still refers to cs, so a holds on to (almost) the entire input and you have a bad memory leak.

Fix: force a to be completely evaluated, e.g.

    let !a = L.take n cs
        !l = L.length a

By evaluating the length, a no longer keeps references to cs, and everything can be garbage collected.

[*] How long the first chunk is depends, in this pipeline, on scheduling, the number of available cores/CPUs, and the OS buffer size.
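To see in isolation how little the bang pattern forces, here is a small sketch. It is not part of the original program: the value bomb is a made-up test input whose first chunk is fine and whose tail throws when evaluated, and it assumes the Chunk constructor is imported from Data.ByteString.Lazy.Internal.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Exception (ErrorCall, evaluate, try)
import Data.Int (Int64)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.ByteString.Lazy.Internal (ByteString (Chunk))

-- Hypothetical test value: a good first chunk followed by a tail that
-- throws as soon as anything evaluates it.
bomb :: L.ByteString
bomb = Chunk (S.pack "abc") (error "tail forced")

main :: IO ()
main = do
  -- Forces only the first Chunk constructor; the tail stays a thunk.
  let !a = L.take 10 bomb
  putStrLn "bang pattern alone did not touch the tail"
  -- L.length has to walk every chunk, so now the tail is evaluated.
  r <- try (evaluate (L.length a)) :: IO (Either ErrorCall Int64)
  putStrLn $ either (const "L.length forced the tail") show r
```

In the real program the tail is not an error but the rest of stdin, which is why the unevaluated remainder of a keeps cs alive.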
  mapM_ (print . L.length) $ L.lines cs
  print a
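For reference, a sketch of test.hs with the fix applied might look like the following. Some of it is assumption rather than part of the original: the BangPatterns pragma is written out, System.Environment replaces the old System module, and takeForced is a hypothetical helper that wraps the "take, then evaluate the length" step so it can be checked on its own. It should behave like the two bang-pattern lines suggested above.

```haskell
{-# LANGUAGE BangPatterns #-}
-- test.hs, sketch with the prefix fully forced
module Main where

import Data.Int (Int64)
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: take the first n bytes and walk all chunks of the
-- result before returning it, so it no longer refers to the input.
takeForced :: Int64 -> L.ByteString -> L.ByteString
takeForced n cs =
  let a = L.take n cs
  in L.length a `seq` a

main :: IO ()
main = do
  args <- getArgs
  let n = read $ args !! 0
  cs <- L.getContents
  -- The bang forces takeForced, which in turn evaluates the whole prefix.
  let !a = takeForced n cs
  mapM_ (print . L.length) $ L.lines cs
  print a
```

Whether this removes the leak on a given setup is best confirmed by running it against gen with a large argument and watching the memory usage.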
-- gen.hs
module Main where

main = do
  putStrLn $ take 1000000 $ cycle "foo"
  main
These are compiled with the following options:
$ ghc --make -O2 test
$ ghc --make -O2 gen
The memory usage seems to depend on the argument (=17000) passed. On my MacBook (Core2 Duo 2.0GHz), 16000 works fine.