puzzling memory leak? (GHC)

I'm seeing a very puzzling behavior that would be considered a memory leak, or unwanted retention of objects, in other languages. If I've made a newbie mistake, could someone point out what I am doing wrong? The process size (resident size and not just virtual size) of my program just grows and grows, passing a few hundred MBs in just 30 seconds, until I'm forced to kill the process. I get the same problem with GHC 6.4 and 6.4.1 on Solaris 2.8 and MacOS X 10.4.2, and with GHC 6.2.2 on Solaris 2.8.

Here's what the program does. It reads a binary file containing a large number of varying-length structured records. Some of these records are corrupt (e.g., truncated), and it is the job of my program to write out a new file with corrupted records filtered out. I decided to take advantage of Haskell's lazy evaluation to separate out the task of input parsing from the task of filtering, as follows:

  import qualified Data.Map as Map

  main = do
    contents <- getContents
    fixArts $ parseArts contents [] (Map.singleton "offset" 0)

The invoked functions have the following signatures:

  parseArts :: [Char] -> [Char] -> HeaderMap -> [Either ParseError Arts]
  fixArts   :: [Either ParseError Arts] -> IO ()

where

  type HeaderMap = Map.Map String Word

  data Arts = Arts {
      ar_version      :: Int,
      ar_flags        :: Word,
      ar_data_length  :: Int,
      ar_timestamp    :: Word,
      ar_src_ip       :: Word,
      ar_dst_ip       :: Word,
      ar_list_id      :: Word,
      ar_cycle_id     :: Word,
      ar_rtt          :: Word,
      ar_rtt_mod      :: Word,
      ar_hop_distance :: Int,
      ar_dst_replied  :: Bool,
      ar_halt_reason  :: Int,
      ar_halt_data    :: Int,
      ar_reply_ttl    :: Int,
      ar_hops         :: [ArtsHop],
      ar_offset       :: Int,
      ar_bytes        :: [Char]
    }

  data ArtsHop = ArtsHop {
      ar_hop_num   :: Int,
      ar_hop_addr  :: Word,
      ar_irtt      :: Word,
      ar_num_tries :: Int
    }

  data ParseError = ParseError {
      pe_message :: String,
      pe_offset  :: Int,
      pe_bytes   :: [Char]
    }

Here's the body of fixArts:

  fixArts ((Left x):xs) =
    do hPutChar stderr '\n'
       hPutStrLn stderr $ "Parse error: " ++ pe_message x
       hPutHexDump stderr bytes offset
       fixArts xs
    where bytes  = pe_bytes x
          offset = pe_offset x
  {-- XXX normally: do putStr (ar_bytes x); fixArts xs --}
  fixArts ((Right x):xs) = fixArts xs
  fixArts [] = return ()

The function hPutHexDump simply prints out a hex dump of a [Char] (and commenting out the call makes no difference). Unless I'm badly mistaken, nothing about fixArts suggests that it would leak memory. So it would appear parseArts is somehow to blame for the memory leak, but this is where it gets weird. If I just slightly change fixArts (see below), then I no longer get a memory leak (the process size stays at just over 3MB):

  fixArts ((Right x):xs) =
    do hPutArts stderr x
       fixArts xs

The function hPutArts :: Handle -> Arts -> IO () simply prints out the fields of the Arts object. So why should this change fix the memory leak if parseArts is truly at fault?

The function parseArts has the following basic form (though the actual implementation is much more involved):

  parseArts (x:xs) ... = (createArts x) : parseArts xs

So, even without seeing all of the code, does anyone have any clues about what may be wrong?

 --Young

Young,

This sounds like a text-book space-leak caused by lazy evaluation. In a lazy language, function evaluation is driven by the need to perform IO. Because the first version of your program never prints the parsed structure, the parser doesn't completely parse it. This implies that the system needs to hang onto all the input data (as well as all the partially evaluated calls to the parser) in case it needs it later on.

The default string representation in Haskell is pretty abysmal, and having it use hundreds of megs to represent, say, a 10 meg file is not too surprising.

By modifying your fixArts function to print the parsed structure you've forced the system to actually finish evaluating the parser, which allows the input data to be garbage collected.

My advice is that if you don't want to fool around with it, just swallow hard, then change fixArts to do a hPutArts to /dev/null. Either that, or 1) go read about the DeepSeq library, or 2) add strictness annotations to your structure definition.

Ben.
fixArts ((Right x):xs) =
  do hPutArts stderr x
     fixArts xs
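For what it's worth, here is a minimal sketch of what I mean by (1) and (2). It assumes a DeepSeq-style library exposing an NFData class with rnf/deepseq (the interface of the deepseq package); the cut-down Arts record, the String-based parse errors, and the toy main are invented for the example and are not Young's code.

  import Control.DeepSeq (NFData (..), deepseq)
  import Data.Word (Word)

  -- (2) strictness annotations: '!' fields are forced as soon as the
  -- constructor itself is forced, so no thunk can hide behind them.
  data Arts = Arts
    { ar_version :: !Int
    , ar_rtt     :: !Word
    , ar_bytes   :: [Char]   -- left lazy; forced via the NFData instance below
    }

  -- (1) DeepSeq-style forcing: rnf evaluates the whole structure.
  instance NFData Arts where
    rnf (Arts _ _ b) = rnf b   -- matching the constructor already forces the strict fields

  -- Forcing each parsed record before discarding it means the suspended
  -- parser calls (and the input they hang onto) are not retained.
  fixArts :: [Either String Arts] -> IO ()
  fixArts (Right x : xs) = x `deepseq` fixArts xs
  fixArts (Left  e : xs) = putStrLn ("Parse error: " ++ e) >> fixArts xs
  fixArts []             = return ()

  main :: IO ()
  main = fixArts [Right (Arts 1 42 "abc"), Left "truncated record"]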

On Oct 9, 2005, at 11:32 PM, Ben Lippmeier wrote:
This sounds like a text-book space-leak caused by lazy evaluation.
In a lazy language, function evaluation is driven by the need to perform IO. Because the first version of your program never prints the parsed structure, the parser doesn't completely parse it. This implies that the system needs to hang onto all the input data (as well as all the partially evaluated calls to the parser) in case it needs it later on.
I naively believed (or hoped) that, so long as I, the programmer, took some care, the evaluator would be smart enough to discard unaccessed computations and values and thus avoid a space leak. Hence the thing I worried about was explicit references to data structures (say, the head of a very long list) that prevent objects from being reclaimed. Now it seems I also need to worry about the hidden aspects of the internal implementation of lazy evaluation. What intuition should be my guide about the "proper way" of employing lazy evaluation?

It's not yet clear to me what you mean by "the system needs to hang onto all the input data ...". It seems counterintuitive for the evaluator to hang onto partially evaluated functions and input data when it can be proven (through static analysis) that references to that data can no longer exist in the remainder of the program execution; for example, consider

  case head $ drop 500000 $ parseArts ... of
    Right x -> hPutArts stderr x

in which the first 500,000 elements in the result of parseArts can't ever be referenced after 'drop' executes, and yet the space leak still happens (presumably while trying to skip over those 500,000 elements). Have I understood you correctly that this is the unfortunate behavior of lazy evaluation?
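Thinking about it some more, one thing I could imagine going wrong is the "offset" map that parseArts threads along. Here is a toy version (invented names, not my real parser) showing how a lazily updated accumulator builds up a chain of thunks that also retains the input records, even when the list elements themselves are skipped:

  import qualified Data.Map as Map
  import Data.Word (Word)

  type HeaderMap = Map.Map String Word

  -- Toy stand-in for parseArts: one Int per "record", threading a running
  -- offset through the map.  The update function captures the record r, and
  -- because it is never forced, every step keeps the previous map, the
  -- previous update thunk and the captured record alive.
  leakyParse :: [String] -> HeaderMap -> [Int]
  leakyParse []       _  = []
  leakyParse (r : rs) hm =
    length r : leakyParse rs (Map.adjust (+ fromIntegral (length r)) "offset" hm)

  -- Forcing the updated offset at each step keeps the accumulator flat.
  strictParse :: [String] -> HeaderMap -> [Int]
  strictParse []       _  = []
  strictParse (r : rs) hm =
    let hm' = Map.adjust (+ fromIntegral (length r)) "offset" hm
    in  Map.findWithDefault 0 "offset" hm' `seq` (length r : strictParse rs hm')

  -- Compare the memory usage of the two versions with +RTS -s.
  main :: IO ()
  main = print . head . drop 499999 $
           leakyParse (replicate 500000 "record") (Map.singleton "offset" 0)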
The default string representation in Haskell is pretty abysmal, and having it use hundreds of megs to represent, say, a 10 meg file is not too surprising.
Each file I process is 70MB, so inefficient representation could be a problem if input data is buffered.
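(For scale, and taking the usual figures on faith: a boxed [Char] reportedly costs something like 12-20 bytes per character on a 32-bit GHC heap, so holding a 70 MB file fully resident would take on the order of 70M x 12-20 bytes, i.e. roughly 1 GB or more. Even partial retention of the input would account for the growth I'm seeing.)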
My advice is that if you don't want to fool around with it, just swallow hard, then change fixArts to do a hPutArts to /dev/null.
Either that or 1) go read about the DeepSeq library. 2) add strictness annotations to your structure definition.
Thanks for the advice; I'll look into both. --Young
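P.S. I'll also try heap profiling to see what is actually being retained. If I understand the GHC manual correctly, something along these lines should do it (the file names are just placeholders for my real ones):

  ghc --make -prof -auto-all -o fixarts Main.hs
  ./fixarts +RTS -hc -p -RTS < input.arts > /dev/null
  hp2ps -c fixarts.hp    # renders the heap profile to fixarts.ps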

Young Hyun wrote:
Here's what the program does. It reads a binary file containing a large number of varying-length structured records. [...]
Unless I'm badly mistaken, nothing about fixArts suggests that it would leak memory. So, it would appear parseArts is somehow to blame for the memory leak, but this is where it gets weird. If I just slightly change fixArts (see below), then I no longer get a memory leak (the process size stays at just over 3MB):
fixArts ((Right x):xs) =
  do hPutArts stderr x
     fixArts xs
Most likely you think your parser is building small data structures, when in reality it builds large and complicated suspended computations that never run and just take up space. As soon as you demand their result (hPutArts does this), they run and free up the space.

GHC itself cannot help you much. Maybe it *could* determine that all these results are really demanded eventually and that it won't hurt to compute them earlier, but in practice this is too hard to do. You'll need to give some hints. You should change the fields in your record to strict types (!Int instead of Int) and you should force evaluation of these structures by placing a `seq' at the appropriate place. Unfortunately, finding this place is not easy. Probably the best way is to simulate lazy evaluation in your head and see where it goes wrong.
parseArts (x:xs) ... = (createArts x) : parseArts xs
In this code the result of createArts is not demanded and simply put into a potentially large data structure. It might help to change it to the following:

  parseArts (x:xs) ... = ((:) $! createArts x) (parseArts xs)

Udo.

--
In the middle of a discussion about the "Evil Mangler" in GHC 5.04:
<shapr> the evil mangler uses *perl* ??
<ChilliX> yes...
<ChilliX> it is Evil after all
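Note that ($!) only evaluates createArts x to weak head normal form, i.e. to the outermost Arts constructor; it is the combination with strict (!) fields that makes the fields themselves get forced. A tiny self-contained illustration of the difference (invented types, not your code):

  import Control.Exception (evaluate)

  data Lazy   = Lazy   Int Int     -- ordinary lazy fields
  data Strict = Strict !Int !Int   -- strict fields

  main :: IO ()
  main = do
    -- Reaching WHNF of a Lazy value does not touch its fields, so the
    -- undefined thunks below are never evaluated and no error is raised.
    _ <- evaluate (Lazy undefined undefined)
    -- With strict fields, forcing the constructor forces the fields too;
    -- this line would crash if either field were undefined.
    _ <- evaluate (Strict 1 2)
    putStrLn "both reached WHNF; only Strict forced its fields"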

Hi Young,

Just to give a somewhat different perspective: sometimes increasing laziness is helpful for avoiding space problems, not really space "leaks" so much as space wastage.

In this case, you probably can't afford to hold "contents" in memory at any given time. So you need to be certain that as the input is consumed it is no longer needed (which is what the strictness suggestions given by others address), but you also (most likely) don't want to hold the entire list of Arts in memory either, since that would also take a huge amount of memory. That requires that parseArts generate its output lazily and that fixArts consume it properly. Both of those appear to be the case from the code you outlined, but I figured I'd point out that strictness in the wrong circumstances can be as bad as laziness for your memory usage.

Assuming the following is correct...
parseArts (x:xs) ... = (createArts x) : parseArts xs
then it would be worth checking that
main = do
  contents <- getContents
  fixArts [head $ parseArts contents [] (Map.singleton "offset" 0)]
takes very little memory. If this takes a large amount of memory, then I think you may have a problem with parseArts being insufficiently lazy.

--
David Roundy
http://www.darcs.net
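To make the point concrete, here is a small self-contained toy (the parse/consume functions are stand-ins I made up, not your parseArts/fixArts) showing the difference between a pipeline that streams and one that forces the whole result list up front:

  -- Toy stand-ins: one small Int per input line.
  parse :: String -> [Int]
  parse = map length . lines

  consume :: [Int] -> IO ()
  consume = mapM_ print

  main :: IO ()
  main = do
    contents <- getContents
    let results = parse contents
    -- Streaming: each element is produced, printed, and garbage collected
    -- in turn, so memory use stays flat regardless of input size.
    consume results
    -- Replacing the line above with
    --   length results `seq` consume results
    -- forces the whole spine first, and because 'results' is still needed
    -- by consume, every element stays live at once: misplaced strictness.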
participants (4)
- Ben Lippmeier
- David Roundy
- Udo Stenzel
- Young Hyun