Space leak with Data.Binary and decodeFile

Hello all, I've been observing a huge space leak with some code using Data.Binary that I cannot make sense of and I hope someone here can shed some light on this, so I'll try to explain my problem as clearly as possible. I qualify the space leak as huge because if I let the program run, it will soon consume the whole memory available (~3G) and finally will get killed by the system. The code I'm writing implements a search algorithm using an inverted index. This index is built from a Trie [Int] (from the bytestring-trie package) and an Array Int ByteString. The trie maps each referenced word to an integer list that is a list of indices into the array. Here is the code for the Index datatype and the obvious Binary instance: data Index = Index { entries :: Array Int ByteString , invidx :: Trie [Int } instance Binary Index where put (Index dirs idx) = put dirs >> put idx get = liftM2 get get I have no problems creating and seralizing this data structure to a file. The huge leak appears instead when I'm reading this data structure from a file and try to do something with it. This is the smallest test case I came up with that can reproduce the problem : main = do idx <- decodeFile "list.idx"; mapM_ (B.putStrLn . snd) (assocs (entries idx)) The space leak also appears when I try to touch the trie instead of the array. I've been trying tons of combinations involving adding or removing strictness annotations and seq calls in various places with no luck. I have also been adding SCC annotations and tried to profile the code. This seemed to suggest the space leak happens in the get method of the Array instance of Binary : instance (Binary i, Ix i, Binary e) => Binary (Array i e) where [...] get = do bs <- get n <- get -- read the length xs <- replicateM n get -- now the elems. return (listArray bs xs) The output of the profiler tells me that all the space gets allocated from the "replicateM n get" expression. Now for the really weird part: if I load my code in GHCi and type "main", I can observe the space leak. However, if I copy paste the definition of main instead, the code runs fine! This is the only circumstance I've seen this code work instead of eating all the RAM... I have been using GHC 6.10.1, binary 0.4.4 and bytestring-trie 0.1.2. If there's anything else that I can do to understand what's going on, I would gladfully hear about it. Please also tell me if I should provide more information. Thanks, -- Maxime Henrion
participants (1)
-
Maxime Henrion