
Hello haskell-cafe, I'm running into what I think is a space leak. I've modified external-sort-0.2 [0] to use tournament trees for merging, and although the block sorting works in constant space, merging seems to produce a space leak. The external sort works by lazily consuming an input list, chopping it up into fixed size blocks, sorting each block in memory and writing each sorted block out to disk. At this point we have a big, on-disk array of k sorted blocks. To produce the final, sorted output list, we lazily read back each block into a tournament tree (a heap of sorts), which should only compare the first element of block to figure out the tree root. The merging algorithm in full is in `tourMerge' [1]. I've confirmed by profiling (with -hc) that the merge step gobbles up memory. I've attached a self-contained program (two haskell source files). It requires binary, mersenne-random*, and ArrayRef from hackage. $ tar xvzf leak.tar.gz leak.hs Data/TournamentTree.hs $ ghc -O --make leak.hs $ ./leak 5000000 # generate 5 million entries in ./entries.tmp, sort them, and output to ./leak.out Writing 5000000 entries to './entries.tmp' Done. Henceforth, if you omit the number argument to leak it will sort './entries.tmp'. Sorting 5000000 entries from './entries.tmp' External block sort using 1000 blocks: -- 5000 entries per block -- 240000 bytes per block 1000--blocks sorted. Merging blocks and writing sorted entries to './leak.out' Depending on how much RAM you have, you may need to tweak the argument to leak and/or numBlocks (at the top of leak.hs) -- a block needs to be able to fit in RAM. If the argument to leak is too low, the sort will succeed and the leak won't be apparent. Otherwise, the OS will kill the process because it allocated too much. Anyone have an idea how I can fix the leak? Denis *Only required to generate the initial input file to sort; it's much faster than System.Random. [0] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/external-sort-0.2 [1] tourMerge is in leak.hs, but I've also inlined it here for convenience. TT is the Data.TournamentTree module, included in the tarball I've attached. tourMerge :: (Ord a) => [[a]] -> [a] tourMerge [] = [] tourMerge blks = let t = {-SCC "TMfromList" #-} TT.fromList blks in tM (TT.minElem t) (TT.deleteMinF t) where tM :: (Ord a) => [a] -> TT.Forest [a] -> [a] tM l f | null f = l | otherwise = let t = TT.tournament f next = TT.minElem t (x, xs) = {-# SCC "TTspan" #-} span (<= head next) l in x ++ (tM next (if null xs then TT.deleteMinF t else (TT.singleton xs : TT.deleteMinF t)))