
2009/4/2
On Thu, Apr 02, 2009 at 07:55:07PM -0400, Rick R wrote:
You could profile your app for memory usage. Then you could see just what function is blowing up the memory usage and work out how to optimize it.
http://book.realworldhaskell.org/read/profiling-and-optimization.html
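As a rough sketch of what that looks like in practice (the program and module names here are placeholders, and the flag spellings are those of GHC 6.x; check your GHC's user guide):

```shell
ghc -prof -auto-all --make Main.hs -o pkgtool   # build with cost centres on all top-level bindings
./pkgtool +RTS -p -hd -RTS                      # time profile (-p) plus heap profile by closure (-hd)
hp2ps -c pkgtool.hp                             # render the heap profile to PostScript
```

The `-p` run writes a `pkgtool.prof` file like the one quoted below.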
2009/4/2
I'm relatively new to Haskell, so, as one does, I am rewriting an existing program in Haskell to help learn the language.
However, it eats up all my RAM whenever I run the program.
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3175#a3175
Obviously I'm doing something wrong, but without my magical FP pants I don't know what that might be.
I ran some profiling as suggested,
[SNIP]
total time  =   8.36 secs  (418 ticks @ 20 ms)
total alloc = 3,882,593,720 bytes  (excludes profiling overheads)

COST CENTRE  MODULE  %time %alloc
line         PkgDb    89.7   93.5

COST CENTRE  MODULE  no.  entries  %time %alloc  %time %alloc
line         PkgDb   305  109771    89.7   93.3   89.7   93.3
[SNIP]
The line function is part of the file parser
line :: Parser String
line = anyChar `manyTill` newline

files' :: Parser Files
files' = line `manyTill` newline
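For reference, a self-contained version of these two parsers that can be loaded in GHCi. The import path and the `Files` type are assumptions (the original only shows the two parsers); here `Files` is taken to be `[String]` and classic Parsec is used:

```haskell
-- Hypothetical self-contained version; 'Files' is an assumed type alias.
import Text.ParserCombinators.Parsec

type Files = [String]

-- One line: any characters up to and including the newline.
line :: Parser String
line = anyChar `manyTill` newline

-- One package record: lines up to and including a blank line.
files' :: Parser Files
files' = line `manyTill` newline

main :: IO ()
main = print (parse files' "" "pkg\n1.0\n/bin/pkg\n\n")
-- prints: Right ["pkg","1.0","/bin/pkg"]
```

Note that `anyChar `manyTill` newline` must try `newline` before every character, and the whole result is accumulated before anything is returned, which is consistent with `line` dominating the allocation in the profile.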
Perhaps I should also explain the structure of the file. It's for a simple package manager called pkgutils, used for CRUX[1]. The file contains information for all the packages installed and is structured as follows
<package name> <package version> <file> <file> ... <file>
<package name> ...
From the profiling it shows that the memory is simply consumed by reading in all the lines; the graph from using -p -hd shows an almost O(n log n) growth of the heap as the collection of lines grows.
Is there a better way to do this?
In this case the syntax of your file seems pretty simple. Someone else suggested a streaming approach. Putting those ideas together, I defined testInput as follows:

testInput = "<package name>\n"
         ++ "<package version>\n"
         ++ "<file>\n"
         ++ "<file>\n"
         ++ "...\n"
         ++ "<file>\n"
         ++ "\n"
         ++ "<package name>\n"
         ++ "...\n"

Here is an interactive experiment I tried:

GHCi> :t lines
lines :: String -> [String]
GHCi> lines testInput
["<package name>","<package version>","<file>","<file>","...","<file>","","<package name>","..."]

Okay, looks like we can use 'lines' from the Prelude to split the input into lines.

GHCi> :t cycle
cycle :: [a] -> [a]
GHCi> :t cycle testInput
cycle testInput :: [Char]

Using cycle on the testInput like this will give us an infinite input that we can use to see if we have a streaming approach.

GHCi> take 10 $ lines $ cycle testInput
["<package name>","<package version>","<file>","<file>","...","<file>","","<package name>","...","<package name>"]

Success. So if you like, you could use something like hGetContents, which reads a file lazily, run lines over the file, and get a lazy stream of the file's lines. Then you can use something like takeWhile to take lines until you hit an empty line, and build up your parsing functions from there.

You could experiment with bytestrings, but in this particular application I wouldn't expect to see a huge benefit. Here I think you can run in space proportional to the longest list of files that you encounter in the input.

Hope that helps,
Jason
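To make the streaming idea concrete, here is one hedged way to split the lazy stream of lines into blank-line-separated groups using only the Prelude. The `groups` function and the sample input are illustrative inventions, not pkgutils code:

```haskell
-- Split a lazy stream of lines into blank-line-separated groups.
-- Each group is one package: name, version, then its files.
groups :: [String] -> [[String]]
groups ls = case break null ls of
  (g, [])     -> [g | not (null g)]
  (g, _:rest) -> [g | not (null g)] ++ groups rest

main :: IO ()
main = do
  -- With readFile (which uses hGetContents), this whole pipeline is lazy.
  let testInput = "pkg-a\n1.0\n/bin/a\n/lib/a\n\npkg-b\n2.0\n/bin/b\n"
  mapM_ print (groups (lines testInput))
```

Because lines, break, and groups are all lazy, consuming one group at a time keeps residency proportional to the longest single package entry, which matches the space claim above.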