
On Wed, Mar 23, 2011 at 9:32 AM, Tim Docker
Productivity 78.2% of total user, 76.5% of total elapsed
As a rule of thumb GC time should be less than 10%.
This seems strange. The maximum residency of 12MB sounds about correct for my data. But what's with the 59MB of "slop"? According to the ghc docs:
| The "bytes maximum slop" tells you the most space that is ever wasted | due to the way GHC allocates memory in blocks. Slop is memory at the | end of a block that was wasted. There's no way to control this; we | just like to see how much memory is being lost this way.
GHC requests memory from the OS in large blocks. This makes GC more efficient. The program might not end up using all the allocated memory in the end.
type MyMap = Map.Map BS.ByteString BS.ByteString
Try using HashMap from the unordered-collections package. It's typically 2-3x faster than Map for ByteString/Text keys.
foldLines :: (a -> String -> a) -> a -> Handle -> IO a foldLines f a h = do eof <- hIsEOF h if eof then (return a) else do l <- hGetLine h let a' = f a l a' `seq` foldLines f a' h
Your foldLines is not strict enough. Consider what happens if you call foldLines someF undefined someHandle when the file is empty. If foldLines was strict in the accumulator you'd expect the program to crash (from evaluating undefined), but it doesn't as 'return a' doesn't force 'a'. In addition, you'd like GHC to inline foldLines so the indirect function call to 'f' can be turned to a call to a known function. Here's a better definition: foldLines :: (a -> String -> a) -> a -> Handle -> IO a foldLines f a0 !h = go a0 where go !a = do eof <- hIsEOF h if eof then (return a) else do l <- hGetLine h go (f a l) {-# INLINE foldLines #-} Also, as others have mentioned, String is no good. Use ByteString and Text. Both come with functions to read lines (if I recall correctly). Johan