
Hi, I'm currently investigating darcs' performance issues. The first thing I looked at was ByteString.concat, which seems to allocate a lot of extra objects. Here is the standard definition:
concat :: [ByteString] -> ByteString
concat []   = empty
concat [ps] = ps
concat xs   = unsafeCreate len $ \ptr -> go xs ptr
  where
    len = P.sum . P.map length $ xs
    go a b | a `seq` b `seq` False = undefined
    go [] _ = return ()
    go (PS p s l:ps) ptr = do
        withForeignPtr p $ \fp -> memcpy ptr (fp `plusPtr` s) (fromIntegral l)
        go ps (ptr `plusPtr` l)

I had a bad feeling about the (sum . map) inside len, so I replaced it (inside darcs) with a stricter-looking one:
myconcat :: [B.ByteString] -> B.ByteString
myconcat []   = B.empty
myconcat [ps] = ps
myconcat xs   = BI.unsafeCreate len $ \ptr -> go xs ptr
  where
    len = foldl' (\a b -> a + B.length b) 0 xs
    go a b | a `seq` b `seq` False = undefined
    go [] _ = return ()
    go (BI.PS p s l:ps) ptr = do
        withForeignPtr p $ \fp -> BI.memcpy ptr (fp `plusPtr` s) (fromIntegral l)
        go ps (ptr `plusPtr` l)
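For anyone who wants to try the comparison outside of darcs, here is a minimal standalone test I would use; the module name, chunk size and chunk count are made up for illustration, and the workload is only a rough stand-in for what darcs actually does:

module Main (main) where

import qualified Data.ByteString as B

-- Made-up workload: many small strict chunks, concatenated once.
-- Swap B.concat for myconcat (defined above) for the second run.
main :: IO ()
main = do
    let chunks = replicate 100000 (B.replicate 64 0)
    print (B.length (B.concat chunks))

Compiling with ghc -O2 -rtsopts and running the resulting binary with +RTS -s should print the same kind of allocation and GC summary as the figures below, so the two versions can be compared directly on allocation.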
Profiling the original one gives me

  2,042,215,624 bytes allocated in the heap
  3,131,672,496 bytes copied during GC
    363,380,328 bytes maximum residency (43 sample(s))
      2,517,336 bytes maximum slop
            643 MB total memory in use (50 MB lost due to fragmentation)

while the second, with no other changes, gives

  1,774,403,640 bytes allocated in the heap
  2,656,774,296 bytes copied during GC
    358,131,264 bytes maximum residency (42 sample(s))
      1,577,080 bytes maximum slop
            614 MB total memory in use (50 MB lost due to fragmentation)

Is it possible that the code using ByteString.concat is preventing the compiler from doing this optimisation itself, i.e. from turning (sum . map length) into a strict accumulation?

Thank you,
Luke Worth