
On Jan 24, 2006, at 1:55 AM, Simon Marlow wrote:
You can get a quick picture of heap usage with +RTS -Sstderr, by the way. To find out what's actually in that heap, you'll need heap profiling (as you know).
[snip]
Yes, GHC's heap is mmap()'d anonymously. You really need to find out whether the space leak is mmap()'d by GHC's runtime, or by darcs itself - +RTS -Sstderr or profiling will tell you about GHC's memory usage.
Ah, I had been using the lowercase -s, but I forgot about the existence of the uppercase -S. I'll try to include some profiles and what I learn from them. I wish I could work on that right now, but chances are it will be Monday or Tuesday before I get to look at it again.
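For reference, the invocation I plan to try is something like the following (the patch file name is just a placeholder; I'm assuming the darcs binary passes +RTS options through to the runtime as usual):

    darcs apply big.patch +RTS -Sstderr

As I understand it, -S writes a line of GC statistics to stderr after every collection, so it should show how residency grows while the patch is applied, whereas the lowercase -s only prints a summary at exit.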
I'd start by using heap profiling to track down what the space leak consists of, and hopefully to give you enough information to diagnose it. Let's see some heap profiles!
Yes!
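Roughly, the steps I have in mind are (just a sketch; darcs's real build is driven by its own makefile, so the exact commands and file names here are placeholders):

    ghc --make -O -prof -auto-all darcs.hs -o darcs-prof
    ./darcs-prof apply big.patch +RTS -hc -p
    hp2ps -c darcs-prof.hp

That should give a heap profile broken down by cost centre (-hc) along with a time/allocation profile (-p), and hp2ps turns the .hp output into a PostScript graph I can post.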
Presumably the space leak is just as visible with smaller patches, so you don't need the full 300M patch to investigate it.
This is true; I've had problems with patches as small as 30 MB. I guess I liked using the 300 MB patch because it emphasized and exaggerated the problem, and I often left the profile running on one machine while I went off to study the code on another. But it's a good suggestion for when I want to iterate and get my results sooner.
I don't usually resort to -ddump-simpl until I'm optimising the inner loop; use profiling to find out where the inner loops actually *are* first.
Point taken.
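Concretely, I'll get a time profile first and only then dump Core for whatever the profile says is hot, something like the following (HotModule is hypothetical, standing in for whichever module the .prof file points at):

    ./darcs-prof apply big.patch +RTS -p
    ghc -O -c -ddump-simpl HotModule.hs > HotModule.core

rather than trying to read -ddump-simpl output for the whole program.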
Are there tools or techniques that can help me understand why the memory consumption peaks when applying a patch? Is it foolish to think that lazy evaluation is the right approach?
Since you asked, I've never been that keen on mixing laziness and I/O. Your experiences have strengthened that conviction: if you want strict control over resource usage, laziness is always going to be problematic. Sure, it's great if you can get it right; the code is shorter and runs in small constant space. But can you guarantee that it'll still have the same memory behaviour with the next version of the compiler? With a different compiler?
And I've heard others say that laziness adds enough unpredictability to make optimizing that much trickier. I guess this may be one of the cases where the "trickiness" outweighs the elegance.
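A tiny example of the kind of thing that bites me, nothing to do with darcs itself, just the textbook lazy-accumulator case, which I think captures the flavour of the problem:

    import Data.List (foldl')

    -- Lazy accumulator: builds a chain of a million unevaluated (+) thunks
    -- before any addition happens, so heap use grows with the list length.
    lazySum :: [Integer] -> Integer
    lazySum = foldl (+) 0

    -- Strict accumulator: foldl' forces the running total at every step,
    -- so the same sum runs in constant space.
    strictSum :: [Integer] -> Integer
    strictSum = foldl' (+) 0

    main :: IO ()
    main = print (strictSum [1 .. 1000000])

Both versions are "correct", but only one of them has the memory behaviour you actually wanted, and nothing in the types tells you which.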
I'm looking for advice or help with optimizing darcs in this case. I guess this could be viewed as a challenge for people who felt the shootout's micro-benchmarks were unfair to Haskell. Can we demonstrate that Haskell provides good real-world performance when working with large files? Ideally, darcs could easily handle a patch that is 10 GB in size using only a few megabytes of RAM if need be, and do so in about the time it takes to read the file once or twice and gzip it.
I'd love to help you look into it, but I don't really have the time. I'm happy to help out with advice where possible, though.
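(To make the constant-space goal above concrete: the sort of chunked, strict I/O I have in mind looks roughly like the sketch below. It's nothing darcs-specific, just copying a file through a fixed-size buffer, and it assumes the Data.ByteString library is available; the 64K chunk size and file names are arbitrary.)

    import qualified Data.ByteString as B
    import System.IO

    -- Copy a file in 64K chunks; memory use stays at roughly one chunk
    -- no matter how large the input is.
    copyInChunks :: FilePath -> FilePath -> IO ()
    copyInChunks from to = do
        hIn  <- openBinaryFile from ReadMode
        hOut <- openBinaryFile to WriteMode
        let loop = do
                chunk <- B.hGet hIn 65536
                if B.null chunk
                    then return ()
                    else B.hPut hOut chunk >> loop
        loop
        hClose hIn
        hClose hOut

    main :: IO ()
    main = copyInChunks "big.patch" "big.patch.copy"

In principle, reading, gzipping, and applying a patch could be structured the same way, consuming and discarding one chunk at a time.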
Several people have spoken up and said, "I'd help but I'm busy", including droundy himself. This is fine; when I said "help" I was thinking of advice like you gave, so it was a poor choice of phrasing on my part. I can run ghc and stare at lines of code, but sometimes I need guidance, since I'm mostly out of my league in this case.

Thanks,
Jason