
Don Stewart wrote:
Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of RAM. (!!)
But you know why, don't you?
What I'm trying to say [and saying very badly] is that Haskell is an almost terrifyingly subtle language. Seemingly insignificant changes can have drastic consequences. (Ever tried to run a program and found it goes into an infinite loop or a deadlock because you accidentally made some function slightly stricter than it needs to be?) I can't find it right now, but some paper I was reading showed two example programs, one of which is dramatically less performant than the other, and yet they look like they should be exactly equivalent, "and one might even expect a good compiler to 'optimise' one to the other [in the wrong direction]". After studying that chapter for about half an hour, I *still* can't wrap my brain around what the difference is. But I assume SPJ knows what he's talking about.
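A minimal sketch of the kind of subtlety I mean (the names are mine, not from any paper): two definitions of the mean that look interchangeable, where the "obvious" one holds the whole list in memory because sum and length each demand a full traversal, while the other makes a single strict pass and runs in constant space.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- The "obvious" definition: sum and length each traverse the list,
-- so the whole list must stay live in memory between the two passes.
meanNaive :: [Double] -> Double
meanNaive xs = sum xs / fromIntegral (length xs)

-- One strict pass: the bang patterns keep both accumulator fields
-- evaluated at every step, so nothing piles up on the heap.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (!acc, !len) x = (acc + x, len + 1)
```

Both give the same answer on small inputs; only on something like [1 .. 1e9] does the difference show up.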
I sat down and spent the best part of a day writing an MD5 implementation. Eventually I got it so that all the test vectors come out right. (Stupid little-endian nonsense... mutter mutter...) When I tried it on a file containing more than 1 MB of data... ooooohhhh dear...
Did you use Data.Binary or the other libraries for efficient access to gigabytes of data?
Data.ByteString for all I/O. The rest is just moving data around one byte at a time. Difficult to see how you'd do it any differently. [I'll check to see if I still have the code; maybe somebody can do something with it.] And we're not talking about "gigabytes" of data - we're talking about the program completely choking on a 5 MB file. Even plain String I/O couldn't possibly take *minutes* to read 5 MB!
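For illustration only (this is a made-up digest, not my MD5 code): the access pattern I mean is a strict byte-at-a-time pass over a ByteString, which by itself should run in constant space.

```haskell
import qualified Data.ByteString as B
import Data.Bits (rotateL, xor)
import Data.Word (Word8)

-- A toy one-byte-at-a-time "digest" (NOT MD5, invented for this
-- example). B.foldl' keeps the accumulator evaluated at every step,
-- so even a multi-megabyte file is processed in constant space.
toyDigest :: B.ByteString -> Word8
toyDigest = B.foldl' step 0
  where
    step acc b = rotateL acc 1 `xor` b
```

If a program structured like this still chokes, the problem is presumably in how the per-byte state is carried around, not in ByteString itself.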
Of course, the first step in any serious attempt at performance improvement is to actually profile the code to figure out where the time is being spent.
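For instance (a sketch; the function itself is a placeholder): pin a named cost centre on a suspect definition, build with profiling, and the .prof file will say where the time goes.

```haskell
-- Build:  ghc -O2 -prof -fprof-auto Suspect.hs
-- Run:    ./Suspect +RTS -p -RTS    (writes Suspect.prof)
--
-- The SCC pragma attaches a named cost centre to this expression
-- even without -fprof-auto. The function body is just a placeholder.
suspect :: [Int] -> Int
suspect xs = {-# SCC "suspect" #-} sum (map (* 2) xs)
```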
Well, actually, for our 'mean()' case, it means just using the right structures. Arrays for example:
Doesn't that mean all the data has to be present in memory at once? I would expect [optimistically?] that if you have a definition for "mean" that performs only 1 traversal of the list, you'd end up with something approximating

  x = 0
  for n = 1 to 1e9
    x = x + n
  return x

If you allocate an array, you're limited to what will fit into RAM at once.
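That loop translates directly into a strict accumulating recursion (a sketch, with bang patterns to keep the accumulator evaluated), which needs neither a list nor an array:

```haskell
{-# LANGUAGE BangPatterns #-}

-- The loop above as a strict accumulating recursion: no list, no
-- array, just two accumulators that GHC can keep in registers.
sumTo :: Double -> Double
sumTo limit = go 0 1
  where
    go !acc !n
      | n > limit = acc
      | otherwise = go (acc + n) (n + 1)
```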
The compiler is optimising this to:
Main_zdszdwfold_info:
        leaq 32(%r12), %rax
        cmpq %r15, %rax
        movq %rax, %r12
        ja .L10
        movsd .LC0(%rip), %xmm0
        ucomisd %xmm5, %xmm0
        jae .L12
        movq %rsi, (%rax)
        movq $base_GHCziFloat_Dzh_con_info, -24(%rax)
        movsd %xmm6, -16(%rax)
        movq $base_GHCziBase_Izh_con_info, -8(%rax)
        leaq -7(%rax), %rbx
        leaq -23(%rax), %rsi
        jmp *(%rbp)
.L12:
        movapd %xmm6, %xmm0
        addq $1, %rsi
        subq $32, %r12
        addsd %xmm5, %xmm0
        addsd .LC2(%rip), %xmm5
        movapd %xmm0, %xmm6
        jmp Main_zdszdwfold_info
Not even any garbage collection. This should be pretty much the same, performance-wise, as unoptimised C.
Uh... I can't actually read x86 assembly, but I'll take your word. ;-) I guess it was kind of a bad example.
almost any nontrivial program you write spends 60% or more of its time doing GC rather than actual work.
Ok, you're doing something very wrong. GC time is typically less than 15% of the running time of typical work programs I hack on.
I bet you're using lists inappropriately?
No offense, but from what I can tell, you're a super-ultra-hyper expert in Haskell hacking. It's not surprising that *you* can write fast code. What I'm debating is how easy it is for relative beginners to write performant code. Certainly I'm not attempting to suggest that Haskell programs cannot be fast - this is manifestly false. What I'm contesting is how easy it is to make them fast. I guess because Haskell is such an implicit language, it's very easy to end up implicitly doing a hell of a lot of work without realising you're doing it. Certainly when I first started with Haskell, a lot of non-trivial programs I wrote were *horrifyingly* slow, or ate astonishing amounts of RAM. (My favourite one was that time I tried to use the "length" function to decide when to switch from quick-sort to insertion sort. I'll leave you to guess what this did to the performance levels...) Today, my programs generally work better - but they're still a long, long way from being "fast".
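To spell out the "length" trap (the helper name here is invented): length must walk the entire spine of the list, so calling it at every level of a recursive sort adds a full traversal per level. A lazy test against a small bound inspects at most that many elements instead:

```haskell
-- length xs < n forces the whole spine of xs; this version stops
-- after inspecting at most n elements. (Name made up for the example.)
shorterThan :: Int -> [a] -> Bool
shorterThan n xs = null (drop (n - 1) xs)
```

So where `length xs < 3` is O(length xs), `shorterThan 3 xs` is O(1), even on an infinite list.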
I think there is a problem that few people are taking the time to understand the compilation model of Haskell, while they've had the C runtime behaviour drilled into their brains since college.
Until you sit down and understand what your Haskell code means, it will be very hard to reason about optimisations, unfortunately.
You're probably right about all that. I would humbly suggest that what is somewhat lacking is a good, introductory, high-level text on what makes Haskell go fast and what makes it go slow. As with many things in the Haskell world, there are bits and pieces of information out there, but it's difficult to track them all down and present a coherent picture. We've got a section on the wiki containing scraps and snippets of information. There are various GHC-related papers [if you can find them online]. GHC has various profiling possibilities, but thus far I've found it difficult to digest the results. We need a good, thorough body of text on this subject, I feel. [Of course, that still means somebody has to write one...] Maybe Real World Haskell will contain such a thing. Certainly making your code not crawl is a very significant real-world issue! ;-)
GHC does what it does well
Not contesting that. ;-)
and it's predictable. As long as you understand the operations you're trying to make predictions about.
That's the part I'm contesting. I feel I have a pretty solid basic understanding of what's going on, and yet benchmark results continue to astonish me...
I suggest installing ghc-core, and looking at how your code is compiled. Try many examples, and have a good knowledge of the STG paper.
1. What is "ghc-core"? 2. Does anybody know how to actually read GHC's Core output anyway? To me, it looks almost exactly like very, very complicated Haskell source with a suspicious concentration of case expressions - but I understand that in the Core language, many constructs actually mean something quite different. 3. Any idea where the STG paper is? Is it still an accurate reflection of GHC's current technology?