Re: [Haskell-cafe] Analyzing slow performance of a Haskell program

9 Aug 2011

      Hi Chris,

On Tue, Aug 9, 2011 at 12:47 PM, Chris Yuen  wrote:
...
1. Why are bangs needed on the length arrays?
If I remove them from below, performance drops 10%. I thought `unsafeIndex`
is straight in both arguments, no?
wordLength i = go i
  where
    go n
      | n < 10 = lengthOnes !! n
      | n < 20 = lengthTeens !! (n-10)
      | n < 100 = (lengthTens !! (n // 10)) + (lengthOnes !! (n % 10))
      | n < 1000 = (lengthOnes !! (n // 100)) + 7 + go (n % 100)
      | n < 1000000 = go (n // 1000) + 8 + go (n % 1000)
      | otherwise = go (n // 1000000) + 7 + go (n % 1000000)
    !lengthOnes = lengthVec ones
    !lengthTens = lengthVec tens
    !lengthTeens = lengthVec teens
(It's "strict", not "straight".)

The different lengths are not used in all branches and since Haskell
is a lazy (or to be pendantic: non-strict) language we cannot compute
them before knowing which branch will be evaluated. For example, given
that we have

    ones = ...
    tens = error "Boom!"

    test = wordLength 0

evaluating 'test' should not cause an exception to be raised as the
first (n < 10) branch is taken, but it would if lengthOnes was strict.

Delaying the evaluation has some costs, namely allocating a thunk for
e.g. `lengthVec ones` and later evaluate that thunk. By making the
lengths strict we can evaluate them earlier and avoid some allocation
and forcing of thunks.
...
2. Why the single element worker wrapper pattern (`go` functions) increases
performance?
If we change wordLength to
wordLength n
  | n < 10 = lengthOnes !! n
  | n < 20 = lengthTeens !! (n-10)
  | n < 100 = (lengthTens !! (n // 10)) + (lengthOnes !! (n % 10))
  | n < 1000 = (lengthOnes !! (n // 100)) + 7 + wordLength (n % 100)
  | n < 1000000 = wordLength (n // 1000) + 8 + wordLength (n % 1000)
  | otherwise = wordLength (n // 1000000) + 7 + wordLength (n % 1000000)
  where
    !lengthOnes = lengthVec ones
    !lengthTens = lengthVec tens
    !lengthTeens = lengthVec teens
The performance drops by another 10%. This really surprised me. `go i`
seemed obvious to me and I don't understand how it could make any
difference. The full source code is available to GHC so it shouldn't be
related to call-by-pointer problem? If this is the case, shouldn't we always
wrap a "go" function for **any** recursive functions?
Making wordLength non-recursive lets GHC inline it, which can
sometimes help performance (e.g. if the inlining enables more
optimizations). Inlining does increase code size (and sometimes
allocation if a closure has to be allocated to capture free
variables), so it's not always a good idea.

Cheers,
Johan

Re: [Haskell-cafe] Analyzing slow performance of a Haskell program

Johan Tibell