Re[4]: [Haskell-cafe] idea for avoiding temporaries

12 Mar 2007

      Hello Claus,

Monday, March 12, 2007, 3:35:43 AM, you wrote:
...
...
the problems was mainly due to 2 factors:
1) readArray m (i,j)
...
yes, indeed. since we are dealing in bulk operations, we might as well take advantage
of that, so dropping the repeated bounds-checks inside the loops makes a lot of sense.
no, i say here only about memory leaks. of course, unsafeRead omits
bounds checking but more important in this case is that readArray
created a temporary memory cells - index calculation of matrix turns out
to be not strict. it was biggest surprise for me - i thrown a lot of
time, adding 'seq' here and there before i even tried to replace this
call with unsafeRead
...
moral/note to self: bulk array operations are your friend (i knew that!-), but you need
to use that when defining them (unsafeRead et al are only for library writers, but library
writers ought to use them; and i was writing a small inlined library)
switching array operations to unsafe ones raised performance, but,
aside of this matrix indexing, don't changed memory usage
...
...
2) 'op' in 'l' which was passed as real closure and was not inlined
due to weakness of ghc optimizer
...
it irked me having to write the same loop twice, but i didn't consider the consequences.
an INLINE pragma on l almost seems sufficient to regain the loss, so i prefer that; but
writing out the loop twice is still a tiny tick faster..
afaik, ghc can't inline recursive functions. it will be great if ghc
can automatically make specialized version of function it can't
inline. so i'm wonder whether INLINE really helps anything?

... well, after conducting tests i see that INLINE works but manual
inlining works even better :))
...
...
we should help strictness analyzer by marking all the variables used in tight loops as strict.
...
ah, there were a few surprises in that one. i thought i had considered possible strictness
problems, but obviously, i missed a few relevant possibilities. annotating everything is the
the problem is that arrays also need to be annotated as strict. ghc is
not as smart as you think so it really needs that everything will be
annotated as strict in order to make strict calls (or may be you just
don't take into account that haskell is lazy language and everything by
default is lazy, that definitely don't help program to run faster :)
...
safe option, and clearly documents the intentions, but i cannot help thinking about which
annotations could be omitted:
...
- modArray: a and i are both used anyway
sure? they are just passed into the unsafeRead routine. whether they
are *definitely* used or not depends on strictness of unsafeRead in
these arguments
...
- i index in loop is definitely checked (but j isn't, and some others weren't, either; so
   better safe than sorry)
again, checking in the form of "unless" strictifies your operation
only if we know that 'unless' is strict in its first argument. using
guard makes i definitely strict (and in fact was used before 6.6 to
"declare" parameters as strict):

l i | i
...
- some of the parameters need not be annotated in this example, but should be if one
  wanted to reuse the code elsewhere
...
- the one i really don't get yet is the one on the accumulators (!s in l, in dotA/matA);
  i thought i had covered that by using $! in applying the loop, but annotating the
  formal loop parameter, apart from being nicer, also seems faster..
it is a very different things! when you use $! in call, you ensure
that parameter will be calculated *before* making actual call -
otherwise, 'l' will be called with a closure that calculates actual
value on demand. so this omits creating a closure and therefore
conserves memory. it is all we need to make memory usage as small as
possible

but unfortunately, this don't changes a 'l' itself. it remains lazy
in is argument, so our strict call will just add box around computed
value before passing it to 'l'

otoh, when ghc knows that 'l' is strict in its parameter (due to
strictness analysis or with help of our annotation), it creates l', a
strict version of l whose actual parameter has Int# type. all calls to
'l' just deboxify their parameter and call l'. of course, this means
that your entire computation of new index/accumulator will be done
using Int# values and result will be passed to l' without additional
boxing and unboxing. you see a difference :)))
...
moral/note to self: don't try to be clever, try to be clear..; strictness in formal
parameters is better than strictness in actual parameters; bang-patterns are good;-)
when i saw your code, i've thought that we should add more strictness to
fix memory leaks and unsafe* to become a bit faster. memory leak on
matrix read was surprise for me too
...
...
after that is done, we got 1000 times less temporary data allocated and 5x faster 
execution. now it's a bit faster than strict lists
...
is this now anywhere near what the number-crunchers use, when they use Haskell?-)
it will be interesting if someone like Don or Simon can further improve
it, i'm learned a lot of tricks from them :)
...
Bulat, thanks for looking into it and for isolating the issues so quickly!-)
claus
it was a challenge! i, as moderator of arrays wiki page, can't allow
any other data structures to be faster :)))

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

Re[4]: [Haskell-cafe] idea for avoiding temporaries

Bulat Ziganshin