
Hello Claus, Monday, March 12, 2007, 3:35:43 AM, you wrote:
the problems was mainly due to 2 factors: 1) readArray m (i,j)
yes, indeed. since we are dealing in bulk operations, we might as well take advantage of that, so dropping the repeated bounds-checks inside the loops makes a lot of sense.
no, i say here only about memory leaks. of course, unsafeRead omits bounds checking but more important in this case is that readArray created a temporary memory cells - index calculation of matrix turns out to be not strict. it was biggest surprise for me - i thrown a lot of time, adding 'seq' here and there before i even tried to replace this call with unsafeRead
moral/note to self: bulk array operations are your friend (i knew that!-), but you need to use that when defining them (unsafeRead et al are only for library writers, but library writers ought to use them; and i was writing a small inlined library)
switching array operations to unsafe ones raised performance, but, aside of this matrix indexing, don't changed memory usage
2) 'op' in 'l' which was passed as real closure and was not inlined due to weakness of ghc optimizer
it irked me having to write the same loop twice, but i didn't consider the consequences. an INLINE pragma on l almost seems sufficient to regain the loss, so i prefer that; but writing out the loop twice is still a tiny tick faster..
afaik, ghc can't inline recursive functions. it will be great if ghc can automatically make specialized version of function it can't inline. so i'm wonder whether INLINE really helps anything? ... well, after conducting tests i see that INLINE works but manual inlining works even better :))
we should help strictness analyzer by marking all the variables used in tight loops as strict.
ah, there were a few surprises in that one. i thought i had considered possible strictness problems, but obviously, i missed a few relevant possibilities. annotating everything is the
the problem is that arrays also need to be annotated as strict. ghc is not as smart as you think so it really needs that everything will be annotated as strict in order to make strict calls (or may be you just don't take into account that haskell is lazy language and everything by default is lazy, that definitely don't help program to run faster :)
safe option, and clearly documents the intentions, but i cannot help thinking about which annotations could be omitted:
- modArray: a and i are both used anyway
sure? they are just passed into the unsafeRead routine. whether they are *definitely* used or not depends on strictness of unsafeRead in these arguments
- i index in loop is definitely checked (but j isn't, and some others weren't, either; so better safe than sorry)
again, checking in the form of "unless" strictifies your operation
only if we know that 'unless' is strict in its first argument. using
guard makes i definitely strict (and in fact was used before 6.6 to
"declare" parameters as strict):
l i | i - some of the parameters need not be annotated in this example, but should be if one
wanted to reuse the code elsewhere - the one i really don't get yet is the one on the accumulators (!s in l, in dotA/matA);
i thought i had covered that by using $! in applying the loop, but annotating the
formal loop parameter, apart from being nicer, also seems faster.. it is a very different things! when you use $! in call, you ensure
that parameter will be calculated *before* making actual call -
otherwise, 'l' will be called with a closure that calculates actual
value on demand. so this omits creating a closure and therefore
conserves memory. it is all we need to make memory usage as small as
possible
but unfortunately, this don't changes a 'l' itself. it remains lazy
in is argument, so our strict call will just add box around computed
value before passing it to 'l'
otoh, when ghc knows that 'l' is strict in its parameter (due to
strictness analysis or with help of our annotation), it creates l', a
strict version of l whose actual parameter has Int# type. all calls to
'l' just deboxify their parameter and call l'. of course, this means
that your entire computation of new index/accumulator will be done
using Int# values and result will be passed to l' without additional
boxing and unboxing. you see a difference :))) moral/note to self: don't try to be clever, try to be clear..; strictness in formal
parameters is better than strictness in actual parameters; bang-patterns are good;-) when i saw your code, i've thought that we should add more strictness to
fix memory leaks and unsafe* to become a bit faster. memory leak on
matrix read was surprise for me too after that is done, we got 1000 times less temporary data allocated and 5x faster
execution. now it's a bit faster than strict lists is this now anywhere near what the number-crunchers use, when they use Haskell?-) it will be interesting if someone like Don or Simon can further improve
it, i'm learned a lot of tricks from them :) Bulat, thanks for looking into it and for isolating the issues so quickly!-)
claus it was a challenge! i, as moderator of arrays wiki page, can't allow
any other data structures to be faster :)))
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com