
Jan-Willem Maessen - Sun Labs East writes:
There are, I believe, a couple of major challenges:

* It's easy to identify very small pieces of parallel work, but much harder to identify large, yet finite, pieces of work. Only the latter are really worth parallelizing.
By the former, are you thinking of grains so small that they are handled by the out-of-order execution units in the CPU, and/or by the C compiler?
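Just to make the granularity point concrete, here is a rough sketch (the names and the cut-off are made up, and it assumes GHC's Control.Parallel): sparking every tiny recursive call buries you in overhead, while only sparking above some threshold gives pieces of work big enough to be worth handing to another CPU.

import Control.Parallel (par, pseq)

-- Naive version: sparks a new piece of work for every recursive call.
-- The pieces are tiny, so the spark overhead swamps the useful work.
pfibNaive :: Int -> Integer
pfibNaive n
  | n < 2     = fromIntegral n
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = pfibNaive (n - 1)
    b = pfibNaive (n - 2)

-- Chunked version: only spark "large, yet finite" pieces, and fall
-- back to plain sequential code below a (made-up) threshold.
pfib :: Int -> Integer
pfib n
  | n < 25    = fib n                    -- too small to bother sparking
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = pfib (n - 1)
    b = pfib (n - 2)

fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)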
* If you don't compute speculatively, you'll never find enough work to do.
Although I'm not familiar with the issues, my point is that the number of CPUs available, even in common household pee cees, is already more than one (P4 hyper-threading), and could be something like eight in the not-so-distant future. It no longer matters (much) if you waste cycles; cycles are cheap. (The next-but-one IA64, Montecito, is 1.7G transistors, including 24MB of on-chip cache. The P4 is big, but you could fit thirty of them in that space. No way Montecito is going to have anywhere near 30x the performance.) So speculative execution, even if you end up throwing away 50% of the work you do, could in theory make your program faster anyway. This is a headache for C programs; my hope would be that a functional language would make it easier.
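As a sketch of what cheap speculation might look like in a functional setting (the function name is made up, and again it assumes GHC's par): spark both candidate answers before the choice between them is known, and let otherwise idle CPUs chew on them.

import Control.Parallel (par)

-- Spark both candidate answers before the condition is known; an
-- otherwise idle CPU may pick them up.  Whichever one turns out to
-- be unneeded is wasted -- but cheap -- cycles.
speculate :: Bool -> a -> a -> a
speculate cond thenVal elseVal =
  thenVal `par` (elseVal `par` (if cond then thenVal else elseVal))

Whether the sparks actually get run depends on there being idle capabilities, which is exactly the "cycles are cheap" bet.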
* If you compute speculatively, you need some way to *stop* working on useless, yet infinite computations.
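One crude way to stop a runaway speculation, at least from inside IO, is to race it against a timer. A minimal sketch, assuming System.Timeout and that forcing to weak head normal form is enough:

import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Force a speculative value, but give up after the given number of
-- microseconds; Nothing means the speculation was abandoned.
speculateFor :: Int -> a -> IO (Maybe a)
speculateFor usecs x = timeout usecs (evaluate x)

-- speculateFor 100000 (length [() | _ <- [(1 :: Integer) ..]])
-- returns Nothing: the useless, infinite computation gets cut off.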
And you need to choose which computations to start working on, I guess. Predicting the future never was easy :-)

[perhaps getting off-topic, but hey, this is -cafe]

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants