
On Fri, 2008-07-25 at 10:38 +1000, Ben Lippmeier wrote:
> I'd be more interested in the 8 x hardware threads per core. [1] suggests that (single-threaded) GHC code spends over half its time stalled due to L2 data cache misses.
Right, that's what I think is most interesting and why I wanted to get this project going in the first place. If we spend so long blocked on memory reads that we're only utilising 50% of a core's time, then there's lots of room for improvement if we can fill that wasted time by running another thread. That's the supposed advantage of multiplexing several threads per core. If Haskell suffers more than other languages from memory latency and low utilisation, then we also have the most to gain from this multiplexing approach.
> 64 threads per machine is a good incentive for trying out a few `par` calls.
Of course, that means we need to have enough work to do. Indeed, we need quite a bit just to break even, because each core is relatively stripped down, without all the out-of-order execution etc.

Duncan
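
For anyone who wants to try the `par` calls mentioned above, here is a minimal sketch of what one might look like. It assumes the `parallel` package (for `Control.Parallel`); the names `fib` and `parFibPair` and the input sizes are hypothetical and only stand in for real work, they are not from the project discussed in this thread.

```haskell
import Control.Parallel (par, pseq)

-- Naive Fibonacci, used here only as a placeholder for an expensive
-- computation (hypothetical example workload).
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Spark the evaluation of x so that an idle hardware thread may pick it up,
-- evaluate y on the current thread, then combine the two results.
parFibPair :: Int -> Int -> Integer
parFibPair a b = x `par` (y `pseq` (x + y))
  where
    x = fib a
    y = fib b

main :: IO ()
main = print (parFibPair 27 28)
```

To let the sparks actually run in parallel, this would be built with `ghc -threaded` and run with `+RTS -N`. A `par` is only a hint, so the runtime is free to ignore the spark, and whether this breaks even on stripped-down cores depends on how much work each spark carries.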