
Hey everyone,

Just out of curiosity: what work is being done in the Data Parallel Haskell / Repa projects regarding cache locality? I ask because, as I understand it, the biggest bottleneck on today's processors is cache misses, and the reason optimized platform-specific linear algebra libraries perform so well is that they divide the data into chunks sized to fit the cache, maximizing the number of operations performed per memory access.

I wouldn't expect Data Parallel Haskell / Repa to automatically know the perfect chunking strategy for each platform, but are there any plans to do something like this? (To be explicit, this isn't meant as a criticism; I'm just curious and interested in seeing discussion on this topic by those more knowledgeable than I. :-) )

Thanks!
Greg