
On 8/30/07, Bulat Ziganshin wrote:
> it's the same as 8800GTX. please read CUDA manual first. these 128 threads are not independent, each 8 or 16 threads execute the same code
Hmmmm, yes, you are right. The GPU contains 16 "multiprocessors", where each multiprocessor contains 8 processors that execute the same code at the same time ("data parallel"), for 128 processors in total. The 8 processors in each multiprocessor run at roughly twice the core clock speed, so in one core clock cycle they can execute 16 threads, and in two clock cycles they can execute all 32 threads of a "warp" (a group of threads running the same code).

So... kind of an interesting architecture. To what extent do we think that this is representative of future "general-purpose" multi-core architectures?

Looking at Haskell parallelization, things like maps, and folds of associative functions, can be split across the threads of a single warp. On the other hand, things like independent lets that we want to run in parallel would need to be assigned to independent warps (a couple of rough sketches below). On the whole, maps and folds may constitute the bulk of what we are trying to parallelize (certainly, SPJ's NDP work focuses extensively on maps), so this is probably broadly compatible with the CUDA architecture?
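To make the map/fold point concrete, here is a rough Haskell sketch using `par` and `pseq` from Control.Parallel (`parFold` is a name I made up for illustration, not NDP's API): a fold of an associative function can be split in half and the halves combined, and both halves run the *same* code on different data, which is exactly the regular shape that fits a SIMD warp.

import Control.Parallel (par, pseq)

-- A divide-and-conquer fold of an associative operation: both halves
-- run the same code on different data, the regular "data parallel"
-- shape that fits a warp. (parFold is a made-up name, not NDP's API.)
parFold :: (a -> a -> a) -> a -> [a] -> a
parFold _ z []  = z
parFold _ _ [x] = x
parFold f z xs  = left `par` (right `pseq` f left right)
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left     = parFold f z as
    right    = parFold f z bs

main :: IO ()
main = print (parFold (+) 0 [1 .. 10000 :: Int])

(With GHC you would build with -threaded and run with +RTS -N2 to actually get the parallelism; the point here is the shape of the computation, not the flags.)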
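And here is the "independent lets" case, again just a sketch with `par`/`pseq`: two unrelated computations sparked in parallel. These run *different* code, so unlike a map they could not share a warp; on CUDA each would have to live in its own warp (or block).

import Control.Parallel (par, pseq)

-- Two independent computations running different code: unlike a map,
-- they could not share a warp; each would need its own warp or block.
main :: IO ()
main = a `par` (b `pseq` print (a + b))
  where
    a = sum     [1 .. 10000 :: Int]  -- one computation
    b = product [1 .. 12    :: Int]  -- an unrelated one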