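My general experience with prefetching is that it is almost never a win when done just on trees, as in the usual mark-sweep or copying collector walk. Why? Because the interval between when you prefetch and when you actually use the data is too variable. In a depth-first walk, something you prefetch as you push it may be popped on the very next step, before the load has landed. Or it may be popped only after a whole subtree has been traversed, by which time it may already have been evicted. Stack disciplines and prefetch don't mix nicely.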
If you want to see a win out of it, you have to free up some of the ordering of your walk and tweak your whole application to support it. For example, if you want to use prefetching in garbage collection, the way to do it is to switch from a strict stack discipline to a small fixed-size queue on the output of the stack. You then issue the prefetch as objects go into the queue rather than as you walk the stack, so each object waits behind the rest of the queue between being prefetched and being used. That paid off for me as a 10-15% speedup last time I used it, after factoring in the overhead of the extra queue. Not too bad for a weekend project. =)
Without that sort of known lead time, prefetching usually works out to be a net loss or vanishes into the noise.
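To make the queue idea concrete, here is a minimal C sketch of a mark phase that puts a small FIFO between the mark stack and the marking work. It is an illustration under stated assumptions, not the code behind the numbers above. The `Node` layout, `push`, `mark`, `QSIZE` and `STACK_MAX` are all hypothetical. The sketch uses the GCC/Clang `__builtin_prefetch` builtin rather than any particular runtime's prefetch operations. Objects are prefetched as they enter the queue and are only marked (and their children pushed) when they leave it. That gives each object roughly a queue's worth of other work as lead time, instead of whatever the stack happens to give it.

```c
#include <assert.h>
#include <stddef.h>

#define QSIZE 8        /* hypothetical prefetch distance; tune by measurement */
#define STACK_MAX 1024 /* hypothetical fixed mark-stack capacity */

/* Hypothetical heap object: a mark bit plus up to two outgoing pointers. */
typedef struct Node {
    int marked;
    int nchildren;
    struct Node *child[2];
} Node;

typedef struct {
    Node *items[STACK_MAX];
    size_t len;
} Stack;

static void push(Stack *s, Node *n) {
    if (n == NULL) return;
    /* A real collector would spill or grow on overflow; this sketch just asserts. */
    assert(s->len < STACK_MAX);
    s->items[s->len++] = n;
}

/* Marks everything reachable from root; returns how many objects were newly marked. */
static size_t mark(Node *root) {
    Stack st = { .len = 0 };
    Node *queue[QSIZE]; /* ring buffer sitting on the output of the stack */
    size_t head = 0, count = 0, newly_marked = 0;

    push(&st, root);
    while (st.len > 0 || count > 0) {
        if (st.len > 0 && count < QSIZE) {
            /* Moving an object from stack to queue: issue the prefetch now, so the
               load can complete while the entries ahead of it are processed. */
            Node *n = st.items[--st.len];
            __builtin_prefetch(n, 1 /* for write: we will set the mark bit */, 3);
            queue[(head + count) % QSIZE] = n;
            count++;
            continue;
        }
        /* Queue is full, or the stack has drained: process the oldest entry. */
        Node *n = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        /* The mark bit is read only here, after the prefetch was issued. */
        if (n->marked) continue;
        n->marked = 1;
        newly_marked++;
        for (int i = 0; i < n->nchildren; i++) push(&st, n->child[i]);
    }
    return newly_marked;
}
```

The mark bit is deliberately checked after dequeue rather than before pushing a child. Checking it earlier would touch the child's memory before its prefetch had been issued, which is exactly the variable-latency access the queue is meant to avoid. `QSIZE` of 8 is only a placeholder: the useful depth depends on memory latency relative to per-object work.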
As for the array ops, davean has a couple of cases using them where the prefetch operations give a 20-25% speedup, which is what motivated Carter to start playing around with these again. I don't know offhand how easily those can be turned into public test cases, though.
-Edward