Hi Will,
Right, I'm not an expert on low-level things, but yes, each memory page can cache a different vector, and this can even be faster, especially if the algorithm uses only a few fields of a large structure. I was wrong about that.
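To make the point concrete, here is a minimal sketch, assuming the `vector` package's `Data.Vector.Unboxed`. The names `points` and `totalWeight` are made up for illustration. An unboxed vector of pairs is stored as one vector per component, so an algorithm that reads only one field walks only that field's memory:

```haskell
import qualified Data.Vector.Unboxed as VU

-- Hypothetical sample data: (id, weight) pairs.
points :: VU.Vector (Int, Double)
points = VU.fromList [(1, 0.5), (2, 1.5), (3, 2.0)]

-- Data.Vector.Unboxed represents a vector of pairs as a pair of vectors,
-- so 'unzip' only splits the representation and does not copy elements.
-- Summing the weights then reads the Double vector alone.
totalWeight :: VU.Vector (Int, Double) -> Double
totalWeight v = VU.sum ws
  where
    (_, ws) = VU.unzip v

main :: IO ()
main = print (totalWeight points)
```

With a boxed vector of records, the same sum would have to follow a pointer to every record just to reach one field.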
But anyway, Unboxed needs more native support to give Haskell more credibility in performance-critical problems. Right now it has some conversion overhead for user-defined data. That may be optimized away, but the whole thing is second class: it works via an instance instead of a language feature.
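As an illustration of what "via an instance" means in practice, here is a sketch that assumes `vector` 0.13 or later, where (as far as I know) `IsoUnbox` and `As` are exported from `Data.Vector.Unboxed`. The `Particle` type and its fields are hypothetical. The `toURepr`/`fromURepr` pair is the conversion I mean; the `INLINE` pragmas are there in the hope that GHC optimizes it away:

```haskell
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import qualified Data.Vector.Generic         as VG
import qualified Data.Vector.Generic.Mutable as VGM
import qualified Data.Vector.Unboxed         as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- Hypothetical user-defined record.
data Particle = Particle { pId :: !Int, pMass :: !Double }
  deriving (Show, Eq)

-- The conversion to and from an already-unboxable representation.
instance VU.IsoUnbox Particle (Int, Double) where
  toURepr (Particle i m) = (i, m)
  fromURepr (i, m) = Particle i m
  {-# INLINE toURepr #-}
  {-# INLINE fromURepr #-}

-- Boilerplate: data family instances plus instances derived via the tuple.
newtype instance VU.MVector s Particle = MV_Particle (VU.MVector s (Int, Double))
newtype instance VU.Vector    Particle = V_Particle  (VU.Vector    (Int, Double))
deriving via (Particle `VU.As` (Int, Double)) instance VGM.MVector VUM.MVector Particle
deriving via (Particle `VU.As` (Int, Double)) instance VG.Vector   VU.Vector   Particle
instance VU.Unbox Particle

main :: IO ()
main = do
  let ps = VU.fromList [Particle 1 0.5, Particle 2 1.5]
  print (VU.sum (VU.map pMass ps))
```

Every element read or written goes through `fromURepr` or `toURepr`, and all of the above has to be written by hand for each type.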
Maybe automatically deriving Unboxed instances would be the right compromise.