
    import Data.Array.Vector
    import Data.Bits

    main = print . productU . mapU (*2) . mapU (`shiftL` 2) $ replicateU (100000000 :: Int) (5::Int)

and turns it into a loop like this:

    $wfold :: Int# -> Int# -> Int#
    $wfold = \ (ww_sWX :: Int#) (ww1_sX1 :: Int#) ->
        case ww1_sX1 of wild_B1 {
          __DEFAULT -> $wfold (*# ww_sWX 40) (+# wild_B1 1);
          100000000 -> ww_sWX
        }

..

So now, since we've gone to such effort to produce a tiny loop like this, can't we unroll it just a little? Can anyone think of a way to apply Claus' TH unroller, or somehow convince GCC it is worth unrolling this guy, so we get the win of both aggressive high-level fusion and aggressive low-level loop optimisations?
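
To make the question concrete, here is a minimal hand-written sketch of what "unrolling it just a little" could mean for a loop of this shape. It is ordinary boxed Haskell, not anything GHC or uvector generates; the names loop1 and loop2 are hypothetical and exist only for illustration. loop1 mirrors the structure of $wfold (multiply the accumulator by 40, bump the counter, stop at the bound), and loop2 does two multiplications per iteration, with an extra guard for an odd trip count.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical names, for illustration only.

-- One multiplication per iteration, mirroring the shape of $wfold:
-- the accumulator is multiplied by 40 until the counter reaches n.
loop1 :: Int -> Int -> Int -> Int
loop1 n = go
  where
    go !acc !i
      | i == n    = acc
      | otherwise = go (acc * 40) (i + 1)

-- The same loop unrolled by two: each pass performs two
-- multiplications and advances the counter by two.
-- Assumes the starting counter is not greater than n.
loop2 :: Int -> Int -> Int -> Int
loop2 n = go
  where
    go !acc !i
      | i == n     = acc
      | i + 1 == n = acc * 40                 -- one leftover iteration
      | otherwise  = go (acc * 40 * 40) (i + 2)
```

In GHCi, loop1 5 1 0 and loop2 5 1 0 both give 102400000 (that is, 40^5). Whether the unrolled form is actually faster depends on the backend and would need measuring.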

I'm not sure this is what you're after (been too long since I read assembler;-), but it sounds as if you wanted to unroll the source of that fold, which seems to be a local definition in foldS?

Since unrolling is not always a good idea, it would also be nice to have a way to control or initiate it from outside the uvector package. Perhaps a RULE could redirect the call from foldS to a foldSN; foldS is hidden and gets inlined away, but something along those lines. If that works, you'd then run into the issue of wanting to rearrange the *# and *# by variable and constant.

Claus
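
A rough sketch of the RULE idea, under loud assumptions: iterFold and iterFold2 below are hypothetical stand-ins, not uvector's foldS or any real foldSN, and the fold is simplified to "apply f to the accumulator n times". The NOINLINE [1] pragma keeps iterFold from being inlined in the early simplifier phases so that the rewrite rule has a chance to match. Rules only fire when compiling with optimisation, and both functions compute the same result, so the program means the same thing either way.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical stand-in for the hidden fold:
-- apply f to the accumulator n times.
iterFold :: (Int -> Int) -> Int -> Int -> Int
iterFold f n z = go z 0
  where
    go !acc !i
      | i >= n    = acc
      | otherwise = go (f acc) (i + 1)
-- Delay inlining so the rule below can match first.
{-# NOINLINE [1] iterFold #-}

-- Hypothetical unrolled-by-two variant with the same result.
iterFold2 :: (Int -> Int) -> Int -> Int -> Int
iterFold2 f n z = go z 0
  where
    go !acc !i
      | i + 1 < n = go (f (f acc)) (i + 2)   -- two steps per pass
      | i < n     = f acc                    -- one leftover step
      | otherwise = acc

-- Redirect calls to the unrolled variant (fires only with -O).
{-# RULES
"iterFold/unroll2" forall f n z. iterFold f n z = iterFold2 f n z
  #-}
```

With f = (*40), the unrolled body computes (acc * 40) * 40. Int multiplication wraps and is associative, so this equals acc * 1600; getting the compiler to do that regrouping of constants is the rearrangement issue mentioned above.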