My end goal is for users to transparently get the fastest implementation available for their architecture/CPU, provided they use the high-level module. I've uploaded the cpu package, which lets me detect the AES instructions (and the architecture) at runtime, but I've been distracted by implementing fast Galois field arithmetic for the GCM and XTS modes (with AES).

Yes!  A worthy goal!

I think the proposal here is that we do the build/integration work to get something good which is portable enough and install-reliable enough to replace 'random'.  Then people who don't care will be using a good implementation by default.

That was my goal when I had my own small shot at this, but what I came up with was *very* build-fragile.  (It depended on an assembler being available, or on prebuilt binaries being included in the package.)  You can see the Setup.hs customization I attempted in intel-aes to compensate, but it's not enough.

Can we write a cabal-compatible, really robust installer that will test the user's system and always fall back rather than failing?
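
To make the "always fall back rather than failing" behaviour concrete, here is a minimal sketch of the selection logic. It is a runtime illustration only, not a cabal installer, and every name in it (Backend, safeProbe, selectBackend) is hypothetical rather than taken from intel-aes or the cpu package. The idea is that a probe which fails or throws simply counts as "not available", so the portable backend is always reachable.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- | A candidate implementation (hypothetical type, for illustration).
data Backend = Backend
  { backendName :: String
  , probe       :: IO Bool  -- ^ True if this backend can be used on this system
  }

-- | Run a probe, treating any exception as "backend not available".
safeProbe :: Backend -> IO Bool
safeProbe b = do
  r <- try (probe b >>= evaluate) :: IO (Either SomeException Bool)
  pure (either (const False) id r)

-- | Return the first candidate whose probe succeeds, or the portable
-- backend if none does. The portable backend is never probed, so
-- selection itself cannot fail.
selectBackend :: Backend -> [Backend] -> IO Backend
selectBackend portable []       = pure portable
selectBackend portable (b : bs) = do
  ok <- safeProbe b
  if ok then pure b else selectBackend portable bs
```

Candidates would be listed fastest first, e.g. an AES-NI backend ahead of a plain C one, with a pure Haskell implementation as the portable default.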

  -Ryan

P.S. How are you doing the CPUID test for the AES-NI instructions?  I used the *Intel-provided* test for this (in intel-aes) but I still had reports of incorrect identification on certain AMD CPUs...
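
For reference, the feature bits themselves are simple to decode: CPUID leaf 1 reports AES-NI in ECX bit 25 and PCLMULQDQ (the carry-less multiply useful for GCM) in ECX bit 1. The sketch below only decodes an ECX value that has already been obtained; how the register is read (FFI, inline assembly, the cpu package) is left out, and the names CpuFeatures and decodeLeaf1Ecx are made up for illustration. Note that it checks the feature bits only and does not look at the vendor string.

```haskell
import Data.Bits (testBit)
import Data.Word (Word32)

-- | Feature flags decoded from the ECX register returned by CPUID leaf 1.
data CpuFeatures = CpuFeatures
  { hasAESNI     :: Bool  -- ^ ECX bit 25: AES instructions
  , hasPCLMULQDQ :: Bool  -- ^ ECX bit 1: carry-less multiplication
  } deriving (Eq, Show)

-- | Decode the two bits of interest from a raw ECX value.
-- Assumes the caller has already executed CPUID with EAX = 1.
decodeLeaf1Ecx :: Word32 -> CpuFeatures
decodeLeaf1Ecx ecx = CpuFeatures
  { hasAESNI     = testBit ecx 25
  , hasPCLMULQDQ = testBit ecx 1
  }
```

For example, decodeLeaf1Ecx 0x02000002 has both flags set, while decodeLeaf1Ecx 0 has neither.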