1. cprng-aes is painfully slow.
When using the Haskell AES implementation, yes. With AES-NI it flies, and even more so once
I have time to chunk the generation into bigger blocks (say, 128 AES blocks at a time).

One data point: in "intel-aes" I needed to generate bigger blocks to get decent performance.
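
To make the chunking idea concrete, here is a rough sketch in C of a counter-style generator that fills a buffer of 128 blocks per refill and then serves requests out of that buffer. It is not taken from cprng-aes or intel-aes: the names (chunked_rng, rng_init, rng_refill, rng_bytes) are hypothetical, and encrypt_block_placeholder is a deliberately insecure stand-in for a real AES block encryption (AES-NI or otherwise). It only illustrates how the per-call overhead gets amortized.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE   16                          /* AES block size in bytes */
#define CHUNK_BLOCKS 128                         /* blocks produced per refill */
#define CHUNK_SIZE   (BLOCK_SIZE * CHUNK_BLOCKS)

/* Hypothetical generator state: a 128-bit counter plus one buffered chunk. */
typedef struct {
    uint8_t counter[BLOCK_SIZE];
    uint8_t buf[CHUNK_SIZE];
    size_t  pos;              /* next unread byte in buf; CHUNK_SIZE means empty */
} chunked_rng;

/* PLACEHOLDER, NOT AES AND NOT SECURE: stands in for one block encryption.
 * A real generator would call an AES implementation here. */
static void encrypt_block_placeholder(const uint8_t in[BLOCK_SIZE],
                                      uint8_t out[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        out[i] = (uint8_t)(in[i] ^ 0xA5);
}

/* Increment the counter as a big-endian 128-bit integer. */
static void counter_increment(uint8_t c[BLOCK_SIZE])
{
    for (int i = BLOCK_SIZE - 1; i >= 0; i--)
        if (++c[i] != 0)
            break;
}

static void rng_init(chunked_rng *g, const uint8_t seed[BLOCK_SIZE])
{
    memcpy(g->counter, seed, BLOCK_SIZE);
    g->pos = CHUNK_SIZE;      /* buffer starts empty, so the first request refills */
}

/* Produce CHUNK_BLOCKS blocks in one go; this loop is where a real
 * implementation would benefit from pipelining several AES blocks. */
static void rng_refill(chunked_rng *g)
{
    for (size_t b = 0; b < CHUNK_BLOCKS; b++) {
        encrypt_block_placeholder(g->counter, g->buf + b * BLOCK_SIZE);
        counter_increment(g->counter);
    }
    g->pos = 0;
}

/* Copy n bytes to out, refilling the chunk only when it runs dry. */
static void rng_bytes(chunked_rng *g, uint8_t *out, size_t n)
{
    while (n > 0) {
        if (g->pos == CHUNK_SIZE)
            rng_refill(g);
        size_t avail = CHUNK_SIZE - g->pos;
        size_t take  = n < avail ? n : avail;
        memcpy(out, g->buf + g->pos, take);
        g->pos += take;
        out    += take;
        n      -= take;
    }
}
```

With this layout a small request usually costs a single memcpy, and the block function runs in batches of 128, which is the kind of granularity mentioned above.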
 
2. It doesn't use NI instructions (or any C implementation, currently).
Support for the NI instructions is coming, and there are plenty of existing C implementations
that could simply be added.

Oh, neat.  Could you share a pointer to some C code (with GCC aes intrinsics?) that can replace what the ASM does in the "intel-aes" package?