That, rather tangentially, reminds me: if we do start to teach the code generator how to produce these sorts of things from simpler parts, e.g. by enabling something like LLVM's vectorization pass, or some future internal GHC pass that looks for, say, superword-level parallelism in the style of Jaewook Shin, then we need to differentiate between flags for what GHC/LLVM is allowed to produce via optimization and flags for what the end user is allowed to emit explicitly.

For example, in my own code I can safely call AVX2 primitives once I have set up guards that check I'm on a CPU that supports them, but currently I can only emit that code after telling GHC to allow AVX2 instructions. If I build a complicated dispatch mechanism in Haskell that picks the right ISA and emits code for several of them, I'm going to need to tell GHC to let me build with all sorts of instruction sets that the machine running the final executable may not fully support.

We should be careful not to conflate these two things.
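
To make the dispatch case concrete, here is a minimal sketch of the shape I have in mind. Everything in it is hypothetical: `Isa`, `CpuProbe`, `Kernel`, `selectKernel` and `sse42Machine` are illustrative names, the probe is a pure stand-in for a real CPUID check, and the kernel bodies are plain Haskell rather than actual SIMD primitives so that it runs anywhere.

```haskell
module Main where

import Data.List (find)

-- Hypothetical instruction-set levels, ordered from least to most capable.
data Isa = Baseline | Sse42 | Avx2
  deriving (Eq, Ord, Show)

-- Stand-in for a runtime CPUID check; a real one would go through the FFI.
type CpuProbe = Isa -> Bool

-- One implementation of the same operation, tagged with the ISA it needs.
data Kernel = Kernel
  { kernelIsa :: Isa
  , kernelRun :: [Double] -> Double
  }

-- Candidates listed from most to least capable. All three bodies are plain
-- `sum` here (an assumption made so the sketch is runnable). In real code the
-- Avx2 entry would call AVX2 primitives, and that is the part that has to be
-- compiled with AVX2 explicitly allowed, whatever the target CPU supports.
kernels :: [Kernel]
kernels =
  [ Kernel Avx2     sum
  , Kernel Sse42    sum
  , Kernel Baseline sum
  ]

-- Pick the first (most capable) kernel that the probe accepts.
selectKernel :: CpuProbe -> [Kernel] -> Maybe Kernel
selectKernel supports = find (supports . kernelIsa)

-- Pretend CPU that supports everything up to and including SSE4.2.
sse42Machine :: CpuProbe
sse42Machine isa = isa <= Sse42

main :: IO ()
main =
  case selectKernel sse42Machine kernels of
    Nothing -> putStrLn "no usable kernel"
    Just k  -> do
      putStrLn ("selected: " ++ show (kernelIsa k))
      print (kernelRun k [1, 2, 3, 4])
```

The point of the sketch is that the guard lives in `selectKernel`, in user code, so permission to emit the AVX2 kernel explicitly has to be independent of whatever the optimizer is allowed to assume about the machine.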

-Edward

On Mon, Mar 13, 2017 at 2:44 PM, Ben Gamari <ben@well-typed.com> wrote:
Siddhanathan Shanmugam <siddhanathan+eml@gmail.com> writes:

>> It would be even better if we could *also* teach the native back end about
>> SSE instructions. Is there anyone who might be willing to work on that?
>
> Yes. Though, it would be better if someone with more experience than me
> decides to pick this up instead.
>
I would be happy to advise if you would like to pick this up. I think it
would be great if the NCG were to learn about SSE, and GHC could really
use more people knowledgeable about its backend. The best way to learn is
by doing.

Cheers,

- Ben