
That, rather tangentially, reminds me: if we do start to teach the code
generator how to produce these sorts of things from simpler parts, e.g. by
enabling something like LLVM's vectorization pass, or via some future
internal GHC pass that checks for, say, Superword-Level Parallelism
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.4663&rep=rep1&type=pdf)
in the style of Jaewook Shin, then we need to differentiate between the
flags that govern what GHC/LLVM is allowed to produce via optimization and
the flags that govern what the end user is allowed to emit explicitly. For
example, in my own code I can safely call AVX2 primitives after I set up
guards to check that I'm on a CPU that supports them, but currently I can
only emit that code after I tell GHC that I want it to allow the AVX2
instructions. If I build a complicated dispatch mechanism in Haskell for
picking the right ISA and emitting code for several of them, I'm going to
need to tell GHC to let me build with all sorts of instruction sets that
the machine the final executable runs on may not fully support. We should
be careful not to conflate these two things.
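
To make the guard-then-dispatch shape concrete, here is a rough sketch and
nothing more. Everything named in it (ISA, selectISA, parseFlags, detectISA,
the sum* kernels) is made up for illustration, and it assumes Linux on x86,
where /proc/cpuinfo has a "flags" line. The kernels are ordinary Haskell
stand-ins. In real code the AVX2 one would use SIMD primops, and that is the
module I would have to build with -mavx2 even though the guard means it only
ever runs on a CPU that supports it.

```haskell
module Main (main) where

import Control.Exception (IOException, try)
import Data.List (isPrefixOf)

-- Hypothetical instruction-set tiers to dispatch between.
data ISA = Scalar | SSE2 | AVX2
  deriving (Show, Eq)

-- Pick the best tier from the feature names the CPU reports.
selectISA :: [String] -> ISA
selectISA feats
  | "avx2" `elem` feats = AVX2
  | "sse2" `elem` feats = SSE2
  | otherwise           = Scalar

-- Pull the feature names out of /proc/cpuinfo-style text.
-- Assumption: Linux on x86, where the line of interest starts with "flags".
parseFlags :: String -> [String]
parseFlags txt =
  case filter ("flags" `isPrefixOf`) (lines txt) of
    (l : _) -> words (drop 1 (dropWhile (/= ':') l))
    []      -> []

-- Runtime guard: fall back to Scalar if the file cannot be read.
detectISA :: IO ISA
detectISA = do
  r <- try (readFile "/proc/cpuinfo") :: IO (Either IOException String)
  pure (either (const Scalar) (selectISA . parseFlags) r)

-- Stand-in kernels. They are all plain Haskell here; the real SSE2 and
-- AVX2 versions would be written with SIMD primops in separately
-- compiled modules.
sumScalar, sumSSE2, sumAVX2 :: [Double] -> Double
sumScalar = sum
sumSSE2   = sum
sumAVX2   = sum

-- The dispatch table: one kernel per tier.
kernelFor :: ISA -> [Double] -> Double
kernelFor Scalar = sumScalar
kernelFor SSE2   = sumSSE2
kernelFor AVX2   = sumAVX2

main :: IO ()
main = do
  isa <- detectISA
  putStrLn ("dispatching to the " ++ show isa ++ " kernel")
  print (kernelFor isa [1, 2, 3, 4])
```

The point of the sketch is that the check in detectISA happens at run time,
while the permission to emit AVX2 code at all is decided at compile time.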
-Edward
On Mon, Mar 13, 2017 at 2:44 PM, Ben Gamari wrote:
> Siddhanathan Shanmugam writes:
>>> It would be even better if we could *also* teach the native back end
>>> about SSE instructions. Is there anyone who might be willing to work
>>> on that?
>> Yes. Though, it would be better if someone with more experience than
>> me decides to pick this up instead.
> I would be happy to advise if you would like to pick this up. I think it
> would be great if the NCG were to learn about SSE and GHC could really
> use more people knowledgeable about its backend. The best way to learn
> is by doing.
>
> Cheers,
> - Ben