That's a nice read, thanks for the pointer. I agree with the solution presented there. If we can do that it will be awesome. If help is needed I can spend some time on it.
One of the things that I noticed is that the code can be optimized significantly if we know the common case so that we can optimize that path at the expense of less common path. At times I saw wild difference in performance just by a very small change in the source. I could attribute the difference to code blocks having moved and differently placed jump instructions or change in register allocations impacting the common case more. This could be avoided if we know the common case.
The common case is not visible or obvious to low level tools. It is easier to write the code in a low level language like C such that it is closer to how it will run on the processor, we can also easily influence gcc from the source level. It is harder to do the same in a high level language like Haskell. Perhaps there is no point in doing so. What we can do instead is to use the llvm toolchain to perform feedback directed optimization and it will adjust the low level code accordingly based on your feedback runs. That will be entirely free since it can be done at the llvm level.
My point is that it will pay off in things like that if we invest in integrating llvm better.
-harendra