On Mon, Oct 8, 2018 at 8:53 AM Joachim Durchholz <jo@durchholz.org> wrote:

Am 08.10.2018 um 01:34 schrieb Vanessa McHale:
> The problem with an IR is that some languages would inevitably suffer -
> LLVM in particular was designed as a backend for a C compiler, and so it
> is not necessarily well-suited for lazy languages, immutable languages,
> etc. (not to mention self-modifying assembly and other such pathological
> beasts...)
Actually LLVM is built for being adaptable to different kinds of
languages. It does have a bias towards C-style languages, but you can
adapt what doesn't fit your needs *and still keep the rest*.

The following was true a few years ago:

When I asked, the LLVM IR was intentionally not specified to be reusable
across languages, so that different compiler toolchains could adapt the
IR to whatever their language or backend infrastructure needed.

Garbage collection is one area where you have to do a lot of work. There
are some primitive instructions that support it, but the semantics is
vague and doesn't cover all kinds of write barriers. You'll have to roll
your own IR extensions - or maybe I didn't understand the primitives
well enough to see how much they cover.
Anyway, LLVM does not come with a GC implementation.
OTOH, it does not prevent you from doing a GC. In particular, you're
free to avoid C-style pointers, so you have the full range of GC
algorithms available.

Laziness? No problem. If you do tagless/spineless, you'll code the
evaluation machine anyway. Just add an IR instruction that calls the
interpreter.
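
To make "evaluation machine" a bit more concrete, here is a minimal sketch of the core step such a machine performs: entering a suspended computation once and overwriting it with its result. It is only a toy model written in Haskell itself, not GHC's actual runtime and not LLVM IR; the names `Cell`, `Thunk`, `delay`, `force` and `demo` are made up for illustration.

```haskell
import Data.IORef

-- A heap cell is either a suspended computation or its cached value.
data Cell a = Unevaluated (IO a) | Evaluated a

-- A thunk is a mutable reference to such a cell (hypothetical model).
newtype Thunk a = Thunk (IORef (Cell a))

-- Allocate a thunk; nothing is computed yet.
delay :: IO a -> IO (Thunk a)
delay act = Thunk <$> newIORef (Unevaluated act)

-- Enter a thunk: run it at most once, then update the cell in place.
force :: Thunk a -> IO a
force (Thunk ref) = do
  cell <- readIORef ref
  case cell of
    Evaluated v -> pure v
    Unevaluated act -> do
      v <- act
      writeIORef ref (Evaluated v)
      pure v

-- Demo: force the same thunk twice and count how often its body ran.
demo :: IO (Int, Int, Int)
demo = do
  runs <- newIORef (0 :: Int)
  t <- delay (modifyIORef runs (+ 1) >> pure (6 * 7))
  a <- force t
  b <- force t
  n <- readIORef runs
  pure (a, b, n)
```

In GHCi, `demo` yields `(42,42,1)`: both calls see the same value, and the suspended computation ran only once.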

I'm far from an expert in this area, but isn't that "interpreter" a simple yet slow approach to codegen? My understanding is that when you use, say, a global variable as a register for your evaluation machine, it is slower than if you somehow pin a real hardware register for that purpose. I think this is what a "registerized" GHC build means.
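
To illustrate what I mean by "interpreter" with registers kept in memory, here is a rough sketch, assuming a made-up machine with two registers. Each code block does a bit of work and returns the next block to jump to, and a driver loop keeps calling them. All names (`Regs`, `Code`, `run`, `loop`, `sumTo`) are hypothetical; this models the idea only and says nothing about the code GHC or LLVM actually emit.

```haskell
import Data.IORef

-- Machine "registers" live in ordinary memory, so every access
-- is a load or a store rather than a hardware register access.
data Regs = Regs { r1 :: IORef Int, acc :: IORef Int }

-- A code block does some work and returns the next block to run,
-- or Nothing to halt the machine.
newtype Code = Code (Regs -> IO (Maybe Code))

-- The driver loop: keep jumping until a block halts.
run :: Regs -> Code -> IO ()
run regs (Code step) = step regs >>= maybe (pure ()) (run regs)

-- Adds r1 + (r1 - 1) + ... + 1 into acc, one iteration per jump.
loop :: Code
loop = Code $ \regs -> do
  n <- readIORef (r1 regs)
  if n <= 0
    then pure Nothing
    else do
      modifyIORef' (acc regs) (+ n)
      writeIORef (r1 regs) (n - 1)
      pure (Just loop)

-- Set up the registers, run the machine, read back the result.
sumTo :: Int -> IO Int
sumTo n = do
  regs <- Regs <$> newIORef n <*> newIORef 0
  run regs loop
  readIORef (acc regs)
```

Here `sumTo 10` gives 55. The point is the cost model: every iteration goes through the driver loop and touches `r1` and `acc` in memory, which is what pinning real hardware registers is meant to avoid.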

In LLVM you can't use, say, RSP the way you want; it is doomed to be the "stack pointer register", even if you don't use a stack at all.

As I read on some blog, you can slightly influence LLVM codegen by adding calling conventions, but the real solution would be another algorithm for instruction selection. No one has implemented that yet, AFAIK.

Immutability? No problem - actually not a problem anywhere. Immutability
happens at the language level; at the IR level it is pretty irrelevant
because compilers try to replace object copying with in-place modification
wherever possible anyway.
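
A small sketch of that separation, assuming nothing beyond `base`: the function below is pure as far as its callers can tell, yet internally it updates a single mutable cell in place. Note that this is hand-written mutation hidden behind `runST`, not a demonstration of what an optimizer does automatically; `sumSquares` is just an illustrative name.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Pure from the outside: same input, same output, no visible effects.
sumSquares :: [Int] -> Int
sumSquares xs = runST (do
  acc <- newSTRef 0                            -- one mutable cell
  mapM_ (\x -> modifySTRef' acc (+ x * x)) xs  -- updated in place
  readSTRef acc)
```

For example, `sumSquares [1, 2, 3]` is 14, and no caller can observe that a mutable cell was involved.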

Self-modifying assembly? No IR really supports that. Mostly it's
backends that generate self-modifying code from IR instructions for
specific targets.

TL;DR: For its generality, LLVM IR is better suited to languages with
specific needs in the backend than anything else that I have seen (which
means C runtimes, various VM proofs of concept which don't really count,
and the JVM - in particular I don't know how .NET compares).

Regards,
Jo
_______________________________________________
Haskell-Cafe mailing list