
On Thu, Dec 3, 2020, at 05:29, Simon Marlow wrote:
My understanding of what happened in C compilers is that the optimisers want to do some dataflow analysis to determine the ranges of values that variables can take, in order to eliminate unnecessary conditionals and so forth. These analyses are severely crippled if you can't assume the absence of undefined behaviour in C/C++, because most arithmetic operations have undefined behaviour: overflow is undefined, for example.
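To make that concrete, here is a minimal, hypothetical C sketch (the function name is made up) of the kind of conditional such range analysis lets the optimiser fold away:

    /* Because signed overflow is undefined, the compiler may assume that
     * x + 1 > x holds for every int x and treat the branch as always taken. */
    int next_or_same(int x) {
        if (x + 1 > x)      /* "always true" under the no-overflow assumption */
            return x + 1;
        return x;           /* the fallthrough may be treated as unreachable */
    }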
Right, I think what leads to the surprising behavior in C/C++ is that compilers sometimes use the results of that dataflow analysis in a way that violates causality, for example removing a NULL check because later on the pointer is dereferenced unconditionally. I don't think anyone would complain about the compiler removing NULL checks *after* the dereference, but removing them before it is a bit unsettling. Suppose the NULL check the compiler cleverly elided was meant to call into a custom abort() function. Now we have to hope the compiler can figure out that our abort() wrapper doesn't return, and we probably have to annotate the wrapper to help the compiler along.

Another example that always comes to mind is where the compiler replaces an indirect call through a NULL function pointer with a direct call to a function f() that was never actually stored in the pointer at run time: f() is the only value the program ever assigns to the pointer, and calling a NULL function pointer is also UB, so the compiler concludes the call can only go to f().
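To illustrate the first case, here is a hypothetical sketch (names made up, not from any real code base) of a check that a C compiler is entitled to delete:

    #include <stdio.h>

    /* warn_null() visibly returns, so even on the p == NULL path control
     * still reaches the unconditional *p below.  The compiler may therefore
     * conclude that p != NULL on every execution with defined behaviour and
     * fold the check, and the warning it guards, away. */
    static void warn_null(void) {
        fprintf(stderr, "p is NULL\n");
    }

    int read_value(int *p) {
        if (p == NULL)     /* may be elided: the later *p "proves" p != NULL */
            warn_null();
        return *p;         /* unconditional dereference */
    }

If warn_null() is replaced by a wrapper around abort(), whether the check survives comes down to whether the compiler can see that the wrapper never returns, which is where the annotation mentioned above comes in.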
We don't do this kind of analysis in GHC at the moment, and we don't have quite so much undefined behaviour floating around. But I can imagine similar issues might arise. I mean, we most definitely assume that the result of `unsafeCoerce#` is a value of the claimed type - what else can you do? I suppose I'm not really clear on what specifically it would mean to adopt "illegal but unchecked" as the policy for undefined behaviour. Does it mean the compiler can't assume anything about the result of an operation that has any undefined behaviour?
That's a fair point. I'm not entirely clear on where to draw the line either, but maybe respecting causality would be a good place to start. So it's fine for us to assume that operations that have UB complete successfully, and to assume the inputs satisfy the operation's contract from that point on, but maybe we should avoid propagating those assumptions back before the call-site that may invoke UB. But then what do we do about loops, where the code before the call-site on one iteration is also the code after the call-site on the previous iteration? It seems we would either have to special-case the first iteration or block optimizations in the first section of the loop on every iteration. Both options sound terrible.
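For reference, here is a minimal, hypothetical C sketch (names made up) of the loop situation I mean:

    #include <stdio.h>

    static void log_if_null(const int *p) {
        if (p == NULL)
            fprintf(stderr, "p is NULL\n");
    }

    int sum(const int *p, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            log_if_null(p);   /* the "first section" of the loop body */
            s += p[i];        /* after this, p != NULL is an established fact */
        }
        return s;
    }

The dereference p[i] establishes p != NULL, and that fact flows along the back edge to the (inlined) log_if_null() call on every iteration after the first. To use it without propagating it back before the first dereference we would have to peel the first iteration, or else refrain from simplifying the call at all.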