
Hello GHC devs,

I'm trying to understand why my code is not being optimized in the way I would expect. I'm completely stuck and I think I need the advice of an expert.

I'm writing an effect system on top of transformers. The effect system wraps monad transformers in a newtype that encodes the composition structure of the transformers at the type level. Because it's a newtype, all of the class members are inherited directly from the underlying type using coerce. When I implement something using this effect system I would expect it to generate exactly the same code as if I had written it using transformers directly. However, it generates significantly worse code, even in a very simple case.

Firstly, a case where I do get the same code. All of these compile to the constant 1. Hooray!

https://github.com/tomjaguarpaw/ad/blob/cd0d876ddb448fe611515e8768dee66dc02e...

Secondly, a simple case where I do not get the same code. `mySumMTL` and `mySumNewtype` yield the same code, as expected. After all, `mySumNewtype` does exactly the same thing as `mySumMTL`, it's just wrapped in some newtypes. However, `mySumEff` yields worse code, despite *also* being the same thing as `mySumMTL` just wrapped in some newtypes.

https://github.com/tomjaguarpaw/ad/blob/cd0d876ddb448fe611515e8768dee66dc02e...

You can compare the generated loops at:

https://github.com/tomjaguarpaw/ad/blob/cd0d876ddb448fe611515e8768dee66dc02e...

Does anyone have a clue what's going wrong in the optimizer here? I don't think the singleton that I pass around to access the type-level index at runtime has anything to do with it. That seems to be optimized away by inlining. Is the simplifier confused by all the coercions?

Thanks for any light anyone may be able to shed,

Tom
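
For readers without the repository to hand, here is a minimal sketch of the kind of comparison described above. It is not the code from the links: `Wrapped`, `unwrap`, `sumDirect` and `sumWrapped` are hypothetical names, and the wrapper is a plain newtype over `State Int` rather than the type-indexed effect-system newtype. It only illustrates the "same loop, once directly and once behind a coerced newtype" setup.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.State.Strict (State, execState, get, put)
import Data.Coerce (coerce)

-- Hypothetical stand-in for the effect-system wrapper: a newtype over a
-- transformers stack whose instances are inherited by coercion.
newtype Wrapped a = Wrapped (State Int a)
  deriving (Functor, Applicative, Monad)

-- Unwrapping is a pure coercion with no runtime representation change.
unwrap :: Wrapped a -> State Int a
unwrap = coerce

-- Baseline: add n, n-1, ..., 1 to the state, written directly in State.
-- Assumes n >= 0.
sumDirect :: Int -> Int
sumDirect n = execState (go n) 0
  where
    go :: Int -> State Int ()
    go 0 = pure ()
    go k = do
      s <- get
      put $! s + k
      go (k - 1)

-- The same loop, with every operation going through the newtype.
sumWrapped :: Int -> Int
sumWrapped n = execState (unwrap (go n)) 0
  where
    go :: Int -> Wrapped ()
    go 0 = pure ()
    go k = do
      s <- Wrapped get
      Wrapped (put $! s + k)
      go (k - 1)

main :: IO ()
main = do
  print (sumDirect 10)
  print (sumWrapped 10)
```

Compiling a file like this with `-O2 -ddump-simpl -dsuppress-all` and comparing the Core for the two functions is one way to check whether the wrapper costs anything. Whether this simplified version shows the same gap as `mySumEff` is not something the sketch establishes.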