Conceptually, the runtime does (runIO# Main.main RealWorld#). Practically, ghc's implementation makes the sequencing machinery go away during code generation, so the runtime just pushes Main.main onto the stack and jumps into the compiled STG code to reduce it; there's your initial pattern match.
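To make that concrete, here's a rough sketch of the representation and of what "running main" amounts to. The names IO', bindIO' and runMain are just illustrative, but the shape is how GHC.Types actually defines IO:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (State#, RealWorld, realWorld#)

-- Sketch of the representation (called IO' here to avoid clashing with
-- the real IO, which is defined the same way in GHC.Types): an action is
-- just a function from a state token to a new token paired with a result.
newtype IO' a = IO' (State# RealWorld -> (# State# RealWorld, a #))

-- Sequencing is nothing but threading the token from one action into the next.
bindIO' :: IO' a -> (a -> IO' b) -> IO' b
bindIO' (IO' m) k = IO' (\s ->
  case m s of
    (# s', a #) -> case k a of IO' m' -> m' s')

-- "Running" main conceptually applies it to the initial token and
-- pattern matches on the resulting pair.
runMain :: IO' a -> a
runMain (IO' m) = case m realWorld# of (# _finalState, r #) -> r
```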
I guess I wasn't clear enough about the state. Every IO action is passed the "current state" and produces a "new state" (except that in reality there is no state to pass or update, since it has no runtime representation). A loop would be a sort of fold: each iteration gets the "current state" and produces (thisResult, "new state"), that "new state" is passed into the next iteration, and the final result is the collection of per-iteration results together with the final "new state". Again, this is only conceptual, since the state vanishes during code generation, having served its purpose of ensuring everything happens in order.
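Here's a minimal sketch of such a loop with the state threading written out by hand, using the real IO constructor and unIO from GHC.IO; loopN is a hypothetical helper, not something from a library:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.IO (IO (..), unIO)

-- Hypothetical loopN: run an action for i = 0..n-1, making the "fold over
-- the state" explicit.  Each iteration receives the current token s, runs
-- the action to get (# s', x #), passes s' on to the next iteration, and
-- the final result is the list of per-iteration results plus the last token.
loopN :: Int -> (Int -> IO a) -> IO [a]
loopN n act = IO (go 0)
  where
    go i s
      | i >= n    = (# s, [] #)
      | otherwise =
          case unIO (act i) s of
            (# s', x #) ->
              case go (i + 1) s' of
                (# s'', xs #) -> (# s'', x : xs #)
```

Ordinary loops written with >>= or forM boil down to essentially this shape; the data dependency on the token is what keeps the iterations in order until code generation erases it.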
This is a bit hacky, since it only works as long as ghc never gets to see that nothing ever actually uses or updates the state; as long as it can't see that, it's forced to assume the state is updated and must be preserved. This is where bytestring's inlinePerformIO (better known as accursedUnutterable…) went wrong: because it inlined the whole thing, ghc could spot that the injected state (it being essentially an inlined unsafePerformIO) was fake and never used, and started lifting stuff out of loops, etc., basically optimizing it internally as if it were pure code instead of IO, because it could see through IO's "purity mask".
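For reference, the function in question (accursedUnutterablePerformIO) is essentially a one-liner; this is roughly how it reads in bytestring's internals:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.IO   (IO (..))
import GHC.Exts (realWorld#)

-- Roughly what bytestring's accursedUnutterablePerformIO (formerly
-- inlinePerformIO) does: fabricate a state token with realWorld#, run the
-- action, and throw the resulting token away.  Because it is small and
-- freely inlinable, ghc can end up seeing that the token is never really
-- used, and the "this must happen in order" illusion collapses.
accursedUnutterablePerformIO :: IO a -> a
accursedUnutterablePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
```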