Your example doesn't work for the same reason the following doesn't work:
id runST (<some st code>)
To accept it, the inferencer would have to instantiate some type variables of id to polymorphic types based on the type of runST (or, in your example, some type variables of flip based on the type of one), and then use that information to check <some st code> (id in your example) against a polymorphic type. At various times, GHC has had ad-hoc left-to-right behavior that made this work, but it no longer does. Right now, I believe it only has an ad-hoc check to make sure that:
runST $ <some st code>
works, and not much else. Note that even left-to-right behavior does not cover all cases, as you might have:
f x y
such that y requires x to be checked polymorphically in the same way. There are algorithms that can get this right in general, but it's a little tricky, and they're rather different from GHC's algorithm, so I don't know whether it's possible to make GHC behave correctly.
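To make the runST comparison concrete, here is a minimal sketch of the two forms I would expect GHC to accept, with the failing form left commented out. The helper sumST and the names direct and viaDollar are made up for illustration and are not from your code.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Hypothetical helper for illustration: sums a list with a mutable reference.
sumST :: [Int] -> ST s Int
sumST xs = do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref

-- Direct application: runST's argument is checked against (forall s. ST s Int).
direct :: Int
direct = runST (sumST [1, 2, 3])

-- The ($) form, which GHC handles specially.
viaDollar :: Int
viaDollar = runST $ sumST [1, 2, 3]

-- Expected to be rejected for the reason described above: id's type variable
-- would have to be instantiated to the polymorphic type of runST.
-- broken :: Int
-- broken = id runST (sumST [1, 2, 3])

main :: IO ()
main = print (direct, viaDollar)
```

Uncommenting broken should reproduce the same kind of error as your example, although the exact message depends on the GHC version.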
The reason it works when you factor out or annotate "flip one 'x'" is that the checker then already has the eventual inferred type of that expression, so it knows to expect the id to be polymorphic. But when it's all checked at once, we just have a chain of unifications relating things like: (forall a. a -> a) ~ beta ~ (alpha -> alpha). Here beta is a unification variable introduced while type checking flip, and alpha -> alpha is the instantiation of id's type with unification variables, which happened because we didn't know that this was supposed to be a fully polymorphic use. That unification fails, since alpha -> alpha is a monomorphic type and cannot be equal to forall a. a -> a.