The way `forever` is implemented is a bit opaque. It's mainly a hack to keep GHC's optimizer from producing a space leak, no matter what the surrounding code looks like.
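For reference, recent versions of `base` define `Control.Monad.forever` essentially as below, with an `Applicative` constraint and `*>` rather than `Monad` and `>>`. This is a sketch of that definition under the hypothetical name `foreverKnot`, so it can be loaded next to the library version:

```haskell
-- Mirrors the shape of base's definition; foreverKnot is a hypothetical name
-- chosen to avoid clashing with Control.Monad.forever.
foreverKnot :: Applicative f => f a -> f b
{-# INLINE foreverKnot #-}
foreverKnot a = let a' = a *> a' in a'
-- a' refers to itself ("ties the knot"): the loop is one shared value,
-- rather than a fresh `a *> foreverKnot a` built on every unfolding.
```

In a monad that can short-circuit, such as `Either`, the loop ends at the first `Left`: `foreverKnot (Left "stop")` is `Left "stop"`.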
You can think of the implementation as just:
```haskell
forever :: Monad m => m a -> m b
forever act = do
  act
  forever act
```
which is pretty much the infinite loop you'd write in an imperative language, so it's not that crazy.
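To make the imperative reading concrete, here is a sketch that runs the do-notation version in `IO`, counting iterations in an `IORef` and escaping the loop by throwing an exception. `foreverDo`, `Stop` and `countUpTo` are hypothetical names used only for illustration:

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- The do-notation version from above, renamed so it doesn't clash
-- with Control.Monad.forever.
foreverDo :: Monad m => m a -> m b
foreverDo act = do
  act
  foreverDo act

-- Hypothetical exception type, used only to break out of the loop.
data Stop = Stop deriving Show

instance Exception Stop

-- Loop "forever", bumping a counter on each iteration and throwing Stop
-- once it reaches n (at least one iteration always runs).
-- Returns the number of iterations that ran.
countUpTo :: Int -> IO Int
countUpTo n = do
  ref <- newIORef 0
  _ <- try (foreverDo (step ref)) :: IO (Either Stop ())
  readIORef ref
  where
    step :: IORef Int -> IO ()
    step ref = do
      modifyIORef' ref (+ 1)
      k <- readIORef ref
      if k >= n then throwIO Stop else pure ()
```

In GHCi, `countUpTo 5` returns `5`: the body runs like a `while (true)` loop body until the exception escapes it.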
The resemblance to the real definition is easier to see if you replace the do notation with an explicit `>>` and rename `act` to `a`:
```haskell
forever :: Monad m => m a -> m b
forever a = a >> forever a
```
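Strictly speaking, `>>` is sequencing rather than bind: `do { act; rest }` desugars to `act >> rest`, and `>>` defaults to `act >>= \_ -> rest`. A sketch of both spellings, under the hypothetical names `foreverBind` and `foreverThen`:

```haskell
-- Explicit bind: the lambda ignores act's result and recurses.
foreverBind :: Monad m => m a -> m b
foreverBind act = act >>= \_ -> foreverBind act

-- The (>>) spelling, which by default means the same thing as the above.
foreverThen :: Monad m => m a -> m b
foreverThen a = a >> foreverThen a
```

Both stop immediately on a `Left` in the `Either` monad (for example, `foreverThen (Left "stop") :: Either String ()` is `Left "stop"`), while a `Right` input would loop forever.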
Again, the knot-tying in the real definition (the self-referential `let`) is only there to prevent a space leak in certain optimization scenarios.