Then the only rule you have to remember is that a reduction step (to WHNF, weak head normal form) only occurs in Haskell when:
1. Evaluating a case expression (pattern matching)
2. Evaluating a seq expression (not relevant here; both cases are shown in the small sketch after this list)
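To make the rule concrete, here is a minimal sketch (my own illustration, not part of the question): the pair below contains undefined, yet nothing blows up, because both case and seq only reduce to WHNF, which for a pair is just the outermost (,) constructor.

lazyPair :: (Int, Int)
lazyPair = (1, undefined)              -- building the pair reduces nothing

first :: Int
first = case lazyPair of (x, _) -> x   -- the case forces lazyPair to WHNF: (_, _)

main :: IO ()
main = do
  print first                          -- prints 1; the undefined is never touched
  print (lazyPair `seq` "forced")      -- seq also stops at WHNF, so this is fine too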
Your example is a bit tricky because we don't have a concrete monad to work with: for some monads, pattern matching on (forever something) will loop forever, for others it terminates.
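The unrollings below assume the definition of forever from base, which is (roughly) this self-referential let (base spells it with *>, which for a Monad is the same as >>):

forever a = let a' = a >> a' in a'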
An example of the first kind (looping forever) is the Identity monad:
Identity a >>= f = f a
Trying to reduce (forever (Identity x)) will go something like this (formally these are not all single reduction steps, but this is how I unroll the expression in my head):
forever (Identity x)
let a' = Identity x >> a' in a'
Identity x >> (let a' = Identity x >> a' in a')
Identity x >>= (\_ -> let a' = Identity x >> a' in a')
let a' = Identity x >> a' in a'
And we are back where we started, so the reduction loops forever.
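If you want to try this yourself, here is a minimal, self-contained sketch (a local Identity newtype so that the >>= clause matches the one above; Data.Functor.Identity behaves the same way). Forcing the result to WHNF, e.g. by printing it, never finishes (GHC's runtime may even report it as <<loop>>):

import Control.Monad (forever)

newtype Identity a = Identity a deriving Show

instance Functor Identity where
  fmap f (Identity a) = Identity (f a)

instance Applicative Identity where
  pure = Identity
  Identity f <*> Identity a = Identity (f a)

instance Monad Identity where
  Identity a >>= f = f a

main :: IO ()
main = print (forever (Identity (1 :: Int)) :: Identity ())   -- hangs / <<loop>>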
An example of a terminating one is the Either () monad:
Left () >>= _ = Left ()
Right a >>= f = f a
And the reduction of the term (forever (Left ())):
forever (Left ())
let a' = Left () >> a' in a'
Left () >> (let a' = Left () >> a' in a')
Left () >>= (\_ -> let a' = Left () >> a' in a')
Left ()
The key step is the last one: reducing Left () >>= (\_ -> let a' = Left () >> a' in a') to WHNF yields Left (), "short circuiting" the loop.
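This one you can actually run; a minimal sketch using the real Either monad from the Prelude (forever comes from Control.Monad):

import Control.Monad (forever)

main :: IO ()
main = print (forever (Left () :: Either () ()) :: Either () Int)   -- prints Left (), no infinite loop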
If you want to understand the theoretical basis of lazy evaluation, I suggest looking into the lambda calculus and its different reduction strategies. There is a neat theorem (whose name I forget) that shows why lazy evaluation is the "right" one, in the sense that if a term T reduces to a normal form T' under any evaluation strategy, then it will also reduce to T' under lazy (normal-order) evaluation.
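A concrete Haskell flavour of that idea (my illustration, not part of the original claim): if an expression has a result at all, lazy evaluation will find it, even when a strict (call-by-value) strategy would diverge first.

-- bottom never reaches WHNF; a call-by-value language would have to
-- evaluate it before calling const, and would therefore diverge.
bottom :: Int
bottom = let x = x in x

main :: IO ()
main = print (const 42 bottom)   -- laziness never touches bottom: prints 42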