The State monad threads state through a computation, and it does so in
a purely functional way, which shows that a monad *can* be pure but
doesn't have to be. Other monads, like IO as you have stated, have side effects.
At the end of the day, I found that the monad is a very, very general
idea, and the best way to think about it is as a piece of data wrapped
in some type. With a monad you can:
1. take some data and wrap it up in the type (return):
a -> m a
2. apply a function to the data within the type, where the function itself returns a wrapped result (>>=):
m a -> (a -> m b) -> m b
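
To make the two operations concrete, here is a minimal sketch. The first part uses Maybe from the Prelude; the second part hand-rolls a small State type (rather than importing one from a library) so you can see that the state threading is nothing more than ordinary pure functions. The names safeDiv, calc, failed, tick and labelTwo are made up for this illustration, and the State definition here is a simplified assumption, not the one shipped in any particular package.

```haskell
-- Illustrative sketch only: safeDiv, calc, failed, tick and labelTwo
-- are hypothetical names invented for this example.

-- Part 1: return and (>>=) with the Maybe monad.

-- A function of shape a -> m b: it can fail, so the result is wrapped in Maybe.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- return wraps 100; each (>>=) feeds the wrapped value to the next function.
calc :: Maybe Int
calc = return 100 >>= \x -> safeDiv x 5 >>= \y -> safeDiv y 2

-- A division by zero yields Nothing, and the rest of the chain is skipped.
failed :: Maybe Int
failed = return 100 >>= \x -> safeDiv x 0 >>= \y -> safeDiv y 2

-- Part 2: a hand-rolled State monad (a simplified assumption, written
-- out here so the example needs nothing beyond the Prelude).

-- A stateful computation is just a pure function from a state
-- to a result paired with the next state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  -- Run the first computation, then pass its result and the
  -- updated state on to the next one.
  State g >>= f = State $ \s ->
    let (a, s') = g s
    in runState (f a) s'

-- Return the current counter value and increment the counter.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)

-- Number two items using the threaded counter.
labelTwo :: String -> String -> State Int [(Int, String)]
labelTwo a b =
  tick >>= \i ->
  tick >>= \j ->
  return [(i, a), (j, b)]

main :: IO ()
main = do
  print calc                             -- Just 10
  print failed                           -- Nothing
  print (runState (labelTwo "x" "y") 0)  -- ([(0,"x"),(1,"y")],2)
```

Note that nothing in Part 2 mutates anything: runState just applies a function to the initial state 0, and (>>=) passes each new state along. That is the sense in which State is pure, whereas IO uses the same two operations to sequence real side effects.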