Threading state is something the State monad does, and it is
purely functional, showing that a monad *can* be pure but doesn't have
to be. Other monads, like IO as you have stated, have side effects.
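To make "purely functional" concrete, here is a minimal, hand-rolled State monad. It is a sketch: the names `State`, `runState`, `get` and `put` mirror the ones in `Control.Monad.State` from the mtl/transformers packages, but this is a self-contained re-implementation, and `tick` is a hypothetical example. There is no IO anywhere. The "state" is just an extra argument going in and an extra result coming out, and `>>=` does the plumbing.

```haskell
-- A self-contained State monad (sketch; mtl's Control.Monad.State is the real one).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s   -- run the first step, get a function and a new state
        (a, s2) = g s1  -- run the second step on the updated state
    in (h a, s2)

instance Monad (State s) where
  -- Bind threads the state from the first computation into the next one.
  State g >>= k = State $ \s ->
    let (a, s1) = g s
    in runState (k a) s1

-- Read the current state.
get :: State s s
get = State $ \s -> (s, s)

-- Replace the current state.
put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Hypothetical example: return the current counter and increment it.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  return n

main :: IO ()
main = print (runState (tick >> tick >> tick) 0) -- (2,3)
```

Running it from state `0` returns the value of the last `tick` (`2`) paired with the final state (`3`); the same call always gives the same answer.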

Thank you, that helps.
 
At the end of the day I found that the monad is very, very general and
it's best to think of it as a piece of data wrapped in some
type. With a monad you can:
1. take some data and wrap it up in a type (return):
   a -> m a
2. apply a function to the data within the type (>>=):
   m a -> (a -> m b) -> m b
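As an illustration of those two signatures, here is `return` and `>>=` instantiated at `m = Maybe`. The helpers `safeDiv` and `calc` are hypothetical names made up for this sketch. Note that the function passed to `>>=` receives the plain `Int` that was "inside" the `Maybe`, and returns a new `Maybe`.

```haskell
-- Hypothetical helper: division that returns Nothing instead of crashing on 0.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Wrap 100 with return, then feed it through two steps with (>>=).
-- Each lambda sees the unwrapped Int produced by the previous step.
calc :: Int -> Maybe Int
calc d = return 100 >>= \x -> safeDiv x d >>= \y -> safeDiv y 2

main :: IO ()
main = do
  print (calc 5) -- Just 10
  print (calc 0) -- Nothing
```

If any step produces `Nothing`, the later steps are skipped; that short-circuiting behaviour is what the `Maybe` instance of `>>=` supplies.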
 
But if that's all you need to do, you could just use an Applicative Functor, right? The picture I have at the moment is:

Functors can apply a function to a value inside a container.
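A small sketch of that idea using `fmap` from the Prelude. The helper `double` is hypothetical; it works for any `Functor`, and the shape of the container (list length, `Just`/`Nothing`, `Left`/`Right`) never changes, only the values inside.

```haskell
-- Hypothetical helper: double every value inside any Functor.
double :: (Functor f, Num a) => f a -> f a
double = fmap (* 2)

main :: IO ()
main = do
  print (double [1, 2, 3 :: Int])               -- [2,4,6]
  print (double (Just 21 :: Maybe Int))         -- Just 42
  print (double (Nothing :: Maybe Int))         -- Nothing
  print (double (Right 5 :: Either String Int)) -- Right 10
```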

Applicative functors provide pure expressions and sequencing, but no binding. All applicative functors are also functors.
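On the earlier question of whether an Applicative would do: the `>>=` in point 2 is exactly the part Applicative does not give you. A sketch with `Maybe` (the names `addBoth` and `halveIfEven` are hypothetical): with `pure` and `<*>` the structure of the computation is fixed up front and the effects are independent, whereas `>>=` lets the next step depend on the value produced by the previous one.

```haskell
-- Applicative: pure and (<*>) combine independent effects.
-- Both arguments contribute regardless of each other's value.
addBoth :: Maybe Int -> Maybe Int -> Maybe Int
addBoth mx my = pure (+) <*> mx <*> my

-- Monad: (>>=) lets the next computation depend on the previous result.
-- Whether we end up with Nothing is decided by the value x itself;
-- fmap and (<*>) alone cannot turn a Just into a Nothing like this.
halveIfEven :: Maybe Int -> Maybe Int
halveIfEven mx = mx >>= \x -> if even x then Just (x `div` 2) else Nothing

main :: IO ()
main = do
  print (addBoth (Just 2) (Just 3)) -- Just 5
  print (addBoth (Just 2) Nothing)  -- Nothing
  print (halveIfEven (Just 10))     -- Just 5
  print (halveIfEven (Just 7))      -- Nothing
```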

Arrows provide a way to set up more complicated pipelines with "tee" junctions, etc. All arrows are also applicative functors (?)
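Here is a sketch of both points using plain functions, which are the simplest `Arrow` instance. `(&&&)` from `Control.Arrow` is the "tee": it feeds one input into two arrows and pairs the results. On the "(?)": more precisely, for any arrow `a` and a fixed input type `b`, `a b` is an applicative functor, and base exposes this as `WrappedArrow` in `Control.Applicative`. The functions `stats` and `sumAndLength` are hypothetical examples.

```haskell
import Control.Applicative (WrappedArrow (..))
import Control.Arrow ((&&&), (***), (>>>))

-- A small pipeline on plain functions:
-- (&&&) is the tee, (***) works on the two halves of a pair side by side,
-- and (>>>) composes left to right.
stats :: [Int] -> (Int, Int)
stats = (sum &&& length) >>> (id *** (* 2))

-- Any Arrow a makes (WrappedArrow a b) an Applicative.
-- With a = (->), this combines two functions of the same input.
sumAndLength :: [Int] -> (Int, Int)
sumAndLength = unwrapArrow ((,) <$> WrapArrow sum <*> WrapArrow length)

main :: IO ()
main = do
  print (stats [1, 2, 3, 4])        -- (10,8)
  print (sumAndLength [1, 2, 3, 4]) -- (10,4)
```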

Monads add binding. All monads are also arrows.
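The sense in which every monad is an arrow is the Kleisli construction: `Kleisli m a b` in `Control.Arrow` wraps a function `a -> m b`, and base provides `instance Monad m => Arrow (Kleisli m)`. (Going the other way, arrows that support `ArrowApply` give back a monad via `ArrowMonad`.) A sketch, with the hypothetical steps `checkNonNeg` and `halveEven`:

```haskell
import Control.Arrow (Kleisli (..), arr, (&&&), (>>>))

-- Hypothetical step: succeed only on non-negative input.
checkNonNeg :: Kleisli Maybe Int Int
checkNonNeg = Kleisli $ \x -> if x >= 0 then Just x else Nothing

-- Hypothetical step: halve, succeeding only on even input.
halveEven :: Kleisli Maybe Int Int
halveEven = Kleisli $ \x -> if even x then Just (x `div` 2) else Nothing

-- (>>>) on Kleisli arrows sequences the steps via (>>=);
-- (&&&) is the same tee as for plain functions, and arr lifts a pure function.
pipeline :: Kleisli Maybe Int (Int, Int)
pipeline = checkNonNeg >>> (halveEven &&& arr (* 10))

main :: IO ()
main = do
  print (runKleisli pipeline 8)    -- Just (4,80)
  print (runKleisli pipeline 7)    -- Nothing
  print (runKleisli pipeline (-2)) -- Nothing
```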