I have no idea how to go about designing a DSL or designing a
monad which will handle the animations.
Is it really about DSLs and monads?
Or is it first getting a grasp of the problem space?
You're trying to chew everything in the first bite. Scope out the smallest meaningful chunk, build it, and let what you learn from that direct where to go next.
"If you can't solve a problem, then there is an easier problem you can solve: find it." -- Polya