I do not see why forall can be lifted to the top of function arrows.
I probably do not understand the notation at all. They all seem to be
different to me.

Consider this simple function:

    \b x y -> if b then x else y

Let's say we wanted to translate that into a language like System F, where every lambda has to have a type. We could write something like:

    \(b::Bool) (x::?) (y::?) -> if b then x else y

but we need something to put in those question marks. We solve this by taking the type of x and y as an additional parameter:

    \(a::*) (b::Bool) (x::a) (y::a) -> if b then x else y

This would have the type "forall a. Bool -> a -> a -> a". In a dependently typed system, we might write that type as "(a::*) -> (b::Bool) -> (x::a) -> (y::a) -> a".

Since b doesn't depend on a, we can switch their order in the argument list,

    \(b::Bool) (a::*) (x::a) (y::a) -> if b then x else y

This has type "Bool -> forall a. a -> a -> a" or "(b::Bool) -> (a::*) -> (x::a) -> (y::a) -> a".

Haskell arranges things so that the implicit type arguments always appear first in the argument list.

I find it helps to think of forall a. as being like a function, and exists a. as being like a pair.