I do not see why forall can be lifted to the top of function arrows.
I probably do not understand the notation at all. They all seem to be
different to me.
Consider this simple function:
\b x y -> if b then x else y
Let's say we wanted to translate that into a language like System F, where every lambda has to have a type. We could write something like:
\(b::Bool) (x::?) (y::?) -> if b then x else y
but we need something to put in those question marks. We solve this by taking the type of x and y as an additional parameter:
\(a::*) (b::Bool) (x::a) (y::a) -> if b then x else y
This would have the type "forall a. Bool -> a -> a -> a". In a dependently typed system, we might write that type as "(a::*) -> (b::Bool) -> (x::a) -> (y::a) -> a".
Since b doesn't depend on a, we can switch their order in the argument list,
\(b::Bool) (a::*) (x::a) (y::a) -> if b then x else y
This has type "Bool -> forall a. a -> a -> a" or "(b::Bool) -> (a::*) -> (x::a) -> (y::a) -> a".
Haskell arranges things so that the implicit type arguments always appear first in the argument list.
I find it helps to think of forall a. as being like a function, and exists a. as being like a pair.