GHCi doesn't perform any optimizations on the code it interprets, so running interpreted bytecode carries a significant performance hit. However, if you compile the code first, GHCi can load and run the compiled, optimized version instead.
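
For what it's worth, here is a rough sketch of how you could see the difference yourself. The file name `Sum.hs` and the function `sumTo` are made up for illustration, and the exact timings (and whether GHCi picks up an existing object file) will depend on your GHC version and platform, so treat the commands in the comments as one possible workflow rather than the definitive one.

```haskell
-- Sum.hs (hypothetical file name, used only for this illustration)
--
-- Interpreted, unoptimized bytecode:
--   ghci Sum.hs
--   ghci> :set +s        -- print time/memory after each evaluation
--   ghci> main
--
-- Compiled, optimized object code loaded into GHCi:
--   ghci -fobject-code -O2 Sum.hs
--   ghci> :set +s
--   ghci> main
module Main where

import Data.List (foldl')

-- A tight numeric loop: the kind of code where the optimizer
-- (strictness analysis, unboxing, fusion) tends to matter most.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = print (sumTo 10000000)
```

One caveat: when a module is loaded as compiled code, only its exported names are in scope at the prompt, so you give up a little of the interactive convenience in exchange for speed.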
--
I left out the optimization bit... yes, that would make a difference.
Beyond that, though, isn't it all just graph reduction, which should be the same either way?