GHCi doesn't perform any optimizations, so whenever you're running interpreted bytecode there's a significant performance hit. However, if you compile the code, you can run the compiled/optimized version from GHCi as well.
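
As a rough sketch of what that looks like in practice (the file name `SumTo.hs` and the function `sumTo` are made up for illustration, and the exact behaviour depends on your GHC version), you could compile a small file with optimizations and then load it into GHCi, which should pick up the object code instead of interpreting the source:

```haskell
{-# LANGUAGE BangPatterns #-}
-- SumTo.hs (hypothetical file name, used only for this example)
--
-- Assumed workflow:
--   ghc -O2 SumTo.hs    -- compile with optimizations, producing SumTo.o and SumTo.hi
--   ghci SumTo.hs       -- GHCi should reuse the up-to-date object file
--
-- Alternatively, start GHCi with -fobject-code so that it compiles
-- modules to object code rather than bytecode.

-- | Sum the integers from 1 to n using a strict accumulator.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 10000000)
```

If you want to see the difference, `:set +s` in GHCi prints timing after each expression, so you can compare running `main` with and without the compiled object file present.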

--

I'd missed the optimization part. Yes, that would make a difference.

Beyond that, though, isn't it all just graph reduction, which should be the same either way?