In the second example we probably have something like 6 'JMP' statements in machine code: 3 to jump into each function, and 3 to jump back out. In the first we have 2: one to jump us into mcSimulate and one to return. So each iteration executes 4 more JMPs in the second example. All other things being equal, this will produce slightly less efficient code.
Wow. I strongly suggest you forget about efficiency completely, become a proficient high-level Haskeller, and then dive back in. Laziness changes many runtime properties, and renders your old ways of thinking about efficiency almost useless.
If you are interested, though, you can use the ghc-core tool on Hackage to look at the Core (GHC's lowish-level intermediate language) and even the generated assembly for minimal cases. It's dense, but interesting if you have the time to study it.
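As a rough sketch of what a "minimal case" might look like, here is a small program with the same step written once as a single function and once split into three helpers. All of the names here (stepInline, stepSplit, square, bump, wrap, simulateInline, simulateSplit) are made up for illustration and are not taken from your code. Compiling it with something like `ghc -O2 -ddump-simpl -dsuppress-all Main.hs` prints the Core for both versions, so you can compare them directly instead of guessing at jump counts. GHC is often able to inline small helpers like these, but looking at the output is the only way to know for a given program.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical stand-ins for the two styles under discussion.

-- Style 1: the whole step written as one function.
stepInline :: Int -> Int -> Int
stepInline acc x = acc + (x * x + 1) `mod` 7

-- Style 2: the same step split into three small helpers.
square :: Int -> Int
square x = x * x

bump :: Int -> Int
bump y = y + 1

wrap :: Int -> Int
wrap z = z `mod` 7

stepSplit :: Int -> Int -> Int
stepSplit acc x = acc + wrap (bump (square x))

-- Strict left folds over the same input, one per style.
simulateInline, simulateSplit :: Int -> Int
simulateInline n = foldl' stepInline 0 [1 .. n]
simulateSplit n = foldl' stepSplit 0 [1 .. n]

main :: IO ()
main = do
  print (simulateInline 1000)
  print (simulateSplit 1000)
```

Both folds compute the same value, so any difference you see in the Core or the assembly comes down to how the code was structured and not to what it computes.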
Others will know more about this specific speculation than I.
Luke