
#10678: integer-gmp's runS seems unnecessarily expensive -------------------------------------+------------------------------------- Reporter: rwbarton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.1 (CodeGen) | Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): Reid, indeed it is known: #5916. (If you find any other tickets about this, please add them.) I think the best approach would be as in comment:16 of ticket:5916#comment:16 * '''Inline `runS` very late, in `CorePrep`''' Some thoughts about this. * We'd want a single, magical function which has this behaviour. Things like `unsafeDupablePerformIO` should call `runS`. * Now I think about it, I wonder if the magical function should instead be this: {{{ runRW# :: (State# s -> (# State s, a)) -> (# State s, a #) {-# NOINLINE runRW# #-} runRW# f = f realWorld# }}} That is, all `runRW#` does is to apply its argument to a fresh state token. Now we can define your `runS` thus: {{{ runS f = case ruNRW# f of (# _, r #) -> r }}} I think we can inline `runS` freely, which is good because it means that more code is exposed to the optimiser (in particular that `case`). * Moreover, if we do this, I think we now don't need `lazy`. Because `runRW#`'s strictness signature won't be strict in the `r` part; see `Note [unsafeDupablePerformIO has a lazy RHS]` in `GHC.IO`. * If `CorePrep` saw `runRW# (f a b)` it can generate `f a b realWorld#`. But if it sees `runRW# (\s.e)`, it should generate `e[realWorld#/s]`. That is, it should do the beta-reduction on the fly. That's slightly annoying because `CorePrep` doesn't currently carry around a substitution. But I suppose that you could literally call `substExpr`; this doesn't happen much. * Alternatively maybe this could be done in the code generator, effectively treating `runRW` as a primop. That would be a good plan, except that by the time we get to STG the program is in ANF, so instead of `runRW# (f a b)` we'd have {{{ let g = f a b in ruNRW# g }}} and before we see the `runRW#` we'll have generated code to allocate a closure for `g`. So we'd need to allow STG syntax (for specified primops) to have expressions, not just atoms, as the arguments to the primop. This might actually be a Jolly Good Thing. At the moment `catch# (\s -> e1) (\x s -> e2)` allocates two closures before it gets to the `catch#`, which is pretty stupid since the first thing we do after pushing the exception-catch frame is to execute `e1`. I think changes like this would make modest but significant improvements to many programs. If you'd like to dig into it, I'd gladly help. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10678#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler