[Git][ghc/ghc][master] bytecode: Don't PUSH_L 0; SLIDE 1 1

Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC

Commits:
60a16db7 by Rodrigo Mesquita at 2025-09-03T10:55:50+01:00
bytecode: Don't PUSH_L 0; SLIDE 1 1

While looking through bytecode I noticed a quite common unfortunate
pattern:

    ...
    PUSH_L 0
    SLIDE 1 1

We do this often by generically constructing a tail call from a function
atom that may be somewhere arbitrary on the stack.

However, for the special case that the function can be found directly on
top of the stack, as part of the arguments, it's plain redundant to push
then slide it.

In this commit we add a small optimisation to the generation of
tailcalls in bytecode. Simply: lookahead for the function in the stack.
If it is the first thing on the stack and it is part of the arguments
which would be dropped as we entered the tail call, then don't push then
slide it.

In a simple example (T26042b), this already produced a drastic
improvement in generated code (left is old, right is with this patch):

```diff
3c3
< 2025-07-29 10:14:02.081277 UTC
---
> 2025-07-29 10:50:36.560949 UTC
160,161c160
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
164,165d162
< PUSH_L 0
< SLIDE 1 1
175,176c172
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
179,180d174
< PUSH_L 0
< SLIDE 1 1
206,207d199
< PUSH_L 0
< SLIDE 1 1
210,211d201
< PUSH_L 0
< SLIDE 1 1
214,215d203
< PUSH_L 0
< SLIDE 1 1
218,219d205
< PUSH_L 0
< SLIDE 1 1
222,223d207
< PUSH_L 0
< SLIDE 1 1
...
600,601c566
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
604,605d568
< PUSH_L 0
< SLIDE 1 1
632,633d594
< PUSH_L 0
< SLIDE 1 1
636,637d596
< PUSH_L 0
< SLIDE 1 1
640,641d598
< PUSH_L 0
< SLIDE 1 1
644,645d600
< PUSH_L 0
< SLIDE 1 1
648,649d602
< PUSH_L 0
< SLIDE 1 1
652,653d604
< PUSH_L 0
< SLIDE 1 1
656,657d606
< PUSH_L 0
< SLIDE 1 1
660,661d608
< PUSH_L 0
< SLIDE 1 1
664,665d610
< PUSH_L 0
< SLIDE 1 1
```

I also compiled lib:Cabal to bytecode and counted the number of bytecode
lines with `find dist-newstyle -name "*.dump-BCOs" -exec wc {} +`:

with unoptimized core:
1190689 lines (before) - 1172891 lines (now)
= 17798 less redundant instructions (-1.5% lines)

with optimized core:
1924818 lines (before) - 1864836 lines (now)
= 59982 less redundant instructions (-3.1% lines)

- - - - -


1 changed file:

- compiler/GHC/StgToByteCode.hs


Changes:

=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -748,12 +748,21 @@ doTailCall init_d s p fn args = do
   where
   do_pushes !d [] reps = do
-        assert (null reps) return ()
-        (push_fn, sz) <- pushAtom d p (StgVarArg fn)
         platform <- profilePlatform <$> getProfile
-        assert (sz == wordSize platform) return ()
-        let slide = mkSlideB platform (d - init_d + wordSize platform) (init_d - s)
-        return (push_fn `appOL` (slide `appOL` unitOL ENTER))
+        assert (null reps) return ()
+        case lookupBCEnv_maybe fn p of
+          Just d_v
+            | d - d_v == 0 -- shortcut; the first thing on the stack is what we want to enter,
+            , d_v <= init_d -- and it is between init_d and sequel (which will be dropped)
+            -> do
+              let slide = mkSlideB platform (d - init_d + wordSize platform)
+                                            (init_d - s - wordSize platform)
+              return (slide `appOL` unitOL ENTER)
+          _ -> do
+            (push_fn, sz) <- pushAtom d p (StgVarArg fn)
+            assert (sz == wordSize platform) return ()
+            let slide = mkSlideB platform (d - init_d + wordSize platform) (init_d - s)
+            return (push_fn `appOL` (slide `appOL` unitOL ENTER))
   do_pushes !d args reps = do
       let (push_apply, n, rest_of_reps) = findPushSeq reps
           (these_args, rest_of_args) = splitAt n args


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/60a16db7cbbe6f37b22eea8ae4d9e79f...
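
The stack effect described in the commit message can be checked with a toy model. The following is a minimal sketch, not GHC code: the `Instr` type and the functions `step`, `run`, and `peephole` are hypothetical names introduced for illustration. It assumes that PUSH_L n pushes a copy of the slot n words below the top of the stack, and that SLIDE n by keeps the top n words while dropping the `by` words beneath them. The patch itself makes this decision while generating the tail call in doTailCall, not as a pass over already emitted instructions; `peephole` here only illustrates why the two instruction sequences leave the same stack.

```haskell
module Main where

-- Toy model of two bytecode instructions (NOT GHC's real instruction type).
-- The stack is a list with the top of the stack at the head.
data Instr
  = PushL Int      -- PUSH_L n: push a copy of the slot n words below the top
  | Slide Int Int  -- SLIDE n by: keep the top n words, drop the `by` words beneath
  deriving (Eq, Show)

-- Execute one instruction on the stack.
step :: [a] -> Instr -> [a]
step stack (PushL n)    = (stack !! n) : stack
step stack (Slide n by) = take n stack ++ drop (n + by) stack

-- Execute a sequence of instructions, left to right.
run :: [Instr] -> [a] -> [a]
run instrs stack = foldl step stack instrs

-- Hypothetical rewrite with the same stack effect as the patch:
-- "PUSH_L 0; SLIDE 1 by" becomes "SLIDE 1 (by - 1)", and the slide is
-- omitted entirely when nothing is left to drop.
peephole :: [Instr] -> [Instr]
peephole (PushL 0 : Slide 1 by : rest)
  | by >= 1 = [Slide 1 (by - 1) | by > 1] ++ peephole rest
peephole (i : rest) = i : peephole rest
peephole []         = []

main :: IO ()
main = do
  -- "f" is the function to enter; "y1" and "y2" would be dropped anyway.
  let stack = ["f", "y1", "y2", "rest"]
      old   = [PushL 0, Slide 1 3]
  print (run old stack)             -- stack after the old sequence
  print (run (peephole old) stack)  -- stack after the shortened sequence
  print (peephole [PushL 0, Slide 1 1])
```

In this model both sequences leave `["f","rest"]` on the stack, and the pair `PUSH_L 0; SLIDE 1 1` reduces to no instructions at all, which matches the hunks in the T26042b diff where that pair disappears entirely.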