335654e0
by Rodrigo Mesquita at 2025-07-29T19:14:40+01:00
bytecode: Don't PUSH_L 0; SLIDE 1 1
While looking through bytecode I noticed a fairly common and unfortunate
pattern:
```
...
PUSH_L 0
SLIDE 1 1
```
We emit this often because we generically construct a tail call from a
function atom that may sit at an arbitrary position on the stack.
However, in the special case where the function is already directly on
top of the stack, as part of the arguments, pushing and then sliding it
is plainly redundant.
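To see why it is redundant, here is a tiny model of the two instructions
(illustrative Haskell only, not GHC's actual types or interpreter): the stack
is a list with its top at the head, `PUSH_L off` copies the word at offset
`off` onto the top, and `SLIDE n d` keeps the top `n` words while dropping the
`d` words underneath them.
```haskell
-- Toy model only; names and representations are illustrative, not GHC's.
data BCInstr
  = PUSH_L Int     -- push a copy of the word at this offset from the top
  | SLIDE Int Int  -- SLIDE n d: keep the top n words, drop the d words below
  deriving Show

-- Run a sequence of instructions on a stack modelled as a list (top first).
run :: [BCInstr] -> [a] -> [a]
run []                stack = stack
run (PUSH_L off : is) stack = run is (stack !! off : stack)
run (SLIDE n d  : is) stack =
  let (kept, rest) = splitAt n stack
  in  run is (kept ++ drop d rest)

-- With the function atom 'f' already on top of the stack:
--   run [PUSH_L 0, SLIDE 1 1] "fxy"  ==  "fxy"                         -- a no-op
--   run [PUSH_L 0, SLIDE 1 2] "fxy"  ==  run [SLIDE 1 1] "fxy" == "fy" -- smaller slide
```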
In this commit we add a small optimisation to the generation of tail
calls in bytecode. Simply: look ahead for the function on the stack. If
it is the first thing on the stack, and it is among the arguments that
would be dropped as we enter the tail call, then don't push-and-slide
it.
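A minimal sketch of that lookahead, reusing the toy `BCInstr` type from the
sketch above; `enterTailCall`, `funOff` and `wordsToDrop` are made-up names,
and the real change lives in GHC's bytecode generator, which also accounts for
all the arguments it pushes rather than just this single-word shape:
```haskell
-- Sketch only: emit the push/slide part of a tail call (the instruction that
-- actually enters the call afterwards is omitted here).
enterTailCall
  :: Int         -- ^ funOff: offset of the function atom from the stack top
  -> Int         -- ^ wordsToDrop: words the SLIDE squeezes out under the call
  -> [BCInstr]
enterTailCall funOff wordsToDrop
  -- Lookahead special case: the function is already the top word and is one
  -- of the words the SLIDE would drop, so push-then-slide is redundant.
  | funOff == 0, wordsToDrop >= 1
  = [SLIDE 1 (wordsToDrop - 1) | wordsToDrop > 1]  -- SLIDE 1 0 would be a no-op
  -- General case: copy the function to the top, then squeeze out stale words.
  | otherwise
  = [PUSH_L funOff, SLIDE 1 wordsToDrop]
```
With the toy interpreter above, `run (enterTailCall 0 2) "fxy"` and
`run [PUSH_L 0, SLIDE 1 2] "fxy"` both evaluate to `"fy"`, so the shorter
sequence is observably equivalent.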
In a simple example (T26042b), this already produced a drastic
improvement in the generated code (`<` lines are the old output, `>`
lines are with this patch):
```diff
3c3
< 2025-07-29 10:14:02.081277 UTC
---
> 2025-07-29 10:50:36.560949 UTC
160,161c160
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
164,165d162
< PUSH_L 0
< SLIDE 1 1
175,176c172
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
179,180d174
< PUSH_L 0
< SLIDE 1 1
206,207d199
< PUSH_L 0
< SLIDE 1 1
210,211d201
< PUSH_L 0
< SLIDE 1 1
214,215d203
< PUSH_L 0
< SLIDE 1 1
218,219d205
< PUSH_L 0
< SLIDE 1 1
222,223d207
< PUSH_L 0
< SLIDE 1 1
333,334c317
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
337,338d319
< PUSH_L 0
< SLIDE 1 1
367,368c348
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
371,372c351
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
375,376d353
< PUSH_L 0
< SLIDE 1 1
379,380c356
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
383,384c359
< PUSH_L 0
< SLIDE 1 3
---
> SLIDE 1 2
387,388c362
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
417,418d390
< PUSH_L 0
< SLIDE 1 1
421,422c393
< PUSH_L 0
< SLIDE 1 4
---
> SLIDE 1 3
442,443c413
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
446,447c416
< PUSH_L 0
< SLIDE 1 4
---
> SLIDE 1 3
510,511c479
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
514,515d481
< PUSH_L 0
< SLIDE 1 1
600,601c566
< PUSH_L 0
< SLIDE 1 2
---
> SLIDE 1 1
604,605d568
< PUSH_L 0
< SLIDE 1 1
632,633d594
< PUSH_L 0
< SLIDE 1 1
636,637d596
< PUSH_L 0
< SLIDE 1 1
640,641d598
< PUSH_L 0
< SLIDE 1 1
644,645d600
< PUSH_L 0
< SLIDE 1 1
648,649d602
< PUSH_L 0
< SLIDE 1 1
652,653d604
< PUSH_L 0
< SLIDE 1 1
656,657d606
< PUSH_L 0
< SLIDE 1 1
660,661d608
< PUSH_L 0
< SLIDE 1 1
664,665d610
< PUSH_L 0
< SLIDE 1 1
```
I also compiled lib:Cabal to bytecode and counted the number of bytecode
lines with `find dist-newstyle -name "*.dump-BCOs" -exec wc {} +`:
With unoptimized core:
  1190689 lines (before) - 1172891 lines (now)
  = 17798 fewer redundant instructions (-1.5%)
With optimized core:
  1924818 lines (before) - 1864836 lines (now)
  = 59982 fewer redundant instructions (-3.1%)