Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC

Commits:

7 changed files:

Changes:

  • compiler/GHC/Cmm/Sink.hs
    ... ... @@ -26,76 +26,74 @@ import Data.Maybe
    26 26
     
    
    27 27
     import GHC.Exts (inline)
    
    28 28
     
    
    29
    --- -----------------------------------------------------------------------------
    
    30
    --- Sinking and inlining
    
    29
    +--------------------------------------------------------------------------------
    
    30
    +{- Note [Sinking and inlining]
    
    31
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    32
    +Sinking is an optimisation pass that
    
    33
    + (a) moves assignments closer to their uses, to reduce register pressure
    
    34
    + (b) pushes assignments into a single branch of a conditional if possible
    
    35
    + (c) inlines assignments to registers that are mentioned only once
    
    36
    + (d) discards dead assignments
    
    31 37
     
    
    32
    --- This is an optimisation pass that
    
    33
    ---  (a) moves assignments closer to their uses, to reduce register pressure
    
    34
    ---  (b) pushes assignments into a single branch of a conditional if possible
    
    35
    ---  (c) inlines assignments to registers that are mentioned only once
    
    36
    ---  (d) discards dead assignments
    
    37
    ---
    
    38
    --- This tightens up lots of register-heavy code.  It is particularly
    
    39
    --- helpful in the Cmm generated by the Stg->Cmm code generator, in
    
    40
    --- which every function starts with a copyIn sequence like:
    
    41
    ---
    
    42
    ---    x1 = R1
    
    43
    ---    x2 = Sp[8]
    
    44
    ---    x3 = Sp[16]
    
    45
    ---    if (Sp - 32 < SpLim) then L1 else L2
    
    46
    ---
    
    47
    --- we really want to push the x1..x3 assignments into the L2 branch.
    
    48
    ---
    
    49
    --- Algorithm:
    
    50
    ---
    
    51
    ---  * Start by doing liveness analysis.
    
    52
    ---
    
    53
    ---  * Keep a list of assignments A; earlier ones may refer to later ones.
    
    54
    ---    Currently we only sink assignments to local registers, because we don't
    
    55
    ---    have liveness information about global registers.
    
    56
    ---
    
    57
    ---  * Walk forwards through the graph, look at each node N:
    
    58
    ---
    
    59
    ---    * If it is a dead assignment, i.e. assignment to a register that is
    
    60
    ---      not used after N, discard it.
    
    61
    ---
    
    62
    ---    * Try to inline based on current list of assignments
    
    63
    ---      * If any assignments in A (1) occur only once in N, and (2) are
    
    64
    ---        not live after N, inline the assignment and remove it
    
    65
    ---        from A.
    
    66
    ---
    
    67
    ---      * If an assignment in A is cheap (RHS is local register), then
    
    68
    ---        inline the assignment and keep it in A in case it is used afterwards.
    
    69
    ---
    
    70
    ---      * Otherwise don't inline.
    
    71
    ---
    
    72
    ---    * If N is assignment to a local register pick up the assignment
    
    73
    ---      and add it to A.
    
    74
    ---
    
    75
    ---    * If N is not an assignment to a local register:
    
    76
    ---      * remove any assignments from A that conflict with N, and
    
    77
    ---        place them before N in the current block.  We call this
    
    78
    ---        "dropping" the assignments.
    
    79
    ---
    
    80
    ---      * An assignment conflicts with N if it:
    
    81
    ---        - assigns to a register mentioned in N
    
    82
    ---        - mentions a register assigned by N
    
    83
    ---        - reads from memory written by N
    
    84
    ---      * do this recursively, dropping dependent assignments
    
    85
    ---
    
    86
    ---    * At an exit node:
    
    87
    ---      * drop any assignments that are live on more than one successor
    
    88
    ---        and are not trivial
    
    89
    ---      * if any successor has more than one predecessor (a join-point),
    
    90
    ---        drop everything live in that successor. Since we only propagate
    
    91
    ---        assignments that are not dead at the successor, we will therefore
    
    92
    ---        eliminate all assignments dead at this point. Thus analysis of a
    
    93
    ---        join-point will always begin with an empty list of assignments.
    
    94
    ---
    
    95
    ---
    
    96
    --- As a result of above algorithm, sinking deletes some dead assignments
    
    97
    --- (transitively, even).  This isn't as good as removeDeadAssignments,
    
    98
    --- but it's much cheaper.
    
    38
    +This tightens up lots of register-heavy code.  It is particularly
    
    39
    +helpful in the Cmm generated by the Stg->Cmm code generator, in
    
    40
    +which every function starts with a copyIn sequence like:
    
    41
    +
    
    42
    +   x1 = R1
    
    43
    +   x2 = Sp[8]
    
    44
    +   x3 = Sp[16]
    
    45
    +   if (Sp - 32 < SpLim) then L1 else L2
    
    46
    +
    
    47
    +we really want to push the x1..x3 assignments into the L2 branch.
    
    48
    +
    
    49
    +Algorithm:
    
    50
    +
    
    51
    + * Start by doing liveness analysis.
    
    52
    +
    
    53
    + * Keep a list of assignments A; earlier ones may refer to later ones.
    
    54
    +   Currently we only sink assignments to local registers, because we don't
    
    55
    +   have liveness information about global registers.
    
    56
    +
    
    57
    + * Walk forwards through the graph, look at each node N:
    
    58
    +
    
    59
    +   * If it is a dead assignment, i.e. assignment to a register that is
    
    60
    +     not used after N, discard it.
    
    61
    +
    
    62
    +   * Try to inline based on current list of assignments
    
    63
    +     * If any assignments in A (1) occur only once in N, and (2) are
    
    64
    +       not live after N, inline the assignment and remove it
    
    65
    +       from A.
    
    66
    +
    
    67
    +     * If an assignment in A is cheap (RHS is local register), then
    
    68
    +       inline the assignment and keep it in A in case it is used afterwards.
    
    69
    +
    
    70
    +     * Otherwise don't inline.
    
    71
    +
    
    72
    +   * If N is an assignment to a local register, pick up the assignment
    
    73
    +     and add it to A.
    
    74
    +
    
    75
    +   * If N is not an assignment to a local register:
    
    76
    +     * remove any assignments from A that conflict with N, and
    
    77
    +       place them before N in the current block.  We call this
    
    78
    +       "dropping" the assignments.
    
    79
    +       (See Note [When does an assignment conflict?] for what it means for
    
    80
    +        A to conflict with N.)
    
    81
    +
    
    82
    +     * do this recursively, dropping dependent assignments
    
    83
    +
    
    84
    +   * At an exit node:
    
    85
    +     * drop any assignments that are live on more than one successor
    
    86
    +       and are not trivial
    
    87
    +     * if any successor has more than one predecessor (a join-point),
    
    88
    +       drop everything live in that successor. Since we only propagate
    
    89
    +       assignments that are not dead at the successor, we will therefore
    
    90
    +       eliminate all assignments dead at this point. Thus analysis of a
    
    91
    +       join-point will always begin with an empty list of assignments.
    
    92
    +
    
    93
    +As a result of above algorithm, sinking deletes some dead assignments
    
    94
    +(transitively, even).  This isn't as good as removeDeadAssignments,
    
    95
    +but it's much cheaper.
    
    96
    +-}
    
    99 97
     
    
    100 98
     -- -----------------------------------------------------------------------------
    
    101 99
     -- things that we aren't optimising very well yet.
    
    ... ... @@ -648,110 +646,171 @@ okToInline _ _ _ = True
    648 646
     
    
    649 647
     -- -----------------------------------------------------------------------------
    
    650 648
     
    
    649
    +{- Note [When does an assignment conflict?]
    
    650
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    651
    +An assignment 'A' conflicts with a statement 'N' if any of the following
    
    652
    +conditions are satisfied:
    
    653
    +
    
    654
    +  (C1) 'A' assigns to a register mentioned in 'N'
    
    655
    +  (C2) 'A' mentions a register assigned by 'N'
    
    656
    +  (C3) 'A' reads from memory written by 'N'
    
    657
    +
    
    658
    +In such a situation, it is not safe to commute 'A' past 'N'. For example,
    
    659
    +it is not safe to commute
    
    660
    +
    
    661
    +  A: r = 1
    
    662
    +  N: s = r
    
    663
    +
    
    664
    +because 'r' may be undefined or hold a different value before 'A'.
    
    665
    +
    
    666
    +Remarks:
    
    667
    +
    
    668
    +  (C3) includes all foreign calls, as they may modify the heap/stack.
    
    669
    +
    
    670
    +  (C1) includes the following two situations:
    
    671
    +
    
    672
    +    (C1a) 'N' defines the LHS register in the assignment 'A', for example:
    
    673
    +
    
    674
    +      A: r = <expr>
    
    675
    +      N: r = <other_expr>
    
    676
    +
    
    677
    +    (C1b) 'N' defines a register used in the RHS of 'A', for example:
    
    678
    +
    
    679
    +      A: r = s
    
    680
    +      N: s = <expr>
    
    681
    +
    
    682
    +    (C1c) 'suspendThread' clobbers every global register not backed by a
    
    683
    +          real register, as noted in #19237.
    
    684
    +
    
    685
    +Forgetting (C1a) led to bug #26550, in which we incorrectly commuted
    
    686
    +
    
    687
    +  A: _c1rB::Fx2V128 = <0.0 :: W64, 0.0 :: W64>
    
    688
    +  N: _c1rB::Fx2V128 = %MO_VF_Insert_2_W64(<0.0 :: W64,0.0 :: W64>,%MO_F_Add_W64(F64[R1 + 7], 3.0 :: W64),0 :: W32)
    
    689
    +
    
    690
    +-}
    
    691
    +
    
    651 692
     -- | @conflicts (r,e) node@ is @False@ if and only if the assignment
    
    652 693
     -- @r = e@ can be safely commuted past statement @node@.
    
    694
    +--
    
    695
    +-- See Note [When does an assignment conflict?].
    
    653 696
     conflicts :: Platform -> Assignment -> CmmNode O x -> Bool
    
    654
    -conflicts platform (r, rhs, addr) node
    
    697
    +conflicts platform assig@(r, rhs, addr) node
    
    655 698
     
    
    656
    -  -- (1) node defines registers used by rhs of assignment. This catches
    
    657
    -  -- assignments and all three kinds of calls. See Note [Sinking and calls]
    
    658
    -  | globalRegistersConflict platform rhs node                       = True
    
    659
    -  | localRegistersConflict  platform rhs node                       = True
    
    699
    +  -- (C1) node defines registers that are either the assigned register or
    
    700
    +  -- are used by the rhs of the assignment.
    
    701
    +  -- This catches assignments and all three kinds of calls.
    
    702
    +  -- See Note [Sinking and calls]
    
    703
    +  | globalRegistersConflict platform rhs   node                     = True
    
    704
    +  | localRegistersConflict  platform assig node                     = True
    
    660 705
     
    
    661
    -  -- (2) node uses register defined by assignment
    
    706
    +  -- (C2) node uses register defined by assignment
    
    662 707
       | foldRegsUsed platform (\b r' -> r == r' || b) False node        = True
    
    663 708
     
    
    664
    -  -- (3) a store to an address conflicts with a read of the same memory
    
    709
    +  -- (C3) Node writes to memory that is read by the assignment.
    
    710
    +
    
    711
    +  -- (a) a store to an address conflicts with a read of the same memory
    
    665 712
       | CmmStore addr' e _ <- node
    
    666 713
       , memConflicts addr (loadAddr platform addr' (cmmExprWidth platform e)) = True
    
    667 714
     
    
    668
    -  -- (4) an assignment to Hp/Sp conflicts with a heap/stack read respectively
    
    669
    -  | HeapMem    <- addr, CmmAssign (CmmGlobal (GlobalRegUse Hp _)) _ <- node        = True
    
    670
    -  | StackMem   <- addr, CmmAssign (CmmGlobal (GlobalRegUse Sp _)) _ <- node        = True
    
    671
    -  | SpMem{}    <- addr, CmmAssign (CmmGlobal (GlobalRegUse Sp _)) _ <- node        = True
    
    715
    +  -- (b) an assignment to Hp/Sp conflicts with a heap/stack read respectively
    
    716
    +  | CmmAssign (CmmGlobal (GlobalRegUse Hp _)) _ <- node
    
    717
    +  , memConflicts addr HeapMem
    
    718
    +  = True
    
    719
    +  | CmmAssign (CmmGlobal (GlobalRegUse Sp _)) _ <- node
    
    720
    +  , memConflicts addr StackMem
    
    721
    +  = True
    
    672 722
     
    
    673
    -  -- (5) foreign calls clobber heap: see Note [Foreign calls clobber heap]
    
    723
    +  -- (c) foreign calls clobber heap: see Note [Foreign calls clobber heap]
    
    674 724
       | CmmUnsafeForeignCall{} <- node, memConflicts addr AnyMem      = True
    
    675 725
     
    
    676
    -  -- (6) suspendThread clobbers every global register not backed by a real
    
    677
    -  -- register. It also clobbers heap and stack but this is handled by (5)
    
    726
    +  -- (d) native calls clobber any memory
    
    727
    +  | CmmCall{} <- node, memConflicts addr AnyMem                   = True
    
    728
    +
    
    729
    +  -- (C1c) suspendThread clobbers every global register not backed by a real
    
    730
    +  -- register. (It also clobbers heap and stack, but this is handled by (C3)(c) above.)
    
    678 731
       | CmmUnsafeForeignCall (PrimTarget MO_SuspendThread) _ _ <- node
    
    679 732
       , foldRegsUsed platform (\b g -> globalRegMaybe platform g == Nothing || b) False rhs
    
    680 733
       = True
    
    681 734
     
    
    682
    -  -- (7) native calls clobber any memory
    
    683
    -  | CmmCall{} <- node, memConflicts addr AnyMem                   = True
    
    684
    -
    
    685
    -  -- (8) otherwise, no conflict
    
    686 735
       | otherwise = False
    
    687 736
     
    
    688 737
     {- Note [Inlining foldRegsDefd]
    
    689
    -   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    690
    -   foldRegsDefd is, after optimization, *not* a small function so
    
    691
    -   it's only marked INLINEABLE, but not INLINE.
    
    738
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    739
    +foldRegsDefd is, after optimization, *not* a small function so
    
    740
    +it's only marked INLINEABLE, but not INLINE.
    
    692 741
     
    
    693
    -   However in some specific cases we call it *very* often making it
    
    694
    -   important to avoid the overhead of allocating the folding function.
    
    695
    -
    
    696
    -   So we simply force inlining via the magic inline function.
    
    697
    -   For T3294 this improves allocation with -O by ~1%.
    
    742
    +However in some specific cases we call it *very* often making it
    
    743
    +important to avoid the overhead of allocating the folding function.
    
    698 744
     
    
    745
    +So we simply force inlining via the magic inline function.
    
    746
    +For T3294 this improves allocation with -O by ~1%.
    
    699 747
     -}
    
    700 748
     
    
    701
    --- Returns True if node defines any global registers that are used in the
    
    702
    --- Cmm expression
    
    749
    +-- | Returns @True@ if @node@ defines any global registers that are used in the
    
    750
    +-- Cmm expression.
    
    751
    +--
    
    752
    +-- See (C1) in Note [When does an assignment conflict?].
    
    703 753
     globalRegistersConflict :: Platform -> CmmExpr -> CmmNode e x -> Bool
    
    704 754
     globalRegistersConflict platform expr node =
    
    705 755
        -- See Note [Inlining foldRegsDefd]
    
    706 756
        inline foldRegsDefd platform (\b r -> b || globalRegUsedIn platform (globalRegUse_reg r) expr)
    
    707 757
                     False node
    
    758
    +    -- NB: no need to worry about (C1a), as the LHS of an assignment is always
    
    759
    +    -- a local register, never a global register.
    
    708 760
     
    
    709
    --- Returns True if node defines any local registers that are used in the
    
    710
    --- Cmm expression
    
    711
    -localRegistersConflict :: Platform -> CmmExpr -> CmmNode e x -> Bool
    
    712
    -localRegistersConflict platform expr node =
    
    713
    -    -- See Note [Inlining foldRegsDefd]
    
    714
    -    inline foldRegsDefd platform (\b r -> b || regUsedIn platform (CmmLocal  r) expr)
    
    715
    -                 False node
    
    716
    -
    
    717
    --- Note [Sinking and calls]
    
    718
    --- ~~~~~~~~~~~~~~~~~~~~~~~~
    
    719
    --- We have three kinds of calls: normal (CmmCall), safe foreign (CmmForeignCall)
    
    720
    --- and unsafe foreign (CmmUnsafeForeignCall). We perform sinking pass after
    
    721
    --- stack layout (see Note [Sinking after stack layout]) which leads to two
    
    722
    --- invariants related to calls:
    
    723
    ---
    
    724
    ---   a) during stack layout phase all safe foreign calls are turned into
    
    725
    ---      unsafe foreign calls (see Note [Lower safe foreign calls]). This
    
    726
    ---      means that we will never encounter CmmForeignCall node when running
    
    727
    ---      sinking after stack layout
    
    728
    ---
    
    729
    ---   b) stack layout saves all variables live across a call on the stack
    
    730
    ---      just before making a call (remember we are not sinking assignments to
    
    731
    ---      stack):
    
    732
    ---
    
    733
    ---       L1:
    
    734
    ---          x = R1
    
    735
    ---          P64[Sp - 16] = L2
    
    736
    ---          P64[Sp - 8]  = x
    
    737
    ---          Sp = Sp - 16
    
    738
    ---          call f() returns L2
    
    739
    ---       L2:
    
    740
    ---
    
    741
    ---      We will attempt to sink { x = R1 } but we will detect conflict with
    
    742
    ---      { P64[Sp - 8]  = x } and hence we will drop { x = R1 } without even
    
    743
    ---      checking whether it conflicts with { call f() }. In this way we will
    
    744
    ---      never need to check any assignment conflicts with CmmCall. Remember
    
    745
    ---      that we still need to check for potential memory conflicts.
    
    746
    ---
    
    747
    --- So the result is that we only need to worry about CmmUnsafeForeignCall nodes
    
    748
    --- when checking conflicts (see Note [Unsafe foreign calls clobber caller-save registers]).
    
    749
    --- This assumption holds only when we do sinking after stack layout. If we run
    
    750
    --- it before stack layout we need to check for possible conflicts with all three
    
    751
    --- kinds of calls. Our `conflicts` function does that by using a generic
    
    752
    --- foldRegsDefd and foldRegsUsed functions defined in DefinerOfRegs and
    
    753
    --- UserOfRegs typeclasses.
    
    761
    +-- | Given an assignment @local_reg := expr@, return @True@ if @node@ defines any
    
    762
    +-- local registers mentioned in the assignment.
    
    754 763
     --
    
    764
    +-- See (C1) in Note [When does an assignment conflict?].
    
    765
    +localRegistersConflict :: Platform -> Assignment -> CmmNode e x -> Bool
    
    766
    +localRegistersConflict platform (r, expr, _) node =
    
    767
    +    -- See Note [Inlining foldRegsDefd]
    
    768
    +    inline foldRegsDefd platform
    
    769
    +      (\b r' ->
    
    770
    +           b
    
    771
    +        || r' == r -- (C1a)
    
    772
    +        || regUsedIn platform (CmmLocal r') expr -- (C1b)
    
    773
    +      )
    
    774
    +      False node
    
    775
    +
    
    776
    +{- Note [Sinking and calls]
    
    777
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    778
    +We have three kinds of calls: normal (CmmCall), safe foreign (CmmForeignCall)
    
    779
    +and unsafe foreign (CmmUnsafeForeignCall). We perform sinking pass after
    
    780
    +stack layout (see Note [Sinking after stack layout]) which leads to two
    
    781
    +invariants related to calls:
    
    782
    +
    
    783
    +  a) during stack layout phase all safe foreign calls are turned into
    
    784
    +     unsafe foreign calls (see Note [Lower safe foreign calls]). This
    
    785
    +     means that we will never encounter CmmForeignCall node when running
    
    786
    +     sinking after stack layout
    
    787
    +
    
    788
    +  b) stack layout saves all variables live across a call on the stack
    
    789
    +     just before making a call (remember we are not sinking assignments to
    
    790
    +     stack):
    
    791
    +
    
    792
    +      L1:
    
    793
    +         x = R1
    
    794
    +         P64[Sp - 16] = L2
    
    795
    +         P64[Sp - 8]  = x
    
    796
    +         Sp = Sp - 16
    
    797
    +         call f() returns L2
    
    798
    +      L2:
    
    799
    +
    
    800
    +     We will attempt to sink { x = R1 } but we will detect conflict with
    
    801
    +     { P64[Sp - 8]  = x } and hence we will drop { x = R1 } without even
    
    802
    +     checking whether it conflicts with { call f() }. In this way we will
    
    803
    +     never need to check any assignment conflicts with CmmCall. Remember
    
    804
    +     that we still need to check for potential memory conflicts.
    
    805
    +
    
    806
    +So the result is that we only need to worry about CmmUnsafeForeignCall nodes
    
    807
    +when checking conflicts (see Note [Unsafe foreign calls clobber caller-save registers]).
    
    808
    +This assumption holds only when we do sinking after stack layout. If we run
    
    809
    +it before stack layout we need to check for possible conflicts with all three
    
    810
    +kinds of calls. Our `conflicts` function does that by using a generic
    
    811
    +foldRegsDefd and foldRegsUsed functions defined in DefinerOfRegs and
    
    812
    +UserOfRegs typeclasses.
    
    813
    +-}
    
    755 814
     
    
    756 815
     -- An abstraction of memory read or written.
    
    757 816
     data AbsMem
    

  • compiler/GHC/CmmToAsm/Reg/Linear.hs
    ... ... @@ -504,8 +504,8 @@ genRaInsn block_live new_instrs block_id instr r_dying w_dying = do
    504 504
       platform <- getPlatform
    
    505 505
       case regUsageOfInstr platform instr of { RU read written ->
    
    506 506
         do
    
    507
    -    let real_written = [ rr                      | RegWithFormat {regWithFormat_reg = RegReal rr} <- written ]
    
    508
    -    let virt_written = [ VirtualRegWithFormat vr fmt | RegWithFormat (RegVirtual vr) fmt         <- written ]
    
    507
    +    let real_written = [ rr                          | RegWithFormat {regWithFormat_reg = RegReal rr} <- written ]
    
    508
    +    let virt_written = [ VirtualRegWithFormat vr fmt | RegWithFormat (RegVirtual vr) fmt              <- written ]
    
    509 509
     
    
    510 510
         -- we don't need to do anything with real registers that are
    
    511 511
         -- only read by this instr.  (the list is typically ~2 elements,
    
    ... ... @@ -939,35 +939,39 @@ allocRegsAndSpill_spill reading keep spills alloc r@(VirtualRegWithFormat vr fmt
    939 939
     
    
    940 940
                             -- we have a temporary that is in both register and mem,
    
    941 941
                             -- just free up its register for use.
    
    942
    -                        | (temp, (RealRegUsage my_reg _old_fmt), slot) : _ <- candidates_inBoth
    
    943
    -                        = do    spills' <- loadTemp r spill_loc my_reg spills
    
    942
    +                        | (temp, (RealRegUsage cand_reg _old_fmt), slot) : _ <- candidates_inBoth
    
    943
    +                        = do    spills' <- loadTemp r spill_loc cand_reg spills
    
    944 944
                                     let assig1  = addToUFM_Directly assig temp (InMem slot)
    
    945
    -                                let assig2  = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage my_reg fmt)
    
    945
    +                                let assig2  = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage cand_reg fmt)
    
    946 946
     
    
    947 947
                                     setAssigR $ toRegMap assig2
    
    948
    -                                allocateRegsAndSpill reading keep spills' (my_reg:alloc) rs
    
    948
    +                                allocateRegsAndSpill reading keep spills' (cand_reg:alloc) rs
    
    949 949
     
    
    950 950
                             -- otherwise, we need to spill a temporary that currently
    
    951 951
                             -- resides in a register.
    
    952
    -                        | (temp_to_push_out, RealRegUsage my_reg fmt) : _
    
    952
    +                        | (temp_to_push_out, RealRegUsage cand_reg old_reg_fmt) : _
    
    953 953
                                             <- candidates_inReg
    
    954 954
                             = do
    
    955
    -                                (spill_store, slot) <- spillR (RegWithFormat (RegReal my_reg) fmt) temp_to_push_out
    
    955
    +                                -- Spill what's currently in the register, with the format of what's in the register.
    
    956
    +                                (spill_store, slot) <- spillR (RegWithFormat (RegReal cand_reg) old_reg_fmt) temp_to_push_out
    
    956 957
     
    
    957 958
                                     -- record that this temp was spilled
    
    958 959
                                     recordSpill (SpillAlloc temp_to_push_out)
    
    959 960
     
    
    960
    -                                -- update the register assignment
    
    961
    +                                -- Update the register assignment:
    
    962
    +                                --  - the old data is now only in memory,
    
    963
    +                                --  - the new data is now allocated to this register;
    
    964
    +                                --    make sure to use the new format (#26542)
    
    961 965
                                     let assig1  = addToUFM_Directly assig temp_to_push_out (InMem slot)
    
    962
    -                                let assig2  = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage my_reg fmt)
    
    966
    +                                let assig2  = addToUFM assig1 vr $! newLocation spill_loc (RealRegUsage cand_reg fmt)
    
    963 967
                                     setAssigR $ toRegMap assig2
    
    964 968
     
    
    965 969
                                     -- if need be, load up a spilled temp into the reg we've just freed up.
    
    966
    -                                spills' <- loadTemp r spill_loc my_reg spills
    
    970
    +                                spills' <- loadTemp r spill_loc cand_reg spills
    
    967 971
     
    
    968 972
                                     allocateRegsAndSpill reading keep
    
    969 973
                                             (spill_store ++ spills')
    
    970
    -                                        (my_reg:alloc) rs
    
    974
    +                                        (cand_reg:alloc) rs
    
    971 975
     
    
    972 976
     
    
    973 977
                             -- there wasn't anything to spill, so we're screwed.
    

  • testsuite/tests/simd/should_run/T26542.hs
    1
    +{-# LANGUAGE MagicHash #-}
    
    2
    +{-# LANGUAGE UnboxedTuples #-}
    
    3
    +
    
    4
    +module Main where
    
    5
    +
    
    6
    +import GHC.Exts
    
    7
    +
    
    8
    +type D8# = (# DoubleX2#, Double#, DoubleX2#, Double#, DoubleX2# #)
    
    9
    +type D8  = (Double, Double, Double, Double, Double, Double, Double, Double)
    
    10
    +
    
    11
    +unD# :: Double -> Double#
    
    12
    +unD# (D# x) = x
    
    13
    +
    
    14
    +mkD8# :: Double -> D8#
    
    15
    +mkD8# x =
    
    16
    +  (# packDoubleX2# (# unD# x, unD# (x + 1) #)
    
    17
    +   , unD# (x + 2)
    
    18
    +   , packDoubleX2# (# unD# (x + 3), unD# (x + 4) #)
    
    19
    +   , unD# (x + 5)
    
    20
    +   , packDoubleX2# (# unD# (x + 6), unD# (x + 7) #)
    
    21
    +   #)
    
    22
    +{-# NOINLINE mkD8# #-}
    
    23
    +
    
    24
    +unD8# :: D8# -> D8
    
    25
    +unD8# (# v0, x2, v1, x5, v2 #) =
    
    26
    +  case unpackDoubleX2# v0 of
    
    27
    +    (# x0, x1 #) ->
    
    28
    +      case unpackDoubleX2# v1 of
    
    29
    +        (# x3, x4 #) ->
    
    30
    +          case unpackDoubleX2# v2 of
    
    31
    +            (# x6, x7 #) ->
    
    32
    +              (D# x0, D# x1, D# x2, D# x3, D# x4, D# x5, D# x6, D# x7)
    
    33
    +{-# NOINLINE unD8# #-}
    
    34
    +
    
    35
    +type D32# = (# D8#, D8#, D8#, D8# #)
    
    36
    +type D32  = (D8, D8, D8, D8)
    
    37
    +
    
    38
    +mkD32# :: Double -> D32#
    
    39
    +mkD32# x = (# mkD8# x, mkD8# (x + 8), mkD8# (x + 16), mkD8# (x + 24) #)
    
    40
    +{-# NOINLINE mkD32# #-}
    
    41
    +
    
    42
    +unD32# :: D32# -> D32
    
    43
    +unD32# (# x0, x1, x2, x3 #) =
    
    44
    +  (unD8# x0, unD8# x1, unD8# x2, unD8# x3)
    
    45
    +{-# NOINLINE unD32# #-}
    
    46
    +
    
    47
    +main :: IO ()
    
    48
    +main = do
    
    49
    +  let
    
    50
    +    !x = mkD32# 0
    
    51
    +    !ds = unD32# x
    
    52
    +  print ds

  • testsuite/tests/simd/should_run/T26542.stdout
    1
    +((0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0),(8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0),(16.0,17.0,18.0,19.0,20.0,21.0,22.0,23.0),(24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0))

  • testsuite/tests/simd/should_run/T26550.hs
    1
    +{-# LANGUAGE MagicHash #-}
    
    2
    +{-# LANGUAGE UnboxedTuples #-}
    
    3
    +
    
    4
    +module Main where
    
    5
    +
    
    6
    +import GHC.Exts
    
    7
    +
    
    8
    +type D3# = (# Double#, DoubleX2# #)
    
    9
    +
    
    10
    +unD# :: Double -> Double#
    
    11
    +unD# (D# x) = x
    
    12
    +
    
    13
    +mkD3# :: Double -> D3#
    
    14
    +mkD3# x =
    
    15
    +  (# unD# (x + 2)
    
    16
    +   , packDoubleX2# (# unD# (x + 3), unD# (x + 4) #)
    
    17
    +   #)
    
    18
    +{-# NOINLINE mkD3# #-}
    
    19
    +
    
    20
    +main :: IO ()
    
    21
    +main = do
    
    22
    +  let
    
    23
    +    !(# _ten, eleven_twelve #) = mkD3# 8
    
    24
    +    !(# eleven, twelve #) = unpackDoubleX2# eleven_twelve
    
    25
    +
    
    26
    +  putStrLn $ unlines
    
    27
    +    [ "eleven: " ++ show (D# eleven)
    
    28
    +    , "twelve: " ++ show (D# twelve)
    
    29
    +    ]

  • testsuite/tests/simd/should_run/T26550.stdout
    1
    +eleven: 11.0
    
    2
    +twelve: 12.0
    
    3
    +

  • testsuite/tests/simd/should_run/all.T
    ... ... @@ -51,6 +51,11 @@ test('int64x2_shuffle_baseline', [], compile_and_run, [''])
    51 51
     test('T25658', [], compile_and_run, ['']) # #25658 is a bug with SSE2 code generation
    
    52 52
     test('T25659', [], compile_and_run, [''])
    
    53 53
     
    
    54
    +# This test case uses SIMD instructions, even though the bug isn't in any way
    
    55
    +# tied to SIMD registers. It's useful to include it in this file so that
    
    56
    +# we re-use the logic for which architectures to run the test on.
    
    57
    +test('T26550', [], compile_and_run, ['-O1 -fno-worker-wrapper'])
    
    58
    +
    
    54 59
     # Ensure we set the CPU features we have available.
    
    55 60
     #
    
    56 61
     # This is especially important with the LLVM backend, as LLVM can otherwise
    
    ... ... @@ -139,6 +144,7 @@ test('T22187', [],compile,[''])
    139 144
     test('T22187_run', [],compile_and_run,[''])
    
    140 145
     test('T25062_V16', [], compile_and_run, [''])
    
    141 146
     test('T25561', [], compile_and_run, [''])
    
    147
    +test('T26542', [], compile_and_run, [''])
    
    142 148
     
    
    143 149
     # Even if the CPU we run on doesn't support *executing* those tests we should try to
    
    144 150
     # compile them.