[GHC] #14701: Investigate the performance impact of code alignment

#14701: Investigate the performance impact of code alignment -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 (CodeGen) | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Maybe ghc's performance also varies due to reasons like that: https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues The gist of the article is that tight loops can have significantly different performance depending on whether the location of the assembly instructions themselves cross a cache line. I would not have expected this to make double-digit percentage differences. From `#ghc`: {{{ bgamari: nh2[m], there are nofib tests where this is very likely the cause of a good portion of the variant angerman: nh2[m]: that linked LLVM talk from 2016 makes me not want to have to deal with that... AndreasK: nh2[m]: It's a real issue. But atm I think ghc at least in the native codegen make no real attempt to optimize for this thoughtpolice: GHC does not carry knowledge of alignment or anything, no. I’m not sure how difficult this is to suss out, but at least making sure every branch target does not cross a cache line is probably a good start thoughtpolice: Well, far jump, e.g. a call to a function. not sure how TNTC fits into this story, tbqh }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14701 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14701: Investigate the performance impact of code alignment -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 (CodeGen) | Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * cc: michalt (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14701#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14701: Investigate the performance impact of code alignment -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 (CodeGen) | Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by michalt): Somewhat related: https://github.com/ccurtsinger/stabilizer (project based on LLVM aiming to make performance comparisons more robust by randomizing memory layout during execution; seems it's no longer maintained though) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14701#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14701: Investigate the performance impact of code alignment -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 (CodeGen) | Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sjakobi): * cc: sjakobi (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14701#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14701: Investigate the performance impact of code alignment -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 (CodeGen) | Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by AndreasK): Code where this could be relevant is for example the binary searches we generate for case statements. {{{ some_func(bits64 r2) { bits64 var; _L1: if (r2 > 4) (likely: False) {goto _L9;} else {goto _L2;} _L2: if (r2 < 2) (likely: False) {goto _L6;} else {goto _L3;} _L3: if (r2 < 3) {goto _L5;} else {goto _L4;} _L4: var = 322; return (var); _L5: var = 222; return (var); _L6: if (r2 < 1) (likely: False) {goto _L8;} else {goto _L7;} _L7: var = 111; return (var); _L8: var = -1; return (var); _L9: if (r2 > 6) (likely: False) {goto _L8;} else {goto _L10;} _L10: if (r2 < 5) {goto _L12;} else {goto _L11;} _L11: var = 522; return (var); _L12: var = 422; return (var); } }}} This is a pretty straight forward binary search and at the outside of the range returns -1. The last two blocks don't fit into a 64 Byte cache line. Doing anything useful in this example would be hard. One option would be to push the unlikely leaves of the search which usually represent an exceptional result like a match failure towards the end. But this would require assigning a weight to each block and seems like a lot of effort. {{{ ==================== Asm code ==================== 2018-03-03 18:34:09.5530413 UTC .section .text .align 8 .globl kasdf kasdf: _cx: cmpq $4,%rbx ja _c4 _c6: cmpq $2,%rbx jb _c8 _ca: cmpq $3,%rbx jb _cc _ce: movl $322,%ebx jmp *(%rbp) _c4: cmpq $6,%rbx ja _cm _cq: cmpq $5,%rbx jb _cs _cu: movl $522,%ebx jmp *(%rbp) _cs: movl $422,%ebx jmp *(%rbp) _cm: movq $-1,%rbx jmp *(%rbp) _c8: cmpq $1,%rbx jb _cm _ck: movl $111,%ebx jmp *(%rbp) _cc: movl $222,%ebx jmp *(%rbp) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14701#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC