
#8279: bad alignment in code gen yields substantial perf issue
--------------------------------------------+------------------------------
        Reporter:  carter                   |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  highest                  |        Milestone:
       Component:  Compiler                 |          Version:  7.7
      Resolution:                           |         Keywords:
Operating System:  Unknown/Multiple         |     Architecture:
 Type of failure:  Runtime performance bug  |  Unknown/Multiple
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:  #8082
--------------------------------------------+------------------------------

Comment (by schyler):

 A little research shows that Intel recommends aligning code and branch
 targets on 16-byte boundaries:

 `3.4.1.5 - Assembly/Compiler Coding Rule 12. (M impact, H generality) All
 branch targets should be 16-byte aligned.`

 The reasons for this are as follows:

 * Aligning to a 16-byte boundary makes it more likely that a loop falls
   inside a single cache line rather than straddling two. A loop that
   straddles a cache-line boundary takes a noticeable performance hit.
 * Small forward relative jumps that stay inside the same cache line are
   handled more efficiently by the pipeline than far jumps.

 Note that we have to be really careful not to turn a small relative jump
 into a larger jump by altering alignment slightly, because larger jumps
 tend to be slower (see the second point above).

 Tail calls may benefit from aligning both the top of tables (for aligned
 reading) and the top of actual function entry points (for aligned
 iteration) to 16 bytes. This requires experimentation and benchmarking.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8279#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
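
The alignment arithmetic behind the comment above can be sketched in a few lines of C. This is only an illustration, not what GHC's code generator does: the function names are hypothetical, and the 64-byte cache line is an assumption (common on x86, but not stated in the ticket). Only the 16-byte figure comes from the quoted Intel rule.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size in bytes (hypothetical; not taken from the ticket). */
#define CACHE_LINE 64u

/* Round addr up to the next multiple of align.
   align must be a nonzero power of two, e.g. 16. */
uint64_t align_up(uint64_t addr, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Number of padding bytes to insert before addr so that it becomes aligned. */
uint64_t padding_for(uint64_t addr, uint64_t align) {
    return align_up(addr, align) - addr;
}

/* Returns 1 if the byte range [start, start + len) touches more than one
   cache line, 0 otherwise. An empty range never straddles. */
int straddles_line(uint64_t start, uint64_t len) {
    if (len == 0) {
        return 0;
    }
    return (start / CACHE_LINE) != ((start + len - 1) / CACHE_LINE);
}
```

For a hypothetical 12-byte loop body starting at address 0x403a, `padding_for(0x403a, 16)` gives 6, which moves the loop to 0x4040. Before padding the range crosses the line boundary at 0x4040; after padding it sits inside one line. The sketch does not model the caveat raised in the comment, namely that the inserted padding can push a nearby short jump out of range and force a longer encoding.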