
#16052: Core optimizations for memset on a small range
-------------------------------------+-------------------------------------
        Reporter:  andrewthad        |                Owner:  (none)
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:  8.9
       Component:  Compiler          |              Version:  8.6.3
      Resolution:                    |             Keywords:  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * keywords:   => newcomer
 * milestone:   => 8.9


Comment:

 Actually, now that I think of it, I believe the problem is that we don't
 know enough about the `ByteArray#`'s alignment for the inline memset logic
 to fire, given the current implementation.

 The current implementation (found in `compiler/nativeGen/X86/CodeGen.hs`)
 is too strict in its condition:
 {{{#!hs
 genCCall dflags _ (PrimTarget (MO_Memset align)) _
          [dst,
           CmmLit (CmmInt c _),
           CmmLit (CmmInt n _)]
          _
     | fromInteger insns <= maxInlineMemsetInsns dflags && align .&. 3 == 0 = do
 }}}

 Another problem is that the codegen logic
 (`StgCmmPrim.doSetByteArrayOp`) makes too weak a claim about the
 alignment. It claims that the region is merely byte-aligned, even if the
 offset is aligned. Given that we know that the beginning of the bytearray
 is aligned to 16 bytes, we should be able to do better here (and for the
 copy primops).

 Fixing this would be a nice project.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/16052#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
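
 The alignment point in the comment above can be illustrated with a small
 standalone sketch. This is not GHC's code: `knownAlignment` and
 `alignedEnough` are hypothetical names, the offset is assumed to be a
 non-negative compile-time constant, and "byte-aligned" is assumed to mean
 an alignment of 1. Only the `align .&. 3 == 0` test and the 16-byte base
 alignment are taken from the comment.

```haskell
import Data.Bits ((.&.))

-- | Hypothetical helper: the largest power of two guaranteed to divide
-- the address @base + offset@, given that @base@ is aligned to
-- @baseAlign@ bytes (a power of two) and @offset@ is a known constant.
knownAlignment :: Int -> Int -> Int
knownAlignment baseAlign offset
  | offset == 0 = baseAlign
  | otherwise   = min baseAlign (offset .&. negate offset)
  -- 'offset .&. negate offset' isolates the lowest set bit of the offset.

-- | The alignment part of the guard quoted in the comment.
alignedEnough :: Int -> Bool
alignedEnough align = align .&. 3 == 0

main :: IO ()
main = do
  -- Claiming only byte alignment never satisfies the guard.
  print (alignedEnough 1)
  -- A 16-byte-aligned base with offset 8 is known to be 8-byte aligned.
  print (alignedEnough (knownAlignment 16 8))
  -- Offset 6 only guarantees 2-byte alignment, so the guard still fails.
  print (alignedEnough (knownAlignment 16 6))
```

 Under these assumptions, passing the derived alignment instead of 1 would
 let the guard succeed whenever the offset is a multiple of 4.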