
For the record, this is the rewrite rule used by ByteString:
* https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.h..., calling https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.h....
This just wraps the `Addr#` directly; no copying happens here. However, [https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.h... append] does call `memcpy` twice. I don't think GHC has the kind of optimizations that can turn a `memcpy` call into SIMD instructions directly, but maybe `memcpy` is more efficient when called with aligned buffers. I'll try to test this.
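To make the difference concrete, here is a minimal sketch. It does not reproduce the rewrite rule linked above; it calls `unsafePackAddressLen` from `Data.ByteString.Unsafe` by hand, on the assumption that this is the same kind of O(1) wrapping. The names `hello`, `world` and `greeting` are made up for illustration.

```haskell
{-# LANGUAGE MagicHash #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU

-- Wrap a static literal in place: O(1), the bytes are not copied.
-- The explicit length must match the literal exactly.
hello :: IO B.ByteString
hello = BU.unsafePackAddressLen 5 "hello"#

world :: IO B.ByteString
world = BU.unsafePackAddressLen 6 " world"#

-- Appending two non-empty strings allocates a fresh buffer and copies
-- both arguments into it; this is where the copying happens.
greeting :: IO B.ByteString
greeting = B.append <$> hello <*> world
```

Loading this in GHCi and evaluating `greeting` should give the 11-byte string; only the `append` step touches the bytes, so alignment of the literals would matter there and not in the wrapping step.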
#9577: String literals are wasting space
-------------------------------------+-------------------------------------
        Reporter:  xnyhps            |            Owner:  xnyhps
            Type:  bug               |           Status:  new
        Priority:  low               |        Milestone:
       Component:  Compiler (NCG)    |          Version:  7.8.2
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |       Difficulty:  Unknown
  performance bug                    |       Blocked By:
       Test Case:                    |  Related Tickets:
        Blocking:                    |
Differential Revisions:              |
-------------------------------------+-------------------------------------

Comment (by tibbe):

 Replying to [comment:6 xnyhps]:
 > directly, but maybe `memcpy` is more efficient when called with aligned buffers. I'll try to test this.

 The memcpy implementation (used by e.g. `copyByteArray#`) does unroll memcpys of statically known size and alignment, if aligned to a word, so I definitely think we should try to align our data that way. In 7.10 (if Phab:D166 goes in) we'll do even better and use a `REP-MOVSB` instruction on Ivy Bridge and newer. `REP-MOVSB` is almost as fast as an unrolled AVX loop.

 The memcpy unrolling is implemented in source:compiler/nativeGen/X86/CodeGen.hs.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9577#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
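For anyone who wants to look at the generated code, here is a rough sketch of a `copyByteArray#` call whose length is a literal, which is the "statically known size" case described in the comment above. The function name `copyTwoWords` is hypothetical, and the sketch only demonstrates the copy itself. Whether the unrolling actually fires depends on the GHC version and backend, so inspect the output of `-ddump-asm` rather than taking it on faith.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Write two Ints into a 16-byte array, freeze it, copy it into a second
-- array with copyByteArray# using the literal length 16#, and read the
-- two Ints back from the copy. Both copy offsets are 0#.
copyTwoWords :: Int -> Int -> (Int, Int)
copyTwoWords (I# a) (I# b) = runST (ST (\s0 ->
  case newByteArray# 16# s0 of { (# s1, src #) ->
  case writeIntArray# src 0# a s1 of { s2 ->
  case writeIntArray# src 1# b s2 of { s3 ->
  case unsafeFreezeByteArray# src s3 of { (# s4, frozen #) ->
  case newByteArray# 16# s4 of { (# s5, dst #) ->
  case copyByteArray# frozen 0# dst 0# 16# s5 of { s6 ->
  case readIntArray# dst 0# s6 of { (# s7, x #) ->
  case readIntArray# dst 1# s7 of { (# s8, y #) ->
  (# s8, (I# x, I# y) #) }}}}}}}}))
```

The indices passed to `writeIntArray#` and `readIntArray#` are in units of `Int`, while the offsets and length passed to `copyByteArray#` are in bytes. Replacing the literal `16#` with a run-time value would be the case where, per the comment above, no unrolling is expected.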