
#9577: String literals are wasting space
-------------------------------------+-------------------------------------
       Reporter:  xnyhps             |                   Owner:
           Type:  bug                |                  Status:  new
       Priority:  low                |               Milestone:
      Component:  Compiler (NCG)     |                 Version:  7.8.2
       Keywords:                     |        Operating System:  Unknown/Multiple
   Architecture:  Unknown/Multiple   |         Type of failure:  Runtime performance bug
     Difficulty:  Unknown            |               Test Case:
     Blocked By:                     |                Blocking:
Related Tickets:                     |  Differential Revisions:
-------------------------------------+-------------------------------------
 For [https://phabricator.haskell.org/D199 D199] I looked into how string
 literals are compiled down by GHC. On 64-bit OS X, a simple string `"AAA"`
 turns into assembly:

 {{{
 .const
 .align 3
         .align 0
 c38E_str:
         .byte   65
         .byte   65
         .byte   65
         .byte   0
 }}}

 (And also something that invokes `unpackCString#`, but that isn't relevant
 here.)

 (`MkCore.mkStringExprFS` -> `CmmUtils.mkByteStringCLit` ->
 `compiler/nativeGen/X86/Ppr.pprSectionHeader`.)

 Note how this:

 * Is 8-byte aligned.
 * Is a `.const` section.

 I can't find any reason why string literals would need to be 8-byte
 aligned on OS X. There might be a small performance benefit to reading
 data that starts 8-byte aligned, but I doubt that doing so for string
 literals makes a meaningful difference. Assembly from both clang and gcc
 does not align string literals.

 The trivial program:

 {{{#!hs
 main :: IO ()
 main = return ()
 }}}

 built with GHC HEAD, has almost 5kB of space wasted on padding between all
 the strings the Prelude brings in.

 The fact that it is a `.const` section, instead of `.cstring`
 (https://developer.apple.com/library/mac/documentation/DeveloperTools/Referen...)
 means duplicate strings aren't shared by the assembler. GHC floats out
 string literals to the top level and uses CSE to eliminate duplicates, but
 that only works within a single module. Strings shared between different
 modules end up as duplicate strings in an executable.
 The same program as above also has ~4kB of wasted space due to duplicate
 Prelude strings (`"base"` occurs 16 times!).

 Compared to the total binary size (4MB after stripping), removing this
 redundant data wouldn't be a big improvement (0.2%), but I still think it
 can be a worthwhile optimization.

 I think this can be solved quite easily by creating a new section header
 for literal strings, which is unaligned and of type `.cstring`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9577
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
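
 For anyone who wants to sanity-check the two estimates in the report (bytes lost to alignment padding, and bytes spent on repeated copies of the same literal), the arithmetic can be sketched roughly as below. This is only a back-of-the-envelope model, not code from GHC: the function names (`literalSize`, `paddingFor`, `totalPadding`, `duplicateBytes`) and the sample list of literals are made up for illustration. It assumes one byte per character plus a NUL terminator, and that every literal starts on an aligned address.

```haskell
module Main where

import Data.List (group, sort)

-- | Size in bytes of a NUL-terminated literal.
-- Assumption: one byte per character (true for ASCII literals such as "AAA").
literalSize :: String -> Int
literalSize s = length s + 1

-- | Padding needed after a literal of the given size so that the next
-- literal starts on a multiple of the alignment.
paddingFor :: Int -> Int -> Int
paddingFor align size = (align - size `mod` align) `mod` align

-- | Total padding when every literal is aligned to `align` bytes.
totalPadding :: Int -> [String] -> Int
totalPadding align = sum . map (paddingFor align . literalSize)

-- | Bytes occupied by the second and later copies of identical literals,
-- i.e. what would be saved if duplicates were merged into one copy.
duplicateBytes :: [String] -> Int
duplicateBytes strs =
  sum [ length rest * literalSize s | (s : rest) <- group (sort strs) ]

main :: IO ()
main = do
  -- Hypothetical sample; a real measurement would take the literals
  -- from the generated assembly or the linked binary.
  let literals = ["AAA", "base", "base", "GHC.Base", "base"]
  putStrLn ("padding at 8-byte alignment: " ++ show (totalPadding 8 literals))
  putStrLn ("padding without alignment:   " ++ show (totalPadding 1 literals))
  putStrLn ("bytes in duplicate copies:   " ++ show (duplicateBytes literals))
```

 With 8-byte alignment, the 4-byte literal `"AAA"` (three characters plus the terminator) is followed by 4 bytes of padding, so short strings can lose about as much space to padding as they occupy. With an alignment of 1 the padding term drops to zero, which is the effect an unaligned section would have under this model.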