
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:  strings
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312, #9719                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by winter):

Replying to [comment:74 jscholl]:
> Thinking about the problem again I decided to try to add {{{ByteArray#}}} literals to GHC. The idea is the following:
> - Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. This is in essence my try for a {{{String#}}} type.
> - Provide
> {{{#!haskell
> unpackStringLit# :: ByteArray# -> [Char]
> {-# INLINE[1] unpackStringLit# #-}
> unpackStringLit# ba# = unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#)
> }}}
> - Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}}
> - Let rewrites fire in phase 2.
> - In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to {{{unpackCStringWithLen# "foo"# 3#}}}
> - Thus most {{{ByteArray#}}}s should get eliminated and binary size should stay more or less the same.
> - If someone rewrites something like {{{ByteString.pack (unpackStringLit# lit)}}}, the literal is not eliminated and emitted to the binary. Thus a {{{ByteString}}} literal can increase binary size. However, I think this is what we want because we save making a copy of the data.
> - The downside is that turning optimization off causes the compiler to create a {{{ByteArray#}}} for every string literal instead of a c-string. GHCi will also allocate {{{ByteArray#}}}s instead of string literals directly.
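
The phase ordering proposed in the quoted comment can be sketched with ordinary `RULES` and `INLINE` pragmas. This is only an illustrative model, not the patch itself: GHC has no `"foo"##` literal syntax, so the sized literal is modelled by a hypothetical `Lit` type, and `unpackWithLen`, `unpackLit`, `litLength` and `stringLength` are made-up stand-ins for `unpackCStringWithLen#`, `unpackStringLit#` and a library consumer such as `ByteString.pack`.

{{{#!haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Addr#, Char (C#), Int (I#), Int#, indexCharOffAddr#, isTrue#, (+#), (>=#))

-- Hypothetical model of a sized literal: start address plus byte count.
data Lit = Lit Addr# Int#

-- Stand-in for unpackCStringWithLen#: reads exactly `len` bytes as
-- Latin-1 characters, so it never has to look for a terminator.
unpackWithLen :: Addr# -> Int# -> String
unpackWithLen addr len = go 0#
  where
    go i
      | isTrue# (i >=# len) = []
      | otherwise           = C# (indexCharOffAddr# addr i) : go (i +# 1#)

-- Stand-in for unpackStringLit#: not inlined before phase 1, so rules
-- that mention it can still match during phase 2.
unpackLit :: Lit -> String
unpackLit (Lit addr len) = unpackWithLen addr len
{-# INLINE [1] unpackLit #-}

-- A consumer that can work on the literal directly.
litLength :: Lit -> Int
litLength (Lit _ len) = I# len

stringLength :: String -> Int
stringLength = length
{-# NOINLINE stringLength #-}

-- Active only before phase 1, i.e. while unpackLit is still visible.
{-# RULES "stringLength/unpackLit" [~1]
      forall l. stringLength (unpackLit l) = litLength l #-}

main :: IO ()
main = do
  let lit = Lit "foo"# 3#
  putStrLn (unpackLit lit)
  print (stringLength (unpackLit lit))
}}}

The result is the same whether or not the rule fires; with optimization on, the rule lets the consumer skip the intermediate `String`, and otherwise `unpackLit` is simply inlined from phase 1 onwards.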
The problem is that the old `Addr#` unpacking differs from `ByteArray#` unpacking: the two do not use the same encoding. They disagree on how the `\NUL` character is encoded (at least, I'm expecting `ByteArray#` to be standard UTF-8 encoded). So you can't cast between them with rewrite rules like that: you have to take the encoding pitfall into account.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/5218#comment:81
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
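
The `\NUL` mismatch raised in the comment above can be illustrated with a small, self-contained sketch. It assumes that the existing NUL-terminated `Addr#` literals use the NUL-free "modified UTF-8" form (U+0000 written as the two bytes `C0 80`), while a `ByteArray#` literal would hold standard UTF-8. The functions `utf8` and `modifiedUtf8` are written here for illustration only and are not GHC APIs; surrogate code points are not treated specially.

{{{#!haskell
module Main (main) where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Standard UTF-8 encoding of a single character.
utf8 :: Char -> [Word8]
utf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- NUL-free variant: U+0000 becomes the overlong pair C0 80, so the
-- encoded bytes can live inside a NUL-terminated C string.
modifiedUtf8 :: Char -> [Word8]
modifiedUtf8 '\NUL' = [0xC0, 0x80]
modifiedUtf8 c      = utf8 c

main :: IO ()
main = do
  print (concatMap utf8 "a\NULb")          -- [97,0,98]
  print (concatMap modifiedUtf8 "a\NULb")  -- [97,192,128,98]
}}}

The same three-character string has different bytes and a different length in the two forms, which is why a rule that merely reinterprets one representation as the other would be unsound for literals containing `\NUL`.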