
#5218: Add unpackCStringLen# to create Strings from string literals -------------------------------------+------------------------------------- Reporter: tibbe | Owner: thoughtpolice Type: feature request | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 7.0.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #5877 #10064 | Differential Rev(s): Phab:D2443 #11312 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by jscholl): Thinking about the problem again I decided to try to add {{{ByteArray#}}} literals to GHC. The idea is the following: - Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. This is in essence my try for a {{{String#}}} type. - Provide {{{#!haskell unpackStringLit# :: ByteArray# -> [Char] {-# INLINE[1] unpackStringLit# #-} unpackStringLit# ba# = unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#) }}} - Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}} - Let rewrites fire in phase 2. - In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to {{{unpackCStringWithLen# "foo"# 3#}}} - Thus most {{{ByteArray#}}}s should get eliminated and binary size should stay more or less the same. - If someone rewrites something like {{{ByteString.pack (unpackStringLit# lit)}}}, the literal is not eliminated and emitted to the binary. Thus a {{{ByteString}}} literal can increase binary size. However, I think this is what we want because we save making a copy of the data. - The downside is that turning optimization off causes the compiler to create a {{{ByteArray#}}} for every string literal instead of a c-string. GHCi will also allocate {{{ByteArray#}}}s instead of string literals directly. I currently implemented the new literal type, extended the parser, changed the desugaring, added the needed rules, taught GHCi to handle {{{ByteArray#}}} literals, and generated cmm Code for them. I still have to look at all parts involved for fusion and other string rules to work, check how the change affects bootstrapping with an older compiler, take a look at template haskell, and whether there are any typeable/generic things involved. I don't know how everyone feels about adding another literal type (especially because there are now two similar types, {{{Addr#}}} and {{{ByteArray#}}}, but if we want to keep binary sizes down, we need some form of {{{Addr#}}}, and it seems like having {{{ByteArray#}}} is beneficial too). Or whether it is reasonable to provide syntax for {{{ByteArray#}}}s (letting the compiler generate them would be enough for this ticket). But right now implementing it like this feels a lot better than my previous approach using unboxed tuples. And I am sorry for not saying or doing anything such a long time. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/5218#comment:74 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler