Re: [GHC] #5218: Add unpackCStringLen# to create Strings from string literals

15 Aug 2017

      ...
Thinking about the problem again I decided to try to add
 {{{ByteArray#}}} literals to GHC. The idea is the following:
 - Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. In essence, this is
 my attempt at a {{{String#}}} type.
 - Provide
{{{#!haskell
unpackStringLit# :: ByteArray# -> [Char]
{-# INLINE[1] unpackStringLit# #-}
unpackStringLit# ba# =
  unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#)
}}}
 - Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}}
 - Let rewrites fire in phase 2.
 - In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to
 {{{unpackCStringWithLen# "foo"# 3#}}}
 - Thus most {{{ByteArray#}}}s should get eliminated and binary size
 should stay more or less the same.
 - If someone rewrites something like {{{ByteString.pack
 (unpackStringLit# lit)}}}, the literal is not eliminated and emitted to
...
 - The downside is that turning optimization off causes the compiler to
 create a {{{ByteArray#}}} for every string literal instead of a C string.
 GHCi will also allocate {{{ByteArray#}}}s instead of string literals
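
 For illustration only, a length-carrying unpacker along the lines of the
 proposed {{{unpackCStringWithLen#}}} could be sketched as below. The name
 {{{unpackWithLen}}} is hypothetical, and the sketch assumes every byte is
 a Latin-1 character; it does no UTF-8 decoding and is not the code from
 the patch.
{{{#!haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Addr#, Char (C#), Int#, indexCharOffAddr#, isTrue#, (+#), (>=#))

-- Hypothetical sketch: read exactly `len` bytes starting at `addr`.
-- Because the length is passed explicitly, no NUL terminator is consulted.
-- Assumption: each byte is one Latin-1 character (no UTF-8 decoding).
unpackWithLen :: Addr# -> Int# -> String
unpackWithLen addr len = go 0#
  where
    go :: Int# -> String
    go i
      | isTrue# (i >=# len) = []
      | otherwise           = C# (indexCharOffAddr# addr i) : go (i +# 1#)
}}}
 With this definition, {{{unpackWithLen "foo"# 3#}}} evaluates to
 {{{"foo"}}}.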
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:  strings
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312, #9719                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by winter):

 Replying to [comment:74 jscholl]:
 the binary. Thus a {{{ByteString}}} literal can increase binary size.
 However, I think this is what we want because we save making a copy of the
 data.
 directly.
...
The problem is that the old `Addr#` unpacking differs from `ByteArray#`
 unpacking in encoding: the two do not agree on how the `\NUL` character is
 encoded (at least, I'm expecting `ByteArray#` to be standard UTF-8
 encoded). So you can't convert between them with rewrite rules like that:
 you have to mention the encoding pitfall.
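
 A small sketch of the pitfall, assuming the `ByteArray#` literal would hold
 standard UTF-8 while the NUL-terminated `Addr#` literal uses the modified
 UTF-8 form in which `\NUL` is the two-byte sequence `0xC0 0x80`. The
 function names `utf8` and `modifiedUtf8` are made up for this example and
 are not GHC API.
{{{#!haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Standard UTF-8 encoding of a single character.
utf8 :: Char -> [Word8]
utf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading-byte payload: the bits above position s.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx with the six bits starting at position s.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Modified UTF-8: identical except that NUL becomes 0xC0 0x80, so the
-- encoded bytes never contain 0x00 and can be NUL-terminated.
modifiedUtf8 :: Char -> [Word8]
modifiedUtf8 '\NUL' = [0xC0, 0x80]
modifiedUtf8 c      = utf8 c
}}}
 Here `utf8 '\NUL'` is `[0x00]` while `modifiedUtf8 '\NUL'` is
 `[0xC0, 0x80]`, so the same literal yields different bytes depending on
 which representation it is stored in; every other character encodes
 identically.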

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/5218#comment:81
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler