
#11312: GHC inlining primitive string literals can affect program output
-------------------------------------+-------------------------------------
        Reporter:  RyanGlScott       |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.10.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Incorrect result  |  Unknown/Multiple
  at runtime                         |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #11292            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by simonpj):

 * cc: ekmett, core-libraries-committee (added)


Comment:

 Let's separate two things:
 * Top-level unboxed string literals: #8472
 * Not using `Addr#` for string literals: this ticket.

 Here's my summary for this ticket, after talking to Simon M.

 * It's plain wrong to use `Addr#` as the type of a string literal. If we
   do so, there is no reliable way to compute equality for
 {{{
 data T = MkT Addr# deriving (Eq)
 }}}
   Since the `Addr#` might come from `malloc` or something, the derived
   instance must compare using equality on `Addr#`, i.e. on addresses. But
   then there is no guarantee that `MkT "foo"# == MkT "foo"#`, because two
   occurrences of the same literal need not live at the same address.

 * So we need a new type for unlifted string literals, say `String#`. It
   could be primitive, and that's what I'll assume for now.

 * Of course the underlying representation will be the same as `Addr#`.
   But there should be no operation `get :: String# -> Addr#` (except
   maybe in the IO monad), else it'd be possible that `get "foo"#` is not
   equal to `get "foo"#`.

 * What operations do we need on `String#`? Presumably at least
 {{{
 eqString#    :: String# -> String# -> Int#   -- Like eqChar#
 cmpString#   :: String# -> String# -> Int#   -- 3-way compare
 lenString#   :: String# -> Int#              -- Number of chars
 indexString# :: String# -> Int# -> Char#     -- Get the ith char
 }}}

 * NB: I'm deliberately not saying that the string is null-terminated.
   That'd be up to the implementation of `String#`, provided it offered
   the above operations. A better representation might be a record of a
   length and a blob of bytes.

 * Could `String#` simply be a `ByteArray#`?
 {{{
 type String# = ByteArray#
 }}}
   After all, `ByteArray#` already has primops `sizeofByteArray#` and
   `indexCharArray#`. We'd just need a way to have a statically-allocated
   `ByteArray#`, but that would be an excellent thing anyway. For example,
   I believe that Happy misuses literal strings so that it can build a
   statically-allocated array. Avoiding yet another primitive type would
   be a relief. See also #5218 and #9577.

 * I'm not sure how Unicode plays with all of this.

 * This would be a potentially breaking change for any code using unboxed
   string literals. I'm copying the Core Libraries Committee.

 I'd love someone to take this up.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11312#comment:3
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
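For illustration only, here is a rough sketch of the `type String# = ByteArray#` option discussed in the comment, written as ordinary Haskell on top of existing `GHC.Exts` primops rather than as new primops. Assumptions, all hypothetical: `eqString#`, `cmpString#`, `lenString#` and `indexString#` are the operations proposed in the comment, not existing GHC functions; `Str`, `mkStr`, `fill`, `eqS`, `cmpS`, `lenS` and `indexS` are helpers invented for this sketch; the arrays are heap-allocated at runtime (there is no statically-allocated `ByteArray#` here); each `Char` is stored as one byte, so only Latin-1 text round-trips, which simply sidesteps the open Unicode question; and `indexString#` does no bounds checking.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- HYPOTHETICAL sketch: String# modelled as ByteArray#, one byte per Char
-- (Latin-1 only). None of the names defined below are real GHC primops.

import GHC.Exts
  ( ByteArray#, Char (C#), Char#, Int (I#), Int#, MutableByteArray#, State#
  , (+#), (==#), (>=#), gtChar#, indexCharArray#, isTrue#, ltChar#
  , newByteArray#, sizeofByteArray#, unsafeFreezeByteArray#, writeCharArray# )
import GHC.ST (ST (..), runST)

-- Hypothetical string-literal type, as suggested in the comment.
type String# = ByteArray#

-- Number of chars (equal to bytes under the one-byte-per-Char assumption).
lenString# :: String# -> Int#
lenString# = sizeofByteArray#

-- Get the ith char; no bounds check.
indexString# :: String# -> Int# -> Char#
indexString# = indexCharArray#

-- 3-way lexicographic compare on contents: returns -1#, 0# or 1#.
cmpString# :: String# -> String# -> Int#
cmpString# a b = go 0#
  where
    go :: Int# -> Int#
    go i
      | isTrue# (i >=# lenString# a) =
          if isTrue# (i >=# lenString# b) then 0# else -1#
      | isTrue# (i >=# lenString# b) = 1#
      | isTrue# (ltChar# (indexString# a i) (indexString# b i)) = -1#
      | isTrue# (gtChar# (indexString# a i) (indexString# b i)) = 1#
      | otherwise = go (i +# 1#)

-- Content equality (1# or 0#, like eqChar#); never looks at addresses.
eqString# :: String# -> String# -> Int#
eqString# a b = cmpString# a b ==# 0#

-- Hypothetical boxed wrapper so values can be built and inspected.
data Str = Str String#

-- Copy each Char into consecutive bytes of a mutable array.
fill :: MutableByteArray# d -> Int# -> String -> State# d -> State# d
fill _ _ [] st = st
fill mba i (C# c : rest) st =
  fill mba (i +# 1#) rest (writeCharArray# mba i c st)

-- Build a heap-allocated (not static) String# from a Haskell String.
mkStr :: String -> Str
mkStr cs = runST (ST (\s0 ->
  case length cs of
    I# n -> case newByteArray# n s0 of
      (# s1, mba #) ->
        case unsafeFreezeByteArray# mba (fill mba 0# cs s1) of
          (# s2, ba #) -> (# s2, Str ba #)))

eqS :: Str -> Str -> Bool
eqS (Str a) (Str b) = isTrue# (eqString# a b)

cmpS :: Str -> Str -> Int
cmpS (Str a) (Str b) = I# (cmpString# a b)

lenS :: Str -> Int
lenS (Str a) = I# (lenString# a)

indexS :: Str -> Int -> Char
indexS (Str a) (I# i) = C# (indexString# a i)
```

Loaded into GHCi, `eqS (mkStr "foo") (mkStr "foo")` gives `True` even though each `mkStr` call allocates a separate array: equality is defined on contents, which is exactly the guarantee the comment says an `Addr#`-typed literal cannot offer.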