Re: [GHC] #5218: Add unpackCStringLen# to create Strings from string literals

#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312                             |            Wiki Page:
-------------------------------------+-------------------------------------

Comment (by bgamari):

Here is where we stand on this:

This bug seeks to address the fact that we currently have few good ways of
encoding literal strings verbatim (e.g. as raw, unchanged bytes) in object
code. This is because we insist on encoding primitive strings as
null-terminated modified UTF-8. This means that things like `bytestring`
and `text` have a rather complicated and inefficient handling of these
literals. This inefficiency stems from two reasons,

 * One needs to look for and correctly handle the U+0000 codepoints
   (encoded as `0xc0 0x80`) in the primitive string
 * It's impossible to know what the length of the string is without
   walking it

The solution here is to rework our desugaring of primitive strings such
that,
{{{#!hs
"hello"#
}}}
Will be desugared as,
{{{#!hs
let x = "hello"# :: Addr#
in (# 5#, "hello"# #)
}}}
This means that we can encode the string contents in plain UTF-8 without a
NULL terminator. The type of `unpackCString#` then becomes,
{{{#!hs
unpackCString# :: (# Int#, Addr# #) -> String
}}}
and the implementation gets a tiny bit simpler (since it simply decodes a
fixed number of bytes, instead of looking for a NULL). Consequently,
libraries can then provide rules matching on `unpackCString#`
applications, replacing them with what is essentially `memcpy`.
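
To make the contrast concrete, here is a minimal sketch of the two styles of consumer. It is not the code from Phab:D2443: `cstringLength` and `unpackBytesLen` are hypothetical names chosen for illustration, and the sketch treats every byte as one `Char`, so it deliberately ignores UTF-8 decoding and only behaves sensibly on ASCII literals. It assumes a GHC that accepts unboxed tuples as function arguments.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts

-- Current scheme: the only way to learn the length of a NUL-terminated
-- literal is to scan it byte by byte until the terminator is found.
-- (Hypothetical helper, for illustration only.)
cstringLength :: Addr# -> Int
cstringLength addr = go 0#
  where
    go i = case indexCharOffAddr# addr i of
      ch | isTrue# (ch `eqChar#` '\0'#) -> I# i
         | otherwise                    -> go (i +# 1#)

-- Proposed scheme: the length travels with the address, so the consumer
-- reads exactly n bytes and never inspects them for a terminator.
-- (Hypothetical helper; one byte is mapped to one Char, no UTF-8 decoding.)
unpackBytesLen :: (# Int#, Addr# #) -> String
unpackBytesLen (# n, addr #) = go 0#
  where
    go i
      | isTrue# (i >=# n) = []
      | otherwise         = C# (indexCharOffAddr# addr i) : go (i +# 1#)
```

With these definitions in scope, `cstringLength "hello"#` has to touch all five bytes plus the terminator to answer `5`, whereas `unpackBytesLen (# 5#, "hello"# #)` is handed the length up front and yields `"hello"`. A library with its own packed representation could use the same length to copy the bytes in one step rather than building a list.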

This is for the most part a simple change, with the exception being GHCi
support due to the need for unboxed tuples. jscholl started implementing
this nearly a year ago but stalled. I recently rebased his work
(Phab:D2443) and addressed several of the issues that came up in review.
Unfortunately, GHCi currently segfaults, which will take some time to work
out.

Note that there are two related problems that this does not address,

 * pure ASCII literals (which might be used to, for instance, encode a
   binary representation of a static `Array`)
 * `ByteArray#` literals, as requested in ticket:11312

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/5218#comment:73
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler