
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by winter):

 It looks like we have to make a space-time trade-off here: if
 `ByteArray#`'s overhead is too large, adding an `Int#` will also cost a
 lot. If GHC is smart enough to float all string constants out, then
 copying once at runtime is acceptable. Otherwise I would still prefer to
 trade space for time.

 I also propose another solution: we keep the primitive literal's type as
 `Addr#`, but encode the literal's byte length using the UTF-8 rules, that
 is, one byte for lengths up to 0x7F, two bytes for lengths up to 0x7FF,
 and so on. These UTF-8-encoded length header bytes are then placed in
 front of the actual content bytes.

 All that is left is to add a new unpack function, `unpackGHCString#`,
 that decodes the header bytes first (we can reuse the UTF-8 code!) and
 then uses `memcpy`, or whatever else you want to do with the length
 information.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/5218#comment:70
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
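
A rough sketch of the length-header scheme proposed in the comment, written against plain lists of `Word8` rather than `Addr#` so that it runs with `base` alone. All of the names here (`encodeLen`, `decodeLen`, `packLiteral`, `unpackLiteral`) are hypothetical and are not part of GHC; `unpackGHCString#` itself is only a proposed name. The sketch assumes the UTF-8 bit layout with headers of up to four bytes (lengths below 0x200000) and, unlike a strict UTF-8 decoder, does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical helper: encode a byte length using the UTF-8 bit layout.
-- Returns Nothing for negative lengths or lengths needing more than 4 bytes.
encodeLen :: Int -> Maybe [Word8]
encodeLen n
  | n < 0        = Nothing
  | n < 0x80     = Just [fromIntegral n]
  | n < 0x800    = Just [0xC0 .|. hi 6, cont 0]
  | n < 0x10000  = Just [0xE0 .|. hi 12, cont 6, cont 0]
  | n < 0x200000 = Just [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  | otherwise    = Nothing
  where
    -- Payload bits that go into the lead byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: decode the header, returning the length and the
-- bytes that follow it.
decodeLen :: [Word8] -> Maybe (Int, [Word8])
decodeLen [] = Nothing
decodeLen (b:bs)
  | b < 0x80           = Just (fromIntegral b, bs)
  | b .&. 0xE0 == 0xC0 = go 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = go 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = go 3 (b .&. 0x07)
  | otherwise          = Nothing
  where
    go k lead = case splitAt k bs of
      (cs, rest)
        | length cs == k, all isCont cs ->
            Just (foldl step (fromIntegral lead) cs, rest)
      _ -> Nothing
    isCont c   = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

-- Prefix the content bytes with their encoded length.
packLiteral :: [Word8] -> Maybe [Word8]
packLiteral payload = fmap (++ payload) (encodeLen (length payload))

-- Read the header, then take exactly that many content bytes. A real
-- implementation over Addr# could copy them with memcpy at this point.
unpackLiteral :: [Word8] -> Maybe [Word8]
unpackLiteral bytes = do
  (n, rest) <- decodeLen bytes
  if length rest >= n then Just (take n rest) else Nothing
```

With this layout a literal shorter than 128 bytes pays one byte of header, and a 300-byte literal pays two (`0xC4 0xAC`), which is the space cost being weighed against an extra `Int#` per literal.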