Re: [GHC] #5218: Add unpackCStringLen# to create Strings from string literals

#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by jscholl):

 How about, instead of adding a new type {{{String#}}}, as suggested in
 #10064, or adding a function {{{unpackCStringLen#}}}, we add the ability to
 query the size of the payload of an {{{Addr#}}} at compile-time? We could
 provide a function which, given an {{{Addr#}}} constant, turns it into a
 {{{(# Int#, Addr# #)}}} pair without introducing a new type, thus keeping
 the overall changes small and the design flexible.

 How do we get the length at compile-time? We use a special builtin
 rewrite-rule, which writes the length to the appropriate places. For
 example:

 {{{
 {-# INLINE[0] viewCString# #-}
 viewCString# :: Addr# -> (# Int#, Addr# #)
 viewCString# addr# = (# -1#, addr# #)

 {-# RULES
 "viewCString#" forall addr . viewCString# addr = (# <length of addr's pointee>#, addr #)
   #-}
 }}}

 Library code could then use {{{viewCString#}}} to try to determine the
 length at compile-time, and, if optimizations are enabled, the call gets
 rewritten to the correct result. Otherwise, {{{viewCString#}}} inlines in
 phase 0, the resulting -1 is seen by the library code, and the code is
 simplified to determine the length at runtime, as it does today.

 Why does {{{viewCString#}}} return the {{{Addr#}}} again? If it does not,
 the {{{Addr#}}} given to {{{viewCString#}}} will be used multiple times, so
 GHC will bind it in a let, complicating the design of the rule.
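
 To make the fallback path concrete, here is a minimal sketch of how library code might consume {{{viewCString#}}}. It assumes only the plain definition quoted in the comment (the builtin rule is a proposal and does not exist, so the length is always reported as -1 here). The names {{{literalLength}}} and {{{strlen#}}} are hypothetical helpers introduced for illustration. Note that the helper continues with the returned address rather than the one it passed in.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Addr#, Int#, Int (I#), indexCharOffAddr#, (+#), (>=#), isTrue#)

-- Plain definition from the comment. Without the proposed builtin rule,
-- the length is always reported as -1, meaning "unknown".
viewCString# :: Addr# -> (# Int#, Addr# #)
viewCString# addr# = (# -1#, addr# #)
{-# INLINE [0] viewCString# #-}

-- Hypothetical library helper: use the compile-time length when it is
-- available, otherwise scan for the terminating NUL at runtime.
literalLength :: Addr# -> Int
literalLength a0 =
  case viewCString# a0 of
    (# n, a #)
      | isTrue# (n >=# 0#) -> I# n              -- length known statically
      | otherwise          -> I# (strlen# a 0#) -- runtime fallback

-- Hypothetical strlen-style loop over a NUL-terminated literal.
strlen# :: Addr# -> Int# -> Int#
strlen# a i =
  case indexCharOffAddr# a i of
    '\0'# -> i
    _     -> strlen# a (i +# 1#)
```

 With a literal such as {{{"hello"#}}}, {{{literalLength}}} takes the runtime branch in this sketch, because nothing rewrites the -1.
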
 If {{{viewCString#}}} returns the {{{Addr#}}}, the library can continue to
 use the returned value, and GHC is less likely to share it.

 One could go even further and extend {{{viewCString#}}} to handle two
 additional cases: converting the encoding at compile-time, and determining
 the number of characters in the string. So {{{viewCString#}}} would become:

 {{{
 viewCString# :: Int# -> Addr# -> (# Int#, Addr#, Int#, Int# #)
 }}}

 The first input selects the requested mode of operation (only count bytes
 and characters, or convert to utf16le/be or utf32le/be). The first output
 {{{Int#}}} reports the operation that was performed; it should always be
 either the input {{{Int#}}} or some "No operation performed" code. The
 other two {{{Int#}}} results are the number of bytes and the number of
 characters, and the {{{Addr#}}} contains the potentially converted literal.

 Of course, an interface passing around magic {{{Int#}}}s is not the nicest,
 but I think this is quite low-level code, and only a few libraries like
 text and bytestring will have to deal with it.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:36>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
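
The extended interface described in the comment could be wrapped along the following lines. This is only a sketch under stated assumptions: the comment names the operations but assigns no numeric codes, so the {{{Mode}}} type, the use of constructor indices as codes, and -1 as the "No operation performed" code are all hypothetical, as is the helper {{{sizesIfPerformed}}}. The {{{viewCString#}}} body is a fallback that performs nothing, since no builtin rule exists.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Addr#, Int#, Int (I#), (==#), isTrue#)

-- Hypothetical request modes, following the operations listed in the comment.
data Mode = CountOnly | ToUtf16LE | ToUtf16BE | ToUtf32LE | ToUtf32BE
  deriving (Eq, Show, Enum)

-- Assumed encoding: the constructor index is the mode code, and -1 is
-- reserved for "No operation performed".
modeCode :: Mode -> Int
modeCode = fromEnum

-- Fallback for the extended signature: it reports that nothing was done
-- and hands the address back unchanged.
viewCString# :: Int# -> Addr# -> (# Int#, Addr#, Int#, Int# #)
viewCString# _ addr# = (# -1#, addr#, -1#, -1# #)
{-# INLINE [0] viewCString# #-}

-- Hypothetical wrapper: yields (bytes, characters) only when the returned
-- code matches the requested one.
sizesIfPerformed :: Mode -> Addr# -> Maybe (Int, Int)
sizesIfPerformed mode a0 =
  case modeCode mode of
    I# req ->
      case viewCString# req a0 of
        (# done, _a, bytes, chars #)
          | isTrue# (done ==# req) -> Just (I# bytes, I# chars)
          | otherwise              -> Nothing
```

Wrapping the magic {{{Int#}}} codes in a small data type like this would keep them out of most library code, at the cost of one more layer to inline away.
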