
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:  strings
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312, #9719                      |            Wiki Page:
-------------------------------------+-------------------------------------

Comment (by winter):

 After thinking about this for a while, I think a better solution to the
 literal problem is to overhaul the `OverloadedStrings` extension, because
 the problem is created by it: without it we can't even use rewrite rules
 to get the `Addr#` at all. I think I will finally make a GHC proposal, but
 let me sketch my idea a little:

 1. Currently, when `OverloadedStrings` is enabled, we treat a string
    literal as polymorphic by translating it to `fromString ...`, where
    `fromString` is a method of the `IsString` type class.

 2. This makes desugaring the literal into a `String` the first step of
    compiling it, and at this very step we choose `unpackCString# addr#`
    to desugar the literal.

 3. Now we have a problem with the fixed desugaring scheme: it is not
    flexible enough to give rise to a `ByteArray#`-based representation,
    no matter what rewrite rules are applied afterwards.

 4. So I propose to solve the problem directly at the level of this
    language extension. Besides `IsString`, I propose adding the following
    type classes:

 {{{
 class IsPlainAddr a where
     fromPlainAddr :: Addr# -> a

 class IsAsciiByteArray a where
     fromAsciiByteArray :: ByteArray# -> a

 class IsU8ByteArray a where
     fromU8ByteArray :: ByteArray# -> a

 -- maybe someone wants UTF-16 desugaring? we can add that later
 }}}

 5.
    Together with `IsString`, these type classes are special when
    `OverloadedStrings` is enabled: we try to find an instance for the
    type being overloaded. Given `"Foo" :: Foo`, we try to find an
    instance for `Foo` among these classes. The priority among the classes
    can be arbitrary, as long as we document it clearly.

 6. Once an instance is found, we desugar according to the instance spec
    and inject `fromXXX "xxx"#` directly into the code. If a code point in
    the source cannot be encoded under the instance spec, we issue a
    compile warning.

 7. If we fail to find an instance among the classes above, we issue a
    compile error.

 8. Now a library author can choose to implement whichever type class
    suits his/her needs.

 9. This solution can also be extended to handle `OverloadedLists`, e.g.
    we can add the following type classes for desugaring list literals:

 {{{
 class IsIntList a where
     fromIntList :: ByteArray# -> a

 class IsWordList a where
     fromWordList :: ByteArray# -> a

 class IsInt8List a where
     fromInt8List :: ByteArray# -> a
 ...
 }}}

 10. When `OverloadedLists` is enabled, we try to find an instance of these
     special classes and transform the list into a `ByteArray#` according
     to the instance spec; if any element overflows, we issue warnings.

 If people later ask for a new literal desugaring format, we add new type
 classes and are done: old code continues to work, and new code gets a
 compile error on old compilers.

 BTW, I think this is what we call "解铃还需系铃人" in Chinese ("whoever
 tied the bell must be the one to untie it") ; )

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:82>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
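
To make steps 4 to 6 of the comment above a bit more concrete, here is a minimal hand-written sketch of what the proposed desugaring might expand to. It is only an illustration under stated assumptions: `IsPlainAddr` is the class proposed in the comment and does not exist in GHC, the `Name` type and the binding `foo` are hypothetical, and the call `fromPlainAddr "Foo"#` is written out manually because no compiler performs this desugaring.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Addr#)
import GHC.CString (unpackCString#)

-- The class proposed in the comment; it is NOT part of GHC or base.
class IsPlainAddr a where
    fromPlainAddr :: Addr# -> a

-- Hypothetical library type, used here only for illustration.
newtype Name = Name String
    deriving (Eq, Show)

-- A library author opts in to the raw-address desugaring.
-- This instance simply decodes the NUL-terminated literal into a String;
-- a real library could keep the Addr# and avoid the copy.
instance IsPlainAddr Name where
    fromPlainAddr addr = Name (unpackCString# addr)

-- Under the proposal, `"Foo" :: Name` would be rewritten by the compiler
-- into the right-hand side below. Here it is spelled out by hand.
foo :: Name
foo = fromPlainAddr "Foo"#
```

The point of the design is that the instance, not a fixed `unpackCString#` step, decides what happens to the literal's bytes, so a `ByteArray#`-backed type could pick a different class without relying on rewrite rules.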