
On 08/07/2011 17:36, Gábor Lehel wrote:
2011/7/7 Simon Marlow:
On 07/07/11 17:14, Gábor Lehel wrote:
On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:
Hi folks,
In response to this ticket:
http://hackage.haskell.org/trac/ghc/ticket/5275
I'm making some changes to Data.Typeable, some of which affect the API, so as per the new library guidelines I'm informing the list.
The current implementation of Typeable is based on
mkTyCon :: String -> TyCon
which internally keeps a table mapping Strings to Ints, so that each TyCon can be given a unique Int for fast comparison. This means the String has to be unique across all types in the program. Currently derived instances of Typeable use the qualified original name (e.g. "GHC.Types.Int"), which is not necessarily unique, is non-portable, and exposes implementation details.
The String passed to mkTyCon is returned by
tyConString :: TyCon -> String
which lets the user get at this non-portable representation (also the Show instance returns this String).
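To make the cost of the current scheme concrete, here is a rough sketch of a String-to-Int table of the kind described above. It is not the actual code in base: the names keyTable and keyFor are hypothetical, and an association list stands in for whatever structure the library really uses. It does show why keys handed out this way depend on the order of requests within one run, and why obtaining one is an IO action.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global table mapping each type-constructor String to an Int.
-- NOINLINE ensures there is exactly one table for the whole program.
{-# NOINLINE keyTable #-}
keyTable :: IORef [(String, Int)]
keyTable = unsafePerformIO (newIORef [])

-- Return the key for a name, allocating the next free Int on first use.
-- The Int depends on the order of calls, so it can differ from run to run.
keyFor :: String -> IO Int
keyFor name = atomicModifyIORef' keyTable step
  where
    step table =
      case lookup name table of
        Just key -> (table, key)
        Nothing  ->
          let key = length table
          in ((name, key) : table, key)
```

Two different types that happen to share the same String would receive the same key here, which is the uniqueness problem mentioned above.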
So the new proposal is to store three Strings in TyCon. The internal representation is this:
data TyCon = TyCon {
    tyConHash    :: {-# UNPACK #-} !Fingerprint,
    tyConPackage :: String,
    tyConModule  :: String,
    tyConName    :: String
  }
The fields of this type are not exposed externally. Together the three fields tyConPackage, tyConModule and tyConName uniquely identify a TyCon, and the Fingerprint is a hash of the concatenation of these three Strings (so there is no longer an internal cache mapping strings to unique Ids). tyConString now returns the value of tyConName only.
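A minimal sketch of how such a TyCon could be built and compared follows. It is a local model, not the code in base. It assumes fingerprintString from GHC.Fingerprint as the hash function, and the space separator between the three Strings is a choice made for this sketch; the real implementation may combine them differently.

```haskell
import GHC.Fingerprint (Fingerprint, fingerprintString)

-- Local stand-in for the internal representation described above.
data TyCon = TyCon
  { tyConHash    :: {-# UNPACK #-} !Fingerprint
  , tyConPackage :: String
  , tyConModule  :: String
  , tyConName    :: String
  }

-- Equality and ordering look only at the hash, so no global table is needed.
instance Eq TyCon where
  a == b = tyConHash a == tyConHash b

instance Ord TyCon where
  compare a b = compare (tyConHash a) (tyConHash b)

-- Package, module and name, in that order. The separator is an assumption
-- of this sketch.
mkTyCon3 :: String -> String -> String -> TyCon
mkTyCon3 pkg modl name =
  TyCon (fingerprintString (unwords [pkg, modl, name])) pkg modl name

-- Only the unqualified name is returned.
tyConString :: TyCon -> String
tyConString = tyConName
```

In this model, two TyCons with the same name and module but different packages hash differently, so they compare as distinct even though tyConString returns the same String for both.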
I've measured the performance impact of this change, and as far as I can tell performance is uniformly better. This should improve things for SYB in particular. Also, the size of the code generated for deriving Typeable is less than half as much as before.
=== Proposed API changes ===
1. DEPRECATE mkTyCon
mkTyCon is used by some hand-written instances of Typeable. It will work as before, but is deprecated in favour of...
2. Add
mkTyCon3 :: String -> String -> String -> TyCon
which takes the package, module, and name of the TyCon respectively. Most users can just derive Typeable; there's no need to use mkTyCon3.
In due course we can rename mkTyCon3 back to mkTyCon.
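For the common case of simply deriving Typeable, a small sketch looks like this. The type Colour and the helper sameType are made up for illustration; nothing here touches mkTyCon or mkTyCon3 directly.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable (Typeable, TypeRep, typeOf)

-- Hypothetical example type; the compiler supplies the Typeable instance.
data Colour = Red | Green
  deriving (Show, Typeable)

-- The runtime representation of the type Colour.
colourRep :: TypeRep
colourRep = typeOf Red

-- Compare the types of two values at runtime.
sameType :: (Typeable a, Typeable b) => a -> b -> Bool
sameType x y = typeOf x == typeOf y
```

Code written this way should not need to change when mkTyCon is deprecated, since it never names the constructor function.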
Any comments?
Cheers, Simon
Would this also mean typeRepKey could be taken out of the IO monad? That would be nice.
Ah yes, I forgot to mention the changes to typeRepKey. So currently we have
typeRepKey :: TypeRep -> IO Int
This API is difficult to support in the new library: I'd have to reintroduce the cache, and it wouldn't be very efficient. I plan to change it to this:
data TypeRepKey -- abstract, instance of Eq, Ord
typeRepKey :: TypeRep -> IO TypeRepKey
where TypeRepKey is a newtype of the internal Fingerprint. Now, we could take typeRepKey out of IO, but the Ord instance of TypeRepKey is implementation-defined (it provides some total order, but we don't tell you what it is). So arguably we should keep the IO. What do people think?
Would the order be allowed to vary from run to run of the program (which is why it's in IO now)? Could it be specified as implementation-defined but non-varying? If so, I would favor that option along with taking it out of IO. (Plenty of things are implementation-defined, like the size of an Int.)
Yes, it's implementation-defined but non-varying. I know some people have objected to these things being outside the IO monad before, but there is already plenty of precedent (System.Info.os, size of Int, isIEEE...). However, if we take it out of IO then it may limit the possible implementations. Would the previous implementation, in which keys were assigned at runtime, still be valid? It is still implementation-defined and non-varying, but only over a single run.
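As an illustration of what the pure variant might look like to a user, here is a hedged sketch. It is not the proposed API: it assumes typeRepFingerprint, which Data.Typeable exports only in later versions of base, and the describe function and its table are hypothetical. The key is an opaque newtype whose order is fixed but unspecified.

```haskell
import Data.Typeable (Typeable, TypeRep, typeOf, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint)

-- Hypothetical pure key: a newtype over the internal Fingerprint.
-- The derived Ord is some total order; callers should not rely on which.
newtype TypeRepKey = TypeRepKey Fingerprint
  deriving (Eq, Ord)

-- Pure version of typeRepKey (an assumption of this sketch, not the
-- signature given in the proposal above).
typeRepKey :: TypeRep -> TypeRepKey
typeRepKey = TypeRepKey . typeRepFingerprint

-- Dispatch on the type of a value by looking up its key.
describe :: Typeable a => a -> String
describe x = maybe "unknown" id (lookup (typeRepKey (typeOf x)) table)
  where
    table =
      [ (typeRepKey (typeOf (0 :: Int)), "Int")
      , (typeRepKey (typeOf True), "Bool")
      ]
```

Because the key is derived from a hash rather than allocated at runtime, the same type yields the same key on every call, which is what makes dropping IO plausible here.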
Albeit, the use case I had in mind was using Template Haskell to construct a case statement over the literal Int values of the keys as determined at compile time (hopefully compiling down to something like a C switch statement), and I'm not sure if that's going to work if the keys are no longer Ints. (That it wouldn't compile down to a switch statement is one thing, but I'm not sure if the code would literally be possible to write. Maybe it'd need a Lift instance?) Anyway, I don't think it would hurt to take it out of IO if given the opportunity, either way.
The keys are 128-bit hashes, so it might still be possible to do something like this, but you would need access to the internal representations. I'm planning to expose these via Data.Typeable.Internal (no guarantees about stability of this API, however).
Cheers, Simon