Re: Changes to Data.Typeable

11 Jul 2011

On 08/07/2011 17:36, Gábor Lehel wrote:
...
2011/7/7 Simon Marlow:
...
On 07/07/11 17:14, Gábor Lehel wrote:
...
On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:
...
Hi folks,

In response to this ticket:

  http://hackage.haskell.org/trac/ghc/ticket/5275

I'm making some changes to Data.Typeable, some of which affect the API, so
as per the new library guidelines I'm informing the list.

The current implementation of Typeable is based on

  mkTyCon :: String -> TyCon

which internally keeps a table mapping Strings to Ints, so that each TyCon
can be given a unique Int for fast comparison.  This means the String has to
be unique across all types in the program.  Currently derived instances of
Typeable use the qualified original name (e.g. "GHC.Types.Int") which is not
necessarily unique, is non-portable, and exposes implementation details.

The String passed to mkTyCon is returned by

  tyConString :: TyCon -> String

which lets the user get at this non-portable representation (also the Show
instance returns this String).

So the new proposal is to store three Strings in TyCon.  The internal
representation is this:

  data TyCon = TyCon {
     tyConHash    :: {-# UNPACK #-} !Fingerprint,
     tyConPackage :: String,
     tyConModule  :: String,
     tyConName    :: String
    }

The fields of this type are not exposed externally.  Together the three
fields tyConPackage, tyConModule and tyConName uniquely identify a TyCon,
and the Fingerprint is a hash of the concatenation of these three Strings
(so no more internal cache to map strings to unique Ids).  tyConString now
returns the value of tyConName only.
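To illustrate the representation described above, here is a minimal
self-contained sketch.  It is a model only, not the actual Data.Typeable
source: the helper name mkModelTyCon is hypothetical, and joining the three
strings with spaces before hashing is an assumption (the text above only says
the hash covers their concatenation).  fingerprintString comes from
GHC.Fingerprint in base.

```haskell
-- A model of the proposed TyCon; NOT the real library definition.
import GHC.Fingerprint (Fingerprint, fingerprintString)

data TyCon = TyCon
  { tyConHash    :: {-# UNPACK #-} !Fingerprint  -- hash of the three strings
  , tyConPackage :: String
  , tyConModule  :: String
  , tyConName    :: String
  }

-- Comparison looks only at the hash, so no global String-to-Int table
-- is needed to make it fast.
instance Eq TyCon where
  a == b = tyConHash a == tyConHash b

instance Ord TyCon where
  compare a b = compare (tyConHash a) (tyConHash b)

-- Hypothetical constructor: package, module, name (in that order).
-- Assumption: the strings are joined with single spaces before hashing.
mkModelTyCon :: String -> String -> String -> TyCon
mkModelTyCon pkg modl name =
  TyCon (fingerprintString (unwords [pkg, modl, name])) pkg modl name

-- As proposed, tyConString returns only the unqualified name.
tyConString :: TyCon -> String
tyConString = tyConName
```

With these definitions, two values built from the same package, module and
name compare equal, a value built with a different package name compares
unequal, and tyConString yields just the name component.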
I've measured the performance impact of this change, and as far as I can
tell performance is uniformly better.  This should improve things for SYB in
particular.  Also, the code generated for deriving Typeable is less than
half the size it was before.

=== Proposed API changes ===

1. DEPRECATE mkTyCon

   mkTyCon is used by some hand-written instances of Typeable.  It
   will work as before, but is deprecated in favour of...

2. Add

     mkTyCon3 :: String -> String -> String -> TyCon

   which takes the package, module, and name of the TyCon respectively.
   Most users can just derive Typeable; there's no need to use mkTyCon3.

In due course we can rename mkTyCon3 back to mkTyCon.

Any comments?

Cheers,
        Simon

Would this also mean typeRepKey could be taken out of the IO monad?
That would be nice.
Ah yes, I forgot to mention the changes to typeRepKey.  So currently we have

  typeRepKey :: TypeRep -> IO Int

This API is difficult to support in the new library: I'd have to reintroduce
the cache, and it wouldn't be very efficient.  I plan to change it to this:

  data TypeRepKey -- abstract, instance of Eq, Ord
  typeRepKey :: TypeRep -> IO TypeRepKey

where TypeRepKey is a newtype of the internal Fingerprint.  Now, we could
take typeRepKey out of IO, but the Ord instance of TypeRepKey is
implementation-defined (it provides some total order, but we don't tell you
what it is).  So arguably we should keep the IO.  What do people think?

Would the order be allowed to vary from run to run of the program
(which is why it's in IO now)? Could it be specified as
implementation-defined but non-varying? If so, I would favor that
option along with taking it out of IO. (Plenty of things are
implementation-defined, like the size of an Int.)

Yes, it's implementation-defined but non-varying.  I know some people
have objected to these things being outside the IO monad before, but
there is already plenty of precedent (System.Info.os, size of Int,
isIEEE...).

However, if we take it out of IO then it may limit the possible
implementations.  Would the previous implementation, in which keys were
assigned at runtime, still be valid?  It is still implementation-defined
and non-varying, but only over a single run.
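As a rough sketch of what a pure key with an implementation-defined but
non-varying order could look like: the code below assumes a version of base
recent enough that Data.Typeable exports typeRepFingerprint (which postdates
this thread).  The TypeRepKey newtype and the pure typeRepKey are illustrative
definitions for this sketch, not the library's API.

```haskell
-- Illustrative only: a pure key derived from the type's fingerprint.
import Data.Typeable (TypeRep, typeOf, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint)

-- Hypothetical abstract key; Eq and Ord come from the Fingerprint.
newtype TypeRepKey = TypeRepKey Fingerprint
  deriving (Eq, Ord)

-- No IO needed: the fingerprint is a function of the type alone,
-- so the key does not depend on when or where it is computed.
typeRepKey :: TypeRep -> TypeRepKey
typeRepKey = TypeRepKey . typeRepFingerprint

-- Example keys.  Their relative order is some total order,
-- but which one sorts first is deliberately unspecified.
intKey, boolKey :: TypeRepKey
intKey  = typeRepKey (typeOf (0 :: Int))
boolKey = typeRepKey (typeOf True)
```

Keys computed from the same type are equal wherever they are computed, and
keys for distinct types are ordered consistently, which is all a client such
as a Map keyed on types needs.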
...
Albeit, the use case I had in mind was using Template Haskell to
construct a case statement over the literal Int values of the keys as
determined at compile time (hopefully compiling down to something like
a C switch statement), and I'm not sure if that's going to work if the
keys are no longer Ints. (That it wouldn't compile down to a switch
statement is one thing, but I'm not sure if the code would literally
be possible to write. Maybe it'd need a Lift instance?) Anyway, I
don't think it would hurt to take it out of IO if given the
opportunity, either way.
The keys are 128-bit hashes, so it might still be possible to do 
something like this, but you would need access to the internal 
representations.  I'm planning to expose these via 
Data.Typeable.Internal (no guarantees about stability of this API, however).

Cheers,
	Simon