Changes to Data.Typeable

Simon Marlow

7 Jul 2011

3:44 p.m.

Hi folks,

In response to this ticket:

http://hackage.haskell.org/trac/ghc/ticket/5275

I'm making some changes to Data.Typeable, some of which affect the API, so as per the new library guidelines I'm informing the list.

The current implementation of Typeable is based on

mkTyCon :: String -> TyCon

which internally keeps a table mapping Strings to Ints, so that each TyCon can be given a unique Int for fast comparison. This means the String has to be unique across all types in the program. Currently derived instances of typeable use the qualified original name (e.g. "GHC.Types.Int") which is not necessarily unique, is non-portable, and exposes implementation details.

The String passed to mkTyCon is returned by

tyConString :: TyCon -> String

which lets the user get at this non-portable representation (also the Show instance returns this String).

So the new proposal is to store three Strings in TyCon. The internal representation is this:

data TyCon = TyCon {
    tyConHash    :: {-# UNPACK #-} !Fingerprint,
    tyConPackage :: String,
    tyConModule  :: String,
    tyConName    :: String
  }

the fields of this type are not exposed externally. Together the three fields tyConPackage, tyConModule and tyConName uniquely identify a TyCon, and the Fingerprint is a hash of the concatenation of these three Strings (so no more internal cache to map strings to unique Ids). tyConString now returns the value of tyConName only.

I've measured the performance impact of this change, and as far as I can tell performance is uniformly better. This should improve things for SYB in particular. Also, the size of the code generated for deriving Typeable is less than half as much as before.

=== Proposed API changes ===

1. DEPRECATE mkTyCon

mkTyCon is used by some hand-written instances of Typeable. It will work as before, but is deprecated in favour of...

2. Add

mkTyCon3 :: String -> String -> String -> TyCon

which takes the package, module, and name of the TyCon respectively. Most users can just derive Typeable, there's no need to use mkTyCon3.

In due course we can rename mkTyCon3 back to mkTyCon.

Any comments?

Cheers, Simon
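For readers who want to see the shape of the proposal in code, here is a minimal sketch of a TyCon-like record whose identity is a fingerprint of the package, module and name. It is not the actual Data.Typeable implementation: the names TyConSketch, mkTyConSketch and tyConStringSketch are hypothetical, and joining the three strings with spaces before hashing is an assumption (the message above only says "concatenation"). The only library call used is fingerprintString from GHC.Fingerprint.

```haskell
import GHC.Fingerprint (Fingerprint, fingerprintString)

-- Hypothetical stand-in for the proposed internal TyCon record.
data TyConSketch = TyConSketch
  { sketchHash    :: {-# UNPACK #-} !Fingerprint
  , sketchPackage :: String
  , sketchModule  :: String
  , sketchName    :: String
  } deriving Show

-- Analogue of mkTyCon3: takes package, module and name, in that order.
-- Assumption: the fields are joined with single spaces before hashing.
mkTyConSketch :: String -> String -> String -> TyConSketch
mkTyConSketch pkg modl name =
  TyConSketch (fingerprintString (unwords [pkg, modl, name])) pkg modl name

-- Equality and ordering look only at the fingerprint, so no global
-- table mapping strings to unique Ints is needed.
instance Eq TyConSketch where
  a == b = sketchHash a == sketchHash b

instance Ord TyConSketch where
  compare a b = compare (sketchHash a) (sketchHash b)

-- Analogue of the revised tyConString: the unqualified name only.
tyConStringSketch :: TyConSketch -> String
tyConStringSketch = sketchName
```

Loaded into GHCi, mkTyConSketch "ghc-prim" "GHC.Types" "Int" compares equal to a second value built from the same three strings, and tyConStringSketch returns just "Int".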

Gábor Lehel

7 Jul

4:14 p.m.

On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:

...

Would this also mean typeRepKey could be taken out of the IO monad? That would be nice.

-- Work is punishment for failing to procrastinate effectively.

Simon Marlow

7 p.m.

On 07/07/11 17:14, Gábor Lehel wrote:

...

On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:

...

Would this also mean typeRepKey could be taken out of the IO monad? That would be nice.

Ah yes, I forgot to mention the changes to typeRepKey. So currently we have

typeRepKey :: TypeRep -> IO Int

this API is difficult to support in the new library, I'd have to reintroduce the cache, and it wouldn't be very efficient. I plan to change it to this:

data TypeRepKey -- abstract, instance of Eq, Ord
typeRepKey :: TypeRep -> IO TypeRepKey

where TypeRepKey is a newtype of the internal Fingerprint. Now, we could take typeRepKey out of IO, but the Ord instance of TypeRepKey is implementation-defined (it provides some total order, but we don't tell you what it is). So arguably we should keep the IO. What do people think?

Obviously this is not a backwards compatible change either way.

Cheers, Simon
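The two options being weighed here can be put side by side in a short sketch. It assumes a recent base in which Data.Typeable exports typeRepFingerprint; that function is not part of the proposal as posted. The names typeRepKeyIO, typeRepKeyPure and sameType are hypothetical, chosen so as not to clash with anything in the library.

```haskell
import Data.Typeable (TypeRep, Typeable, typeOf, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint)

-- Abstract key: a newtype over the fingerprint, with Eq and Ord only.
newtype TypeRepKey = TypeRepKey Fingerprint
  deriving (Eq, Ord)

-- Option 1: drop the IO, since the key is a pure function of the type.
typeRepKeyPure :: TypeRep -> TypeRepKey
typeRepKeyPure = TypeRepKey . typeRepFingerprint

-- Option 2: keep the IO signature, signalling that the order
-- is implementation-defined.
typeRepKeyIO :: TypeRep -> IO TypeRepKey
typeRepKeyIO = return . typeRepKeyPure

-- Example use of the pure variant: do two values have the same type?
sameType :: (Typeable a, Typeable b) => a -> b -> Bool
sameType x y = typeRepKeyPure (typeOf x) == typeRepKeyPure (typeOf y)
```

With the pure variant, sameType 'a' 'b' is True and sameType 'a' True is False, with no IO involved.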

Gábor Lehel

8 Jul

4:36 p.m.

2011/7/7 Simon Marlow :

...

On 07/07/11 17:14, Gábor Lehel wrote:

...
On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:

...

Would this also mean typeRepKey could be taken out of the IO monad? That would be nice.

Ah yes, I forgot to mention the changes to typeRepKey. So currently we have

typeRepKey :: TypeRep -> IO Int

this API is difficult to support in the new library, I'd have to reintroduce the cache, and it wouldn't be very efficient. I plan to change it to this:

data TypeRepKey -- abstract, instance of Eq, Ord
typeRepKey :: TypeRep -> IO TypeRepKey

where TypeRepKey is a newtype of the internal Fingerprint. Now, we could take typeRepKey out of IO, but the Ord instance of TypeRepKey is implementation-defined (it provides some total order, but we don't tell you what it is). So arguably we should keep the IO. What do people think?

Would the order be allowed to vary from run to run of the program (which is why it's in IO now)? Could it be specified as implementation-defined but non-varying? If so, I would favor that option along with taking it out of IO. (Plenty of things are implementation-defined, like the size of an Int.)

Albeit, the use case I had in mind was using Template Haskell to construct a case statement over the literal Int values of the keys as determined at compile time (hopefully compiling down to something like a C switch statement), and I'm not sure if that's going to work if the keys are no longer Ints. (That it wouldn't compile down to a switch statement is one thing, but I'm not sure if the code would literally be possible to write. Maybe it'd need a Lift instance?) Anyway, I don't think it would hurt to take it out of IO if given the opportunity, either way.

...

Obviously this is not a backwards compatible change either way.

Cheers, Simon

-- Work is punishment for failing to procrastinate effectively.

Simon Marlow

11 Jul

8:17 a.m.

On 08/07/2011 17:36, Gábor Lehel wrote:

...

2011/7/7 Simon Marlow:

...
On 07/07/11 17:14, Gábor Lehel wrote:

...
On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:

...

Would this also mean typeRepKey could be taken out of the IO monad? That would be nice.

Ah yes, I forgot to mention the changes to typeRepKey. So currently we have

typeRepKey :: TypeRep -> IO Int

this API is difficult to support in the new library, I'd have to reintroduce the cache, and it wouldn't be very efficient. I plan to change it to this:

data TypeRepKey -- abstract, instance of Eq, Ord
typeRepKey :: TypeRep -> IO TypeRepKey

where TypeRepKey is a newtype of the internal Fingerprint. Now, we could take typeRepKey out of IO, but the Ord instance of TypeRepKey is implementation-defined (it provides some total order, but we don't tell you what it is). So arguably we should keep the IO. What do people think?

Would the order be allowed to vary from run to run of the program (which is why it's in IO now)? Could it be specified as implementation-defined but non-varying? If so, I would favor that option along with taking it out of IO. (Plenty of things are implementation-defined, like the size of an Int.)

Yes, it's implementation-defined but non-varying. I know some people have objected to these things being outside the IO monad before, but there is already plenty of precedent (System.Info.os, size of Int, isIEEE...).

However, if we take it out of IO then it may limit the possible implementations. Would the previous implementation, in which keys were assigned at runtime, still be valid? It is still implementation-defined and non-varying, but only over a single run.

...

Albeit, the use case I had in mind was using Template Haskell to construct a case statement over the literal Int values of the keys as determined at compile time (hopefully compiling down to something like a C switch statement), and I'm not sure if that's going to work if the keys are no longer Ints. (That it wouldn't compile down to a switch statement is one thing, but I'm not sure if the code would literally be possible to write. Maybe it'd need a Lift instance?) Anyway, I don't think it would hurt to take it out of IO if given the opportunity, either way.

The keys are 128-bit hashes, so it might still be possible to do something like this, but you would need access to the internal representations. I'm planning to expose these via Data.Typeable.Internal (no guarantees about stability of this API, however).

Cheers, Simon

Gábor Lehel

2:25 p.m.

2011/7/11 Simon Marlow :

...

On 08/07/2011 17:36, Gábor Lehel wrote:

...
2011/7/7 Simon Marlow:

...
On 07/07/11 17:14, Gábor Lehel wrote:

...
On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow wrote:

...

Would this also mean typeRepKey could be taken out of the IO monad? That would be nice.

Ah yes, I forgot to mention the changes to typeRepKey. So currently we have

typeRepKey :: TypeRep -> IO Int

this API is difficult to support in the new library, I'd have to reintroduce the cache, and it wouldn't be very efficient. I plan to change it to this:

data TypeRepKey -- abstract, instance of Eq, Ord
typeRepKey :: TypeRep -> IO TypeRepKey

where TypeRepKey is a newtype of the internal Fingerprint. Now, we could take typeRepKey out of IO, but the Ord instance of TypeRepKey is implementation-defined (it provides some total order, but we don't tell you what it is). So arguably we should keep the IO. What do people think?

Would the order be allowed to vary from run to run of the program (which is why it's in IO now)? Could it be specified as implementation-defined but non-varying? If so, I would favor that option along with taking it out of IO. (Plenty of things are implementation-defined, like the size of an Int.)

Yes, it's implementation-defined but non-varying. I know some people have objected to these things being outside the IO monad before, but there is already plenty of precedent (System.Info.os, size of Int, isIEEE...).

However, if we take it out of IO then it may limit the possible implementations. Would the previous implementation, in which keys were assigned at runtime, still be valid? It is still implementation-defined and non-varying, but only over a single run.

That's the question. It's in IO now because, while the keys don't vary over a single run, they do vary between them. Presumably the new version should be 'pure' if and only if that's no longer true. The upsides (of not being in IO) are obvious, but unfortunately I don't know much at all about the potential downsides in terms of limiting implementations.

...

...
Albeit, the use case I had in mind was using Template Haskell to construct a case statement over the literal Int values of the keys as determined at compile time (hopefully compiling down to something like a C switch statement), and I'm not sure if that's going to work if the keys are no longer Ints. (That it wouldn't compile down to a switch statement is one thing, but I'm not sure if the code would literally be possible to write. Maybe it'd need a Lift instance?) Anyway, I don't think it would hurt to take it out of IO if given the opportunity, either way.

The keys are 128-bit hashes, so it might still be possible to do something like this, but you would need access to the internal representations. I'm planning to expose these via Data.Typeable.Internal (no guarantees about stability of this API, however).

I was going to suggest that a Lift instance could be provided in Language.Haskell.TH.Syntax, but I see now that there's quite a few types which could have an instance and don't, so that probably belongs in a separate proposal. Just having the internals available will hopefully be 'good enough' for the use case I mentioned (which itself is not that important, just a nice optimization).

...

Cheers, Simon

-- Work is punishment for failing to procrastinate effectively.

Simon Marlow

14 Jul

2:56 p.m.

On 11/07/2011 15:25, Gábor Lehel wrote:

...

2011/7/11 Simon Marlow:

...
On 08/07/2011 17:36, Gábor Lehel wrote:

...
2011/7/7 Simon Marlow:

Yes, it's implementation-defined but non-varying. I know some people have objected to these things being outside the IO monad before, but there is already plenty of precedent (System.Info.os, size of Int, isIEEE...).

However, if we take it out of IO then it may limit the possible implementations. Would the previous implementation, in which keys were assigned at runtime, still be valid? It is still implementation-defined and non-varying, but only over a single run.

That's the question. It's in IO now because, while the keys don't vary over a single run, they do vary between them. Presumably the new version should be 'pure' if and only if that's no longer true. The upsides (of not being in IO) are obvious, but unfortunately I don't know much at all about the potential downsides in terms of limiting implementations.

After talking with Simon Peyton Jones about this, I decided to deprecate typeRepKey and add Ord instances to TypeRep and TyCon. Given that typeRepKey isn't returning an Int any more, it isn't adding anything over a direct Ord instance on TypeRep (we didn't do this before because the Ord instance would vary from run to run).

So you can now make a Map with TypeRep as the key, however to do this efficiently you probably want to use a hash map, and make TypeRep Hashable (easy, just take a chunk of the fingerprint).

Cheers, Simon
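Both suggestions in this message can be illustrated briefly. The sketch below builds an ordinary Data.Map keyed by TypeRep, relying on the Ord instance, and shows one way to "take a chunk of the fingerprint" as a hash. It assumes the containers package is available and that Data.Typeable exports typeRepFingerprint, as recent versions of base do. The names descriptions, lookupDescription and hashTypeRep are hypothetical, and no Hashable instance is written because that class lives in a third-party package.

```haskell
import qualified Data.Map as Map
import Data.Typeable (TypeRep, Typeable, typeOf, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint (..))

-- A Map keyed by TypeRep; this relies only on TypeRep's Ord instance.
descriptions :: Map.Map TypeRep String
descriptions = Map.fromList
  [ (typeOf (0 :: Int), "machine integer")
  , (typeOf True,       "boolean")
  , (typeOf "",         "string")
  ]

-- Look up the description for the type of any Typeable value.
lookupDescription :: Typeable a => a -> Maybe String
lookupDescription x = Map.lookup (typeOf x) descriptions

-- "Take a chunk of the fingerprint": use the first 64-bit word as a hash.
-- The conversion to Int wraps around on overflow.
hashTypeRep :: TypeRep -> Int
hashTypeRep rep = case typeRepFingerprint rep of
  Fingerprint firstWord _ -> fromIntegral firstWord
```

For example, lookupDescription (3 :: Int) yields Just "machine integer", while lookupDescription 'c' yields Nothing because Char is not in the table. A function like hashTypeRep is what a Hashable instance would delegate to.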

Gábor Lehel

3:42 p.m.

2011/7/14 Simon Marlow :

...

On 11/07/2011 15:25, Gábor Lehel wrote:

...
2011/7/11 Simon Marlow:

...
On 08/07/2011 17:36, Gábor Lehel wrote:

...
2011/7/7 Simon Marlow:

Yes, it's implementation-defined but non-varying. I know some people have objected to these things being outside the IO monad before, but there is already plenty of precedent (System.Info.os, size of Int, isIEEE...).

However, if we take it out of IO then it may limit the possible implementations. Would the previous implementation, in which keys were assigned at runtime, still be valid? It is still implementation-defined and non-varying, but only over a single run.

That's the question. It's in IO now because, while the keys don't vary over a single run, they do vary between them. Presumably the new version should be 'pure' if and only if that's no longer true. The upsides (of not being in IO) are obvious, but unfortunately I don't know much at all about the potential downsides in terms of limiting implementations.

After talking with Simon Peyton Jones about this, I decided to deprecate typeRepKey and add Ord instances to TypeRep and TyCon. Given that typeRepKey isn't returning an Int any more, it isn't adding anything over a direct Ord instance on TypeRep (we didn't do this before because the Ord instance would vary from run to run).

So you can now make a Map with TypeRep as the key, however to do this efficiently you probably want to use a hash map, and make TypeRep Hashable (easy, just take a chunk of the fingerprint).

This sounds good. Thanks. If I'm understanding-assuming correctly, the Ord instance for TypeRep would be two 64-bit compares? That doesn't sound so horrible, even without hashing.
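The "two 64-bit compares" reading can be written out explicitly. This is a sketch under the assumption that the key is a GHC.Fingerprint value made of two Word64 halves, which is how Fingerprint is defined in base. The function names are hypothetical, and the thread does not say whether TypeRep's own Ord instance is implemented this way.

```haskell
import Data.Typeable (TypeRep, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint (..))

-- Lexicographic comparison of two 128-bit fingerprints:
-- at most two 64-bit comparisons.
compareFingerprints :: Fingerprint -> Fingerprint -> Ordering
compareFingerprints (Fingerprint hi1 lo1) (Fingerprint hi2 lo2) =
  case compare hi1 hi2 of
    EQ    -> compare lo1 lo2   -- first words equal: second word decides
    other -> other             -- first words differ: one compare suffices

-- Ordering on TypeRep obtained by comparing fingerprints only.
compareTypeReps :: TypeRep -> TypeRep -> Ordering
compareTypeReps a b =
  compareFingerprints (typeRepFingerprint a) (typeRepFingerprint b)
```

Because Fingerprint derives Ord over its two fields, compareFingerprints should agree with the built-in compare on Fingerprint values.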

-- Work is punishment for failing to procrastinate effectively.

Yitzchak Gale

10 Jul

7:51 a.m.

Simon Marlow wrote:

...

In response to this ticket: http://hackage.haskell.org/trac/ghc/ticket/5275 ...

=== Proposed API changes ===

1. DEPRECATE mkTyCon...

2. Add mkTyCon3 :: String -> String -> String -> TyCon

...Most users can just derive Typeable, there's no need to use mkTyCon3. In due course we can rename mkTyCon3 back to mkTyCon.

Anecdotally, it seems to me that many, if not most, packages on Hackage that create Typeable instances do so using mkTyCon, not by deriving Typeable. There seems to be some standard boilerplate that spreads virally from one library to the next.

Perhaps it would be worth checking to what extent mkTyCon is used on Hackage. If this is as widespread as I suspect, even a simple deprecation warning could have a cascading effect on typical output from cabal install.

Thanks, Yitz
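For comparison with the hand-written boilerplate mentioned here, deriving Typeable takes a single clause. The sketch below uses a hypothetical type Colour and reads back the three identifying strings through tyConPackage, tyConModule and tyConName, which Data.Typeable exports in current base. Recent GHC versions give every type a Typeable instance automatically, so there the deriving clause is accepted but redundant.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Typeable
  (Typeable, typeOf, typeRepTyCon, tyConName, tyConModule, tyConPackage)

-- Hypothetical example type; the instance is derived, not hand-written.
data Colour = Red | Green
  deriving (Show, Typeable)

-- The (package, module, name) triple that identifies Colour's TyCon.
colourTyConInfo :: (String, String, String)
colourTyConInfo = (tyConPackage tc, tyConModule tc, tyConName tc)
  where
    tc = typeRepTyCon (typeOf Red)
```

The third component is "Colour"; the package and module components depend on how and where the code is compiled.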

Simon Marlow

11 Jul

8:36 a.m.

On 10/07/2011 08:51, Yitzchak Gale wrote:

...

Simon Marlow wrote:

...
In response to this ticket: http://hackage.haskell.org/trac/ghc/ticket/5275 ...

=== Proposed API changes ===

1. DEPRECATE mkTyCon...

2. Add mkTyCon3 :: String -> String -> String -> TyCon

...Most users can just derive Typeable, there's no need to use mkTyCon3. In due course we can rename mkTyCon3 back to mkTyCon.

Anecdotally, it seems to me that many, if not most, packages on Hackage that create Typeable instances do so using mkTyCon, not by deriving Typeable. There seems to be some standard boilerplate that spreads virally from one library to the next.

Perhaps it would be worth checking to what extent mkTyCon is used on Hackage. If this is as widespread as I suspect, even a simple deprecation warning could have a cascading effect on typical output from cabal install.

If there are a lot of packages using mkTyCon (I've already encountered one, so you might well be right), what do you suggest?

Deprecation is the gentlest way we have to signal that something needs to be fixed.

Perhaps deprecation warnings should be piped into "sendmail <package-author>" instead of being printed :-)

Cheers, Simon

Yitzchak Gale

6:10 p.m.

I wrote:

...

...
Perhaps it would be worth checking to what extent mkTyCon is used on Hackage.

Simon Marlow wrote:

...

If there are a lot of packages using mkTyCon (I've already encountered one, so you might well be right), what do you suggest?

It all depends on the extent of the problem. Can someone who has quick access to the entire contents of Hackage please do a grep?

...

Deprecation is the gentlest way we have to signal that something needs to be fixed.

Yes. The main questions are how loudly to yell, and perhaps how long to delay, before doing that.

...

Perhaps deprecation warnings should be piped into "sendmail <package-author>" instead of being printed :-)

Yes, that might be a good plan. :) We can start with a post to the Cafe and Reddit, though. I'll do that now.

Thanks, Yitz

Edward Kmett

12 Jul

12:32 a.m.

+1 from me. it'll break about 40-50 of my modules, but the changes are pretty mechanical.

-Edward

On Thu, Jul 7, 2011 at 11:44 AM, Simon Marlow wrote:

...

