Any precedent or plan for guaranteed-safe Eq and Ord instances?

Hello all, Normally, we don't worry *too* much about incorrect instances of standard classes (Num, Eq, Ord) etc. They make the user's program wrong, but they don't compromise the type system. Unfortunately, with the LVish parallel programming library we do have a situation where incorrect instances of Eq and Ord can cause the "types to lie". In particular, something that claims to be a pure, non-IO type, can actually yield a different result on different runs, including throwing exceptions on some runs but not others. So what's the best way to lock down "SafeEq" and "SafeOrd" instances, taking control away from the user (at least with -XSafe is turned on)? We could derive our own SafeEq and SafeOrd instances based on GHC.Generics. BUT, that only helps if we can forbid the user from writing their own incorrect Generics instances when Safe Haskell is turned on. It looks like GHC.Generics is currently marked as "TrustWorthy": http://www.haskell.org/ghc/docs/7.4.1/html/libraries/ghc-prim-0.2.0.0/GHC-Ge... So.... could we get GHC.Generics marked as "Unsafe" and enable some more limited interface that is "Trustworthy"? (Allowing the user ONLY to do 'deriving Generic'). This would be similar to the new policy in GHC 7.8 of only allowing derived Typeable instances... -Ryan

Here are some examples:
---------------------------------------------
data Foo = Bar | Baz
instance Eq Foo where
_ == _ = True
instance Ord Foo where
compare Bar Bar = EQ
compare Bar Baz = LT
compare _ _ = error "I'm partial!"
---------------------------------------------
These would allow LVish's "runPar" to non-determinstically return Bar or
Baz (thinking they're the same based on Eq). Or it would allow runPar to
nondeterministically crash based on different schedulings hitting the
compare error or not [1].
FYI here's LVish:
http://www.cs.indiana.edu/~rrnewton/haddock/lvish/
https://github.com/iu-parfunc/lvars
(More info in this POPL paper:
http://www.cs.indiana.edu/~rrnewton/papers/2013_07_LVish_quasiDet_working_dr...
)
-Ryan
[1] If you're curious why this happens, its because the Ord instance is
used by, say, Data.Set and Data.Map for the keys. If you're inserting
elements in an arbitrary order, the final contents ARE deterministic, but
the comparisons that are done along the way ARE NOT.
On Tue, Oct 1, 2013 at 4:13 PM, Ryan Newton
Hello all,
Normally, we don't worry *too* much about incorrect instances of standard classes (Num, Eq, Ord) etc. They make the user's program wrong, but they don't compromise the type system.
Unfortunately, with the LVish parallel programming library we do have a situation where incorrect instances of Eq and Ord can cause the "types to lie". In particular, something that claims to be a pure, non-IO type, can actually yield a different result on different runs, including throwing exceptions on some runs but not others.
So what's the best way to lock down "SafeEq" and "SafeOrd" instances, taking control away from the user (at least with -XSafe is turned on)?
We could derive our own SafeEq and SafeOrd instances based on GHC.Generics. BUT, that only helps if we can forbid the user from writing their own incorrect Generics instances when Safe Haskell is turned on. It looks like GHC.Generics is currently marked as "TrustWorthy":
http://www.haskell.org/ghc/docs/7.4.1/html/libraries/ghc-prim-0.2.0.0/GHC-Ge...
So.... could we get GHC.Generics marked as "Unsafe" and enable some more limited interface that is "Trustworthy"? (Allowing the user ONLY to do 'deriving Generic').
This would be similar to the new policy in GHC 7.8 of only allowing derived Typeable instances...
-Ryan

Ryan Newton wrote:
Here are some examples:
--------------------------------------------- data Foo = Bar | Baz
instance Eq Foo where _ == _ = True
instance Ord Foo where compare Bar Bar = EQ compare Bar Baz = LT compare _ _ = error "I'm partial!" ---------------------------------------------
These would allow LVish's "runPar" to non-determinstically return Bar or Baz (thinking they're the same based on Eq). Or it would allow runPar to nondeterministically crash based on different schedulings hitting the compare error or not [1].
[..]
[1] If you're curious why this happens, its because the Ord instance is used by, say, Data.Set and Data.Map for the keys. If you're inserting elements in an arbitrary order, the final contents ARE deterministic, but the comparisons that are done along the way ARE NOT.
I'm not sure whether the Eq instance you mention is actually incorrect. I had always understood that Eq denotes an equivalence relation, not necessarily equality on the constructor level. One prominent example would be equality of Data.Map itself: two maps are "equal" if they contain the same key-value pairs, map1 == map2 <=> toAscList map1 == toAscList map2 but that doesn't mean that their internal representation -- the balanced tree -- is the same. Virtually all exported operations in Data.Map preserve this equivalence, but the library is, in principle, still able to distinguish "equal" maps. In other words, equality of abstract data types is different from equality of algebraic data types (constructors). I don't think you'll ever be able to avoid this proof obligation that the public API of an abstract data type preserves equivalence, so that LVish will yield "results that are deterministic up to equivalence". However, you are mainly focused on equality of keys for a Map. In this case, it might be possible to move towards pointer equality and use things like Hashable or StableName . Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Wed, Oct 02, 2013 at 11:24:39AM +0200, Heinrich Apfelmus wrote:
I'm not sure whether the Eq instance you mention is actually incorrect. I had always understood that Eq denotes an equivalence relation, not necessarily equality on the constructor level.
There's a difference between implementors being able to distingush equals, and application programmers. Internal implementations are allowed to break all sorts of invariants, but, by definition, APIs shouldn't. Are there examples where application programmers would like there so be some f, a and b such that a == b but f a /= f b (efficiency concerns aside)? I can't think of any obvious ones.
One prominent example would be equality of Data.Map itself: two maps are "equal" if they contain the same key-value pairs,
map1 == map2 <=> toAscList map1 == toAscList map2
but that doesn't mean that their internal representation -- the balanced tree -- is the same. Virtually all exported operations in Data.Map preserve this equivalence, but the library is, in principle, still able to distinguish "equal" maps.
I had a quick skim of http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-La... to find such examples, and the only one that jumped out was "showTree". Are there others? Still, although the library is, in principle, able to distinguish "equal" maps, isn't this just a leaking implementation detail? Is it somehow of benefit to API users? Tom

On Wed, Oct 2, 2013 at 5:18 AM, Tom Ellis < tom-lists-haskell-cafe-2013@jaguarpaw.co.uk> wrote:
Are there examples where application programmers would like there so be some f, a and b such that a == b but f a /= f b (efficiency concerns aside)? I can't think of any obvious ones.
Yes, and we already handle it properly:
Prelude> let f = (1.0 /)
Prelude> let (z, negz) = (0.0, -0.0)
Prelude> z == negz
True
Prelude> f z /= f negz
True
This is *not* an "IEEE Floats are weird" thing. Application
programmers want 0.0 to equal -0.0, but -Infinity to not be equal to
Infinity.
Of course, given how many "IEEE Floats are weird" things there are,
you can reasonably consider ignoring this example.

So i think we can conclude the following
1) things are not perfect currently. But it requires some huge type class
changes to have a better story
2) certain types of data types will need to be newtyped to have instances
that play nice with Ryans concurrency work. Thats ok. Theres often good
reasons for those "illegal" instances. There is no sound and COMPLETE way
to enforce having "good" instances, and thats a good thing, though having
more libs work correctly is always a good thing
3) as always, people complain about floats, when the real issue is the
current numerical type classes, which are wrong in a number of ways. I hope
to experiment with my own ideas in that direction soon. Not worth a boring
no ideas worth taking seriously cafe thread
On Wed, Oct 2, 2013 at 2:39 PM, Mike Meyer
On Wed, Oct 2, 2013 at 5:18 AM, Tom Ellis < tom-lists-haskell-cafe-2013@jaguarpaw.co.uk> wrote:
Are there examples where application programmers would like there so be some f, a and b such that a == b but f a /= f b (efficiency concerns aside)? I can't think of any obvious ones.
Yes, and we already handle it properly:
Prelude> let f = (1.0 /) Prelude> let (z, negz) = (0.0, -0.0) Prelude> z == negz True Prelude> f z /= f negz True
This is *not* an "IEEE Floats are weird" thing. Application programmers want 0.0 to equal -0.0, but -Infinity to not be equal to Infinity.
Of course, given how many "IEEE Floats are weird" things there are, you can reasonably consider ignoring this example.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Tom Ellis wrote:
On Wed, Oct 02, 2013 at 11:24:39AM +0200, Heinrich Apfelmus wrote:
I'm not sure whether the Eq instance you mention is actually incorrect. I had always understood that Eq denotes an equivalence relation, not necessarily equality on the constructor level.
There's a difference between implementors being able to distingush equals, and application programmers. Internal implementations are allowed to break all sorts of invariants, but, by definition, APIs shouldn't.
Are there examples where application programmers would like there so be some f, a and b such that a == b but f a /= f b (efficiency concerns aside)? I can't think of any obvious ones.
I think the trouble here is that the instance data Foo = Bar | Baz instance Eq Foo where _ == _ = True is perfectly fine if the possibility to distinguish between Foo and Bar is not exported, i.e. if this is precisely an implementation detail. The LVish library sits between chairs. If you consider all Eq instances Safe in the sense of SafeHaskell, then LVish must be marked Unsafe -- it can return different results in a non-deterministic fashion. However, if only Eq instance that adhere to the "exported functions preserve equivalence" law are allowed, then LVish can be marked Safe or Trustworthy, but that assurance is precisely one we lack. I think the generic form of the problem is this: module LVish where -- | 'foo' expects the invariant that the -- first argument must be 'True'. foo :: Bool -> String foo False = unsafeLaunchMissiles foo True = "safe" module B where goo = foo False What status should SafeHaskell assign to the modules LVish and B, respectively? It looks like the right assignment is: LVish - Unsafe B - Trustworthy If LVish cannot guarantee referential transparency without assumptions on external input, then it stands on a similar footing as unsafePerformIO . Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

* Heinrich Apfelmus
In other words, equality of abstract data types is different from equality of algebraic data types (constructors). I don't think you'll ever be able to avoid this proof obligation that the public API of an abstract data type preserves equivalence, so that LVish will yield "results that are deterministic up to equivalence".
It still seems to fit nicely into Safe Haskell. If you are the implementor of an abstract type, you can do whatever you want in the Eq instance, declare your module as Trustworthy, and thus take the responsibility for soundness of that instance w.r.t. your public API. Roman

Hi, Roman Cheplyaka wrote:
It still seems to fit nicely into Safe Haskell. If you are the implementor of an abstract type, you can do whatever you want in the Eq instance, declare your module as Trustworthy, and thus take the responsibility for soundness of that instance w.r.t. your public API.
A possible problem with marking "instance Eq" as an unsafe feature is that many modules would be only Trustworthy instead of Safe. So if I don't trust the authors of a module (because I don't know them), I cannot safely use their code just because they implement their own Eq instance? That would go against my "every purely functional module is automatically safe because the compiler checks that it cannot launch the missiles" understanding of Safe Haskell. Actually, Eq instances are not unsafe per se, but only if I also use some other module that assumes certain properties about all Eq instances in scope. So in order to check safety, two independent modules (the provider and the consumer of the Eq instance) would have to cooperate. Tillmann

* Tillmann Rendel
Hi,
Roman Cheplyaka wrote:
It still seems to fit nicely into Safe Haskell. If you are the implementor of an abstract type, you can do whatever you want in the Eq instance, declare your module as Trustworthy, and thus take the responsibility for soundness of that instance w.r.t. your public API.
A possible problem with marking "instance Eq" as an unsafe feature is that many modules would be only Trustworthy instead of Safe. So if I don't trust the authors of a module (because I don't know them), I cannot safely use their code just because they implement their own Eq instance?
That would go against my "every purely functional module is automatically safe because the compiler checks that it cannot launch the missiles" understanding of Safe Haskell.
Actually, Eq instances are not unsafe per se, but only if I also use some other module that assumes certain properties about all Eq instances in scope. So in order to check safety, two independent modules (the provider and the consumer of the Eq instance) would have to cooperate.
Good point! Roman

I do think something has to be done to have an Eq and Ord with more strict laws. * Operators in Eq and Ord diverge iff any of their parameters are bottom. * The default definitions of (/=), (<), (>) and `compare` are law. * (==) is reflexive and transitive * (<=) is antisymmetric ((x <= y && y <= x) `implies` (x == y)) * (<=) is 'total' (x <= y || y <= x) * (<=) is transitive Currently, reflexivity of (==) is broken in the Prelude (let x = 0/0 in x == x). I know this is for IEEE 754 compliance, but c'mon, this is Haskell, we can have better ways of dealing with NaNs. -Stijn

On Wed, 2 Oct 2013 15:46:42 +0200, Stijn van Drongelen
I do think something has to be done to have an Eq and Ord with more strict laws.
* Operators in Eq and Ord diverge iff any of their parameters are bottom. * The default definitions of (/=), (<), (>) and `compare` are law. * (==) is reflexive and transitive * (<=) is antisymmetric ((x <= y && y <= x) `implies` (x == y)) * (<=) is 'total' (x <= y || y <= x) * (<=) is transitive
Currently, reflexivity of (==) is broken in the Prelude (let x = 0/0 in x == x). I know this is for IEEE 754 compliance, but c'mon, this is Haskell, we can have better ways of dealing with NaNs.
Like making Double not be an instance of Eq?

On Wed, Oct 2, 2013 at 3:49 PM, Niklas Haas
On Wed, 2 Oct 2013 15:46:42 +0200, Stijn van Drongelen
wrote: I do think something has to be done to have an Eq and Ord with more strict laws.
* Operators in Eq and Ord diverge iff any of their parameters are bottom. * The default definitions of (/=), (<), (>) and `compare` are law. * (==) is reflexive and transitive * (<=) is antisymmetric ((x <= y && y <= x) `implies` (x == y)) * (<=) is 'total' (x <= y || y <= x) * (<=) is transitive
Currently, reflexivity of (==) is broken in the Prelude (let x = 0/0 in x == x). I know this is for IEEE 754 compliance, but c'mon, this is Haskell, we can have better ways of dealing with NaNs.
Like making Double not be an instance of Eq? _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Like making IEEE754 Doubles not an instance of Eq. Normal and denormal Doubles should have Eq instances.

Only for meanings of "better" which do not imply as good performance.
On 2 October 2013 14:46, Stijn van Drongelen
I do think something has to be done to have an Eq and Ord with more strict laws.
* Operators in Eq and Ord diverge iff any of their parameters are bottom. * The default definitions of (/=), (<), (>) and `compare` are law. * (==) is reflexive and transitive * (<=) is antisymmetric ((x <= y && y <= x) `implies` (x == y)) * (<=) is 'total' (x <= y || y <= x) * (<=) is transitive
Currently, reflexivity of (==) is broken in the Prelude (let x = 0/0 in x == x). I know this is for IEEE 754 compliance, but c'mon, this is Haskell, we can have better ways of dealing with NaNs.
-Stijn
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Wed, Oct 02, 2013 at 03:46:42PM +0200, Stijn van Drongelen wrote:
* Operators in Eq and Ord diverge iff any of their parameters are bottom.
What's the benefit of this requirement, as opposed to, for example False <= _ = True ... Tom

* Stijn van Drongelen
I do think something has to be done to have an Eq and Ord with more strict laws.
* Operators in Eq and Ord diverge iff any of their parameters are bottom.
This outlaws the Eq instances of lists, trees, and other (co)recursive types. Furthermore, in this formulation, even Eq for tuples is illegal, because (undefined, something) == somethingElse is going to diverge. Roman

On Wed, Oct 2, 2013 at 4:17 PM, Tom Ellis < tom-lists-haskell-cafe-2013@jaguarpaw.co.uk> wrote:
What's the benefit of this requirement, as opposed to, for example
False <= _ = True
I was trying to cover for void types, where the only sensible definitions
are
instance Eq Void where
_ == _ = error "void (==)"
instance Ord Void where
_ <= _ = error "void (<=)"
This is because reflexivity must be guaranteed, so undefined == undefined
may not yield False, but I doubt error "foo" == (let x = x in x) should
yield True either. But perhaps this exception deserves its own rule.
On Wed, Oct 2, 2013 at 5:36 PM, Roman Cheplyaka
* Stijn van Drongelen
[2013-10-02 15:46:42+0200] I do think something has to be done to have an Eq and Ord with more strict laws.
* Operators in Eq and Ord diverge iff any of their parameters are bottom.
This outlaws the Eq instances of lists, trees, and other (co)recursive types.
Furthermore, in this formulation, even Eq for tuples is illegal, because
(undefined, something) == somethingElse
is going to diverge.
Roman
I knew this was going to bite me in the ass. Let me try again: * Operators in Eq and Ord may only diverge when any of their parameters are bottom.

On Wed, Oct 2, 2013 at 6:57 PM, Stijn van Drongelen
On Wed, Oct 2, 2013 at 5:36 PM, Roman Cheplyaka
wrote: * Stijn van Drongelen
[2013-10-02 15:46:42+0200] I do think something has to be done to have an Eq and Ord with more strict laws.
* Operators in Eq and Ord diverge iff any of their parameters are bottom.
This outlaws the Eq instances of lists, trees, and other (co)recursive types.
Furthermore, in this formulation, even Eq for tuples is illegal, because
(undefined, something) == somethingElse
is going to diverge.
Roman
I knew this was going to bite me in the ass. Let me try again:
* Operators in Eq and Ord may only diverge when any of their parameters are bottom.
What am I thinking. Scratch that.

Thanks for the responses all. I'm afraid the point about GHC.Generics got lost here. I'll respond and then rename this as a specific library proposal. I don't want to fix the world's Eq instances, but I am ok with requiring that people "derive Generic" for any data they want to put in an LVar container. (From which I can give them a SafeEq instance.) It's not just LVish that is in this boat.... any library that tries to provide deterministic parallelism outside of the IO monad has some very fine lines to walk. Take a look at Accelerate. It is deterministic (as long as you run only with the CUDA backend and only on one specific GPU... otherwise fold topologies may look different and non-associative folds may leak). Besides, "runAcc" does a huge amount of implicit IO (writing to disk, calling nvcc, etc)! At the very least this could fail if the disk if full. But then again, regular "pure" computations fail when memory runs out... so I'm ok grouping these in the same bucket for now. "Determinism modulo available resources." A possible problem with marking "instance Eq" as an unsafe feature is that
many modules would be only Trustworthy instead of Safe.
My proposal is actually more narrow than that. My proposal is to mark GHC.Generics as Unsafe. That way I can define my own SafeEq, and know that someone can't break it by making a Generic instance that lies. It is very hard for me to see why people should be able to make their own Generic instances (that might lie about the structure of the type), in Safe-Haskell.
That would go against my "every purely functional module is automatically safe because the compiler checks that it cannot launch the missiles" understanding of Safe Haskell.
Heh, that may already be violated by the fact that you can't use other extensions like OverlappingInstances, or provide your own Typeable instances.
Actually, Eq instances are not unsafe per se, but only if I also use some other module that assumes certain properties about all Eq instances in scope. So in order to check safety, two independent modules (the provider and the consumer of the Eq instance) would have to cooperate.
I've found, that this is a very common problem that we have when trying to make our libraries Safe-Haskell compliant -- often we want to permit and deny combinations of modules. I don't have a solution I'm afraid.

Hi, Ryan Newton wrote:
It is very hard for me to see why people should be able to make their own Generic instances (that might lie about the structure of the type), in Safe-Haskell.
I guess that lying Generics instances might arise because of software evolution. Let's say we start with an abstract data type of binary trees. module Tree (Tree, node, empty, null, split) where data Tree a = Node (Tree a) (Tree a) | Empty deriving Generics node = Node empty = Empty null Empty = True null _ = False split (Node a b) = (a, b) size Empty = 0 size (Node x y) = size x + size y Note that this data type is not really abstract, because we export the Generics instance, so clients of this module can learn about the implementation of Tree. This is not a big deal, because the chosen implementation happens to correspond to the abstract structure of binary trees anyway. So I would expect that generic code will work fine. For example, you could use generic read and show functions to serialize trees, and get a reasonable data format. Now, we want to evolve our module by caching the size of trees. We do something like this: module Tree (Tree, node, empty, null, split) where data Tree a = Tree !Int (RealTree a) data RealTree a = Node (Tree a) (Tree a) | Empty tree (Node a b) = Tree (size a + size b) t tree Empty = Tree 0 Empty node x y = tree (Node x y) empty = tree Empty null (Tree _ Empty) = True null _ = False split (Tree _ (Node a b)) = (a, b) size (Tree n _) = n Except for the Generics instance, we provide the exact same interface and behavior to our clients, we just traded some space for performance. But what Generics instance should we provide? If we just add "deriving Generics" to the two datatypes, we leak the change of representation to our clients. For example, a client that serialized a tree with a generic show function based on the old Tree cannot hope to deserialize it back with a generic read function based on the new Tree. The size information would be missing, and the structure would be different. If we write a Generics instance by hand, however, I guess we can make it present the exact same structure as the derived Generics instance for the old Tree. With this lying instance, the generic read function will happily deserialize the old data. The size will be computed on the fly, because our hand-written Generics instance will introduce calls to our smart constructors. Tillmann

(replicating what i said on the ghc-devs thread) one thing i'm confused by, and this wasn't properly addressed in the prior threads, is for a type like data Annotated t ann = MkAnn t ann would you consider the following unsafe? instance Eq t => Eq ( Annotated t ann) (==) (MkAnn t1 _) (MkAnn t2 _ ) = t1 == t2 instance Ord t => Ord (Annotated t ann) compare (MkAnn t1 _) (MkAnn t2 _) = compare t1 t2 by the rubric you've proposed these might be *BAD*, right? But theres many cases where you're doing computation on data, and you wish to track origin, time, etc, but these annotations are *irrelevant* for the actual values computed... It sounds like the proposal you want would rule out such instances! your proposal would rule out these pretty natural Ord/Eq instances! am i not understanding your proposal? It seems like you want LVish to be deterministic in the sense of "up to equality/ordering equivalence", the computation will always yield the same answer . Haskell has no way of enforcing totality, and deriving a "SafeEq" via generics is wrong, see the Annotated type example i made up, above. If you define determinism up to the equivalence/ordering relations provided, and assuming anyone using LVish can read the docs before using it, is there really any real problem? are there any *real* example computations that someone might naively do despite the warning where the actual answer would be broken because of this? If we had a richer type system (like say, Idris plus some totality info), how would we enforce the safety conditions needed to make LVish truely deterministically type safe? cheers -Carter On Sun, Oct 6, 2013 at 6:54 PM, Tillmann Rendel < rendel@informatik.uni-marburg.de> wrote:
Hi,
Ryan Newton wrote:
It is very hard for me to see why people should be able to make their own Generic instances (that might lie about the structure of the type), in Safe-Haskell.
I guess that lying Generics instances might arise because of software evolution. Let's say we start with an abstract data type of binary trees.
module Tree (Tree, node, empty, null, split) where data Tree a = Node (Tree a) (Tree a) | Empty deriving Generics
node = Node
empty = Empty
null Empty = True null _ = False
split (Node a b) = (a, b)
size Empty = 0 size (Node x y) = size x + size y
Note that this data type is not really abstract, because we export the Generics instance, so clients of this module can learn about the implementation of Tree. This is not a big deal, because the chosen implementation happens to correspond to the abstract structure of binary trees anyway. So I would expect that generic code will work fine. For example, you could use generic read and show functions to serialize trees, and get a reasonable data format.
Now, we want to evolve our module by caching the size of trees. We do something like this:
module Tree (Tree, node, empty, null, split) where data Tree a = Tree !Int (RealTree a)
data RealTree a = Node (Tree a) (Tree a) | Empty
tree (Node a b) = Tree (size a + size b) t tree Empty = Tree 0 Empty
node x y = tree (Node x y)
empty = tree Empty
null (Tree _ Empty) = True null _ = False
split (Tree _ (Node a b)) = (a, b)
size (Tree n _) = n
Except for the Generics instance, we provide the exact same interface and behavior to our clients, we just traded some space for performance. But what Generics instance should we provide? If we just add "deriving Generics" to the two datatypes, we leak the change of representation to our clients. For example, a client that serialized a tree with a generic show function based on the old Tree cannot hope to deserialize it back with a generic read function based on the new Tree. The size information would be missing, and the structure would be different.
If we write a Generics instance by hand, however, I guess we can make it present the exact same structure as the derived Generics instance for the old Tree. With this lying instance, the generic read function will happily deserialize the old data. The size will be computed on the fly, because our hand-written Generics instance will introduce calls to our smart constructors.
Tillmann ______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe

On Sun, Oct 6, 2013 at 6:54 PM, Tillmann Rendel
Hi,
Ryan Newton wrote:
It is very hard for me to see why people should be able to make their own Generic instances (that might lie about the structure of the type), in Safe-Haskell.
I guess that lying Generics instances might arise because of software evolution. Let's say we start with an abstract data type of binary trees.
Hi, A real example exists in containers for the older Data class. Data.Sequence.Seq pretends it is a list instead of a tree:
gshow (S.fromList [1,2,3] S.|> 4) "(<| (1) (<| (2) (<| (3) (<| (4) (empty)))))"
If there was a Generic instance for Seq, it would probably pretend to be a list for the same reasons the Data instance is that way. -- Adam

Tillmann, Thanks, that is in interesting use case for handwritten Generics. I'm not fully dissuaded though, simply because: (1) it can't be too common! Especially when you intersect the people who have done or will do this with the people who care about SafeHaskell. (Again, if they don't, they won't mind this tiny Unsafe toggle.) (2) even people who fall in this intersection still have the recourse of doing what they need to do and asserting "TrustWorthy". SafeHaskell is good at supporting this kind of individual exception. Whereas in my case I have no recourse! Because my problem not about asserting that a particular module is TrustWorthy, but rather about keeping other users (running in -XSafe) from breaking my library. On Sun, Oct 6, 2013 at 6:54 PM, Tillmann Rendel < rendel@informatik.uni-marburg.de> wrote:
Hi,
Ryan Newton wrote:
It is very hard for me to see why people should be able to make their own Generic instances (that might lie about the structure of the type), in Safe-Haskell.
I guess that lying Generics instances might arise because of software evolution. Let's say we start with an abstract data type of binary trees.
module Tree (Tree, node, empty, null, split) where data Tree a = Node (Tree a) (Tree a) | Empty deriving Generics
node = Node
empty = Empty
null Empty = True null _ = False
split (Node a b) = (a, b)
size Empty = 0 size (Node x y) = size x + size y
Note that this data type is not really abstract, because we export the Generics instance, so clients of this module can learn about the implementation of Tree. This is not a big deal, because the chosen implementation happens to correspond to the abstract structure of binary trees anyway. So I would expect that generic code will work fine. For example, you could use generic read and show functions to serialize trees, and get a reasonable data format.
Now, we want to evolve our module by caching the size of trees. We do something like this:
module Tree (Tree, node, empty, null, split) where data Tree a = Tree !Int (RealTree a)
data RealTree a = Node (Tree a) (Tree a) | Empty
tree (Node a b) = Tree (size a + size b) t tree Empty = Tree 0 Empty
node x y = tree (Node x y)
empty = tree Empty
null (Tree _ Empty) = True null _ = False
split (Tree _ (Node a b)) = (a, b)
size (Tree n _) = n
Except for the Generics instance, we provide the exact same interface and behavior to our clients, we just traded some space for performance. But what Generics instance should we provide? If we just add "deriving Generics" to the two datatypes, we leak the change of representation to our clients. For example, a client that serialized a tree with a generic show function based on the old Tree cannot hope to deserialize it back with a generic read function based on the new Tree. The size information would be missing, and the structure would be different.
If we write a Generics instance by hand, however, I guess we can make it present the exact same structure as the derived Generics instance for the old Tree. With this lying instance, the generic read function will happily deserialize the old data. The size will be computed on the fly, because our hand-written Generics instance will introduce calls to our smart constructors.
Tillmann ______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe
participants (11)
-
adam vogt
-
Carter Schonwald
-
Colin Adams
-
Heinrich Apfelmus
-
Mike Meyer
-
Niklas Haas
-
Roman Cheplyaka
-
Ryan Newton
-
Stijn van Drongelen
-
Tillmann Rendel
-
Tom Ellis