Explicitly calling syntactic equality on datatypes

Hello, Pretty simple question here really. I think I've searched for it several times in the past and ended up surprised I did not find an answer. Simple example: sum expressions. One way to define this, one that is comfortable to construct new sums, would be: data Sum = Value Int | Sum Sum Sum Another one, one that is comfortable to check equality with (and other similar things that rely on some notion of normal form), would be (essentially non-empty lists of ints): data Sum = Value Int | Sum Int Sum Now, in many cases you really want to go with the first one for several reasons. You just do not care about normalization in most cases, or maybe you do something more abstract and complicated on top of it that delays the possibility or the sensibility of normalization until something else happens later. So say, that I have the first definition. But then, I want to define equality semantically. An obvious way to go is to produce a function that normalizes Sums (from the first definition) to guarantee that the first sub-sum is always going to be a value, and then check that these two are "equal". And this is where my question comes in, because, of course, the following is infinite recursion: instance Eq Sum where a == b = (normalize a) == (normalize b) What I'd like is to be able to override the default implementation of Eq, but be able to *explicitly* call syntactic equality in calculating it, so that I can do: instance Eq Sum where a == b = (syntacticEq (normalize a) (normalize b)) So, I can see how this is not as straightforward as a function. You cannot correctly produce a polymorphic function syntacticEq :: a -> a -> Bool that works on the SYNTAX of a because it is hidden inside the polymorphic function. What (deriving Eq) does (as I understand it) is to produce an instance at compilation time by looking at how the specific type is presented. But that same compilation time producing could be done to produce it as a function instead of as an implementation of Eq, so that syntacticEq would not be an actual function, but some sort of "macro" that, on compile time, creates the default implementation of equality, but instead of defining it as (==) it defines it as a new function that is only used there. Of course, I can just implement syntactic equality myself explicitly, but when I have to do this for 3, 4, 7 types, some of which have many cases, and a lot of these are polymorphic, so that I have to put all the (Eq arg1, Eq arg2, Eq arg3, ...) => constraints before all of them, it gets old quick. So, is there something like this? Am I saying something really dumb? Of course, I am thinking about equality here, but the same could be said for any standard derivation of classes, like Functor, Show, etc. ALL of them sound like I've wanted to do them at some point in the past. Thanks, Juan. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Hi, The usual (cumbersome) solution would be to use a newtype wrapper. Something like: ``` {-# LANGUAGE DerivingStrategies #-} {-# LANGUAGE DerivingVia #-} import Data.Coerce data Sum = Value Int | Sum Sum Sum deriving stock Eq deriving stock Show normalize :: Sum -> Sum normalize (Value i) = Value i normalize (Sum (Value i) b) = Sum (Value i) (normalize b) normalize (Sum (Sum x y) b) = normalize (Sum x (Sum y b)) -- | Normalized sum newtype NSum = NSum Sum deriving Show via Sum instance Eq NSum where NSum a == NSum b = normalize a == normalize b sample1 :: Sum sample1 = Value 10 `Sum` (Value 11 `Sum` Value 12) sample2 :: Sum sample2 = (Value 10 `Sum` Value 11) `Sum` Value 12 sample1' :: NSum sample1' = coerce sample1 sample2' :: NSum sample2' = coerce sample2 test1 = sample1 == sample2 test2 = sample1' == sample2' ``` It would be interesting to combine DerivingVia and DerivingStrategies to allow what you want: ``` data Sum = Value Int | Sum Sum Sum deriving stock Eq via normalize ``` It would require a GHC proposal though (and some more thinking). Sylvain On 17/09/2019 15:56, Juan Casanova wrote:
Hello,
Pretty simple question here really. I think I've searched for it several times in the past and ended up surprised I did not find an answer.
Simple example: sum expressions.
One way to define this, one that is comfortable to construct new sums, would be:
data Sum = Value Int | Sum Sum Sum
Another one, one that is comfortable to check equality with (and other similar things that rely on some notion of normal form), would be (essentially non-empty lists of ints):
data Sum = Value Int | Sum Int Sum
Now, in many cases you really want to go with the first one for several reasons. You just do not care about normalization in most cases, or maybe you do something more abstract and complicated on top of it that delays the possibility or the sensibility of normalization until something else happens later. So say, that I have the first definition.
But then, I want to define equality semantically. An obvious way to go is to produce a function that normalizes Sums (from the first definition) to guarantee that the first sub-sum is always going to be a value, and then check that these two are "equal".
And this is where my question comes in, because, of course, the following is infinite recursion:
instance Eq Sum where a == b = (normalize a) == (normalize b)
What I'd like is to be able to override the default implementation of Eq, but be able to *explicitly* call syntactic equality in calculating it, so that I can do:
instance Eq Sum where a == b = (syntacticEq (normalize a) (normalize b))
So, I can see how this is not as straightforward as a function. You cannot correctly produce a polymorphic function syntacticEq :: a -> a -> Bool that works on the SYNTAX of a because it is hidden inside the polymorphic function. What (deriving Eq) does (as I understand it) is to produce an instance at compilation time by looking at how the specific type is presented. But that same compilation time producing could be done to produce it as a function instead of as an implementation of Eq, so that syntacticEq would not be an actual function, but some sort of "macro" that, on compile time, creates the default implementation of equality, but instead of defining it as (==) it defines it as a new function that is only used there.
Of course, I can just implement syntactic equality myself explicitly, but when I have to do this for 3, 4, 7 types, some of which have many cases, and a lot of these are polymorphic, so that I have to put all the (Eq arg1, Eq arg2, Eq arg3, ...) => constraints before all of them, it gets old quick.
So, is there something like this? Am I saying something really dumb? Of course, I am thinking about equality here, but the same could be said for any standard derivation of classes, like Functor, Show, etc. ALL of them sound like I've wanted to do them at some point in the past.
Thanks, Juan.

I have created an issue: https://gitlab.haskell.org/ghc/ghc/issues/17210 On 17/09/2019 18:03, Sylvain Henry wrote:
Hi,
The usual (cumbersome) solution would be to use a newtype wrapper. Something like:
```
{-# LANGUAGE DerivingStrategies #-} {-# LANGUAGE DerivingVia #-}
import Data.Coerce
data Sum = Value Int | Sum Sum Sum deriving stock Eq deriving stock Show
normalize :: Sum -> Sum normalize (Value i) = Value i normalize (Sum (Value i) b) = Sum (Value i) (normalize b) normalize (Sum (Sum x y) b) = normalize (Sum x (Sum y b))
-- | Normalized sum newtype NSum = NSum Sum deriving Show via Sum
instance Eq NSum where NSum a == NSum b = normalize a == normalize b
sample1 :: Sum sample1 = Value 10 `Sum` (Value 11 `Sum` Value 12)
sample2 :: Sum sample2 = (Value 10 `Sum` Value 11) `Sum` Value 12
sample1' :: NSum sample1' = coerce sample1
sample2' :: NSum sample2' = coerce sample2
test1 = sample1 == sample2
test2 = sample1' == sample2'
```
It would be interesting to combine DerivingVia and DerivingStrategies to allow what you want:
``` data Sum = Value Int | Sum Sum Sum deriving stock Eq via normalize
```
It would require a GHC proposal though (and some more thinking).
Sylvain
On 17/09/2019 15:56, Juan Casanova wrote:
Hello,
Pretty simple question here really. I think I've searched for it several times in the past and ended up surprised I did not find an answer.
Simple example: sum expressions.
One way to define this, one that is comfortable to construct new sums, would be:
data Sum = Value Int | Sum Sum Sum
Another one, one that is comfortable to check equality with (and other similar things that rely on some notion of normal form), would be (essentially non-empty lists of ints):
data Sum = Value Int | Sum Int Sum
Now, in many cases you really want to go with the first one for several reasons. You just do not care about normalization in most cases, or maybe you do something more abstract and complicated on top of it that delays the possibility or the sensibility of normalization until something else happens later. So say, that I have the first definition.
But then, I want to define equality semantically. An obvious way to go is to produce a function that normalizes Sums (from the first definition) to guarantee that the first sub-sum is always going to be a value, and then check that these two are "equal".
And this is where my question comes in, because, of course, the following is infinite recursion:
instance Eq Sum where a == b = (normalize a) == (normalize b)
What I'd like is to be able to override the default implementation of Eq, but be able to *explicitly* call syntactic equality in calculating it, so that I can do:
instance Eq Sum where a == b = (syntacticEq (normalize a) (normalize b))
So, I can see how this is not as straightforward as a function. You cannot correctly produce a polymorphic function syntacticEq :: a -> a -> Bool that works on the SYNTAX of a because it is hidden inside the polymorphic function. What (deriving Eq) does (as I understand it) is to produce an instance at compilation time by looking at how the specific type is presented. But that same compilation time producing could be done to produce it as a function instead of as an implementation of Eq, so that syntacticEq would not be an actual function, but some sort of "macro" that, on compile time, creates the default implementation of equality, but instead of defining it as (==) it defines it as a new function that is only used there.
Of course, I can just implement syntactic equality myself explicitly, but when I have to do this for 3, 4, 7 types, some of which have many cases, and a lot of these are polymorphic, so that I have to put all the (Eq arg1, Eq arg2, Eq arg3, ...) => constraints before all of them, it gets old quick.
So, is there something like this? Am I saying something really dumb? Of course, I am thinking about equality here, but the same could be said for any standard derivation of classes, like Functor, Show, etc. ALL of them sound like I've wanted to do them at some point in the past.
Thanks, Juan.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On Tue, Sep 17, 2019 at 02:56:36PM +0100, Juan Casanova wrote:
data Sum = Value Int | Sum Sum Sum [...] But then, I want to define equality semantically. An obvious way to go is to produce a function that normalizes Sums (from the first definition) to guarantee that the first sub-sum is always going to be a value, and then check that these two are "equal".
And this is where my question comes in, because, of course, the following is infinite recursion:
instance Eq Sum where a == b = (normalize a) == (normalize b)
Then why not introduce a datatype which guarantees structurally the that value is normalised and use its Eq? data SumNormalised = ValueNormalised Int | SumNormalised Int Sum deriving Eq normalize :: Sum -> Sum Normalised normalize = <pretty much the same body you would have had before> instance Eq Sum where a == b = (==) `on` normalize Tom

Then why not introduce a datatype which guarantees structurally the that value is normalised and use its Eq?
There are many reasons for it. The first two have to do with code clarity. 1. Adding a datatype means that anytime I wish to use the normalized sum from a sum that I know is already normalized, I have to prepend it with the data constructor. This is something that bothers me outside of the equality checking thing, because once is fine, twice is bearable, but when you end up having 5 unavoidable wrappers on everything you want to use, doing pattern matching becomes very annoying. Yes, you can make it less bad by creating intermediate functions that wrap and unwrap, but this does not always solve the problem. 2. If normalization is only used for equality, for instance, (or equality and few other things) then doing this creates unnecessary duplication of types. For example, I have two sum values, which I care not if they have been normalized or they haven't (in fact, I care that they are only normalized if it has been necessary to do so). At some point, I need to check their equality. With your approach, I need to create the normalized sum version and check equality there, and then of course I can continue using the original values, but the point is I have had to create the added type and explicitly construct it to check for equality. Because I can't just do (==) on the non-normalized sum values wherever I need to check for equality, I need to explicitly transform into normalized form and check equality, and at that point the type helps nothing. 3. It reduces reusability. Say that I need to check for equality many times or between a collection of elements (e.g. to sort them). I can normalize them all once in the beginning to avoid normalizing several times. But then, because they are a different type, I cannot apply to them the other operations that I have defined for the non-normalized type. Sure, I can lift them to the normalized type, and even if this is fairly straightforward, it implies duplicating all the functions that I could ever want to use on a pre-normalized sum (which, to be honest, could be basically all of them). Yes, I can always do clever things like create a type class or whatnot. But that is precisely my point: I am having to do complicated things and writing a lot of code for something that is very straightforward: normalize (here, I'll show you how), and then check for equality. Of course, normalize on an already normalized sum is expected to be very quick, so pre-normalizing a sum that we know will be equality checked many times is gonna be both comfortable and easy. But in order to be able to do that, I have to manually implement syntactic equality on the side. Something which GHC *already knows how to do*. And as I said, this is fine for 1 type. Even 3. But for 15 it starts to be a concern. Of course it's not a major concern, but it's one of those things that seem like you could gain a lot from doing so little. Also of course, if efficiency and reducing normalization to the very minimum are critical, then I would end up having a monad in which the value may or may not have already been normalized, and the first time normalization is necessary, I store the normalized value and use that onwards. But this is, again, too much complexity for something that should be simple. Thanks for the reply anyway, Juan. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

What about phantom types? {-# LANGUAGE KindSignatures, DataKinds #-} data IsNormalized = Normalized | NotNormalized data Sum (n :: IsNormalized) = Value Int | Sum (Sum n) (Sum n) You could still say instance MyClass (Sum n) where… but you could also write normalizeSum :: Sum NotNormalized -> Sum Normalized -- or even create a class with two inhabitants for this instance Eq (Sum NotNormalized) where (==) = (==) `on` normalizeSum instance Eq (Sum Normalized) where … -- real implementation It's not ideal. For example if you want to sort a list, it would still be better to normalize the whole list before sorting, but at least you could still use the other operators afterwards without a "denormalization" step. So maybe it would help reduce some boilerplate? Cheers, MarLinn

Quoting MarLinn
What about phantom types?
{-# LANGUAGE KindSignatures, DataKinds #-} data IsNormalized = Normalized | NotNormalized data Sum (n :: IsNormalized) = Value Int | Sum (Sum n) (Sum n)
normalizeSum :: Sum NotNormalized -> Sum Normalized -- or even create a class with two inhabitants for this instance Eq (Sum NotNormalized) where (==) = (==) `on` normalizeSum instance Eq (Sum Normalized) where … -- real implementation
That looks quite good tbh. I did not know you could do that. Care to explain, are "Normalized" and "NotNormalized" types, data constructors, type families? A combination of them? Of course you still have to write a fair amount of code, but it does look a lot better than what I've been using and what I've seen suggested so far. At least to me. Thanks, Juan. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

On 18/09/2019 15.34, Juan Casanova wrote:
Quoting MarLinn
on Wed, 18 Sep 2019 14:53:32 +0200: What about phantom types?
{-# LANGUAGE KindSignatures, DataKinds #-} data IsNormalized = Normalized | NotNormalized data Sum (n :: IsNormalized) = Value Int | Sum (Sum n) (Sum n)
normalizeSum :: Sum NotNormalized -> Sum Normalized -- or even create a class with two inhabitants for this instance Eq (Sum NotNormalized) where (==) = (==) `on` normalizeSum instance Eq (Sum Normalized) where … -- real implementation
That looks quite good tbh. I did not know you could do that. Care to explain, are "Normalized" and "NotNormalized" types, data constructors, type families? A combination of them?
I'm not an expert, so I apologize if I get some of the details wrong. This is all a trick with kinds. Kinds are to types as types are to values. The key is in the two extensions. DataKinds allows you to use data as kinds. So it's possible to use IsNormalized as a kind inhabited by the two types Normalized and NotNormalized in the same ways as you can use the kind * that is inhabited by all "normal" types. Note that types of kind * are also in turn inhabited by values; types of kind IsNormalized are not, so you can only use it and it's types in signatures for now. KindSignatures then allows you to use the signature in the definition of Sum which tells GHC that the first parameter of Sum has this new kind, so only its inhabitants can be used in this place. You can check this by asking GHCI ":kind Sum". It will tell you that "Sum :: IsNormalized -> *", so Sum has a kind that takes a type of kind IsNormalized and returns a kind of the usual type *. Hope that helps.

Nice! Moreover the "real implementation" can use standalone deriving: ``` {-# LANGUAGE DataKinds #-}{-# LANGUAGE KindSignatures #-}{-# LANGUAGE StandaloneDeriving #-}{-# LANGUAGE FlexibleInstances #-}import Data.Functiondata IsNormalized = Normalized | NotNormalizeddata Sum (e :: IsNormalized) = Value Int | Sum (Sum e) (Sum e) deriving (Show)normalize :: Sum a -> Sum Normalizednormalize (Value i) = Value inormalize (Sum (Value i) b) = Sum (Value i) (normalize b)normalize (Sum (Sum x y) b) = normalize (Sum x (Sum y b))instance Eq (Sum NotNormalized) where (==) = (==) `on` normalizederiving instance Eq (Sum Normalized)sample1 :: Sum NotNormalizedsample1 = Value 10 `Sum` (Value 11 `Sum` Value 12)sample2 :: Sum NotNormalizedsample2 = (Value 10 `Sum` Value 11) `Sum` Value 12test = sample1 == sample2-- True ``` On 18/09/2019 14:53, MarLinn wrote:
What about phantom types?
{-# LANGUAGE KindSignatures, DataKinds #-} data IsNormalized = Normalized | NotNormalized data Sum (n :: IsNormalized) = Value Int | Sum (Sum n) (Sum n)
You could still say
instance MyClass (Sum n) where…
but you could also write
normalizeSum :: Sum NotNormalized -> Sum Normalized -- or even create a class with two inhabitants for this instance Eq (Sum NotNormalized) where (==) = (==) `on` normalizeSum instance Eq (Sum Normalized) where … -- real implementation
It's not ideal. For example if you want to sort a list, it would still be better to normalize the whole list before sorting, but at least you could still use the other operators afterwards without a "denormalization" step. So maybe it would help reduce some boilerplate?
Cheers, MarLinn
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
participants (4)
-
Juan Casanova
-
MarLinn
-
Sylvain Henry
-
Tom Ellis