Associativity of the generic representation of sum types

Hello, I just used the new GHC generics together with the DefaultSignatures extension to provide a default generic implementation for toJSON and parseJSON in the aeson package: https://github.com/mailrank/aeson/pull/26 It appears that the generic representation of a sum type has a tree shape as in: (a :+: b) :+: (c :+: d) In my case this tree-shaped representation is problematic when parsing a JSON value to this type. My overloaded parsing function is parameterized with a key which specifies which of the a, b, c or d constructors to parse. When it encounters a constructor it checks if it matches the key, if so it is parsed, if not parsing will fail. Because of the tree-shaped representation of sum types I have to recursively parse the left and right branch and join them using <|>: https://github.com/basvandijk/aeson/blob/d5535817ceb192aa9d7d0d0b291e1901f3f... I don't know for sure but I suspect that this can cause memory leaks since the <|> has to keep the right value in memory when it is parsing the left. Ideally the generic representation of sum types is right associative as in: a :+: (b :+: (c :+: d)) This way I only have to check if 'a' matches, if it does the right branch can be forgotten. Is there a good reason for not having it right associative? Regards, Bas

Hi Bas,
On Thu, Sep 22, 2011 at 03:55, Bas van Dijk
Hello,
I just used the new GHC generics together with the DefaultSignatures extension to provide a default generic implementation for toJSON and parseJSON in the aeson package:
https://github.com/mailrank/aeson/pull/26
It appears that the generic representation of a sum type has a tree shape as in:
(a :+: b) :+: (c :+: d)
That is correct.
In my case this tree-shaped representation is problematic when parsing a JSON value to this type. My overloaded parsing function is parameterized with a key which specifies which of the a, b, c or d constructors to parse. When it encounters a constructor it checks if it matches the key, if so it is parsed, if not parsing will fail. Because of the tree-shaped representation of sum types I have to recursively parse the left and right branch and join them using <|>:
https://github.com/basvandijk/aeson/blob/d5535817ceb192aa9d7d0d0b291e1901f3f...
I don't know for sure but I suspect that this can cause memory leaks since the <|> has to keep the right value in memory when it is parsing the left.
It is not immediately clear to me why this would cause memory leaks...
Ideally the generic representation of sum types is right associative as in:
a :+: (b :+: (c :+: d))
This way I only have to check if 'a' matches, if it does the right branch can be forgotten.
Is there a good reason for not having it right associative?
The reason is performance. In particular for large datatypes with many constructors, a balanced sum-of-products performs better than a right-nested one. Also, it makes things like writing generic space-efficient encoders/decoders easier. But I would be very interested in understanding if/how the balanced view leads to a space leak, so please let me know if you can provide some more information. Thanks, Pedro
Regards,
Bas
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

2011/9/22 José Pedro Magalhães
Hi Bas,
On Thu, Sep 22, 2011 at 03:55, Bas van Dijk
wrote: Hello,
I just used the new GHC generics together with the DefaultSignatures extension to provide a default generic implementation for toJSON and parseJSON in the aeson package:
https://github.com/mailrank/aeson/pull/26
It appears that the generic representation of a sum type has a tree shape as in:
(a :+: b) :+: (c :+: d)
That is correct.
In my case this tree-shaped representation is problematic when parsing a JSON value to this type. My overloaded parsing function is parameterized with a key which specifies which of the a, b, c or d constructors to parse. When it encounters a constructor it checks if it matches the key, if so it is parsed, if not parsing will fail. Because of the tree-shaped representation of sum types I have to recursively parse the left and right branch and join them using <|>:
https://github.com/basvandijk/aeson/blob/d5535817ceb192aa9d7d0d0b291e1901f3f...
I don't know for sure but I suspect that this can cause memory leaks since the <|> has to keep the right value in memory when it is parsing the left.
It is not immediately clear to me why this would cause memory leaks...
Ideally the generic representation of sum types is right associative as in:
a :+: (b :+: (c :+: d))
This way I only have to check if 'a' matches, if it does the right branch can be forgotten.
Is there a good reason for not having it right associative?
The reason is performance. In particular for large datatypes with many constructors, a balanced sum-of-products performs better than a right-nested one. Also, it makes things like writing generic space-efficient encoders/decoders easier.
But I would be very interested in understanding if/how the balanced view leads to a space leak, so please let me know if you can provide some more information.
Thanks, Pedro
Regards,
Bas
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Hi José, After thinking about this some more, I don't think I need to worry about space leaks. The only worry I have is that in the following code: ------------------------------------------------------------------------ class GFromSum f where gParseSum :: Pair -> Parser (f a) instance (GFromSum a, GFromSum b) => GFromSum (a :+: b) where gParseSum keyVal = fmap L1 (gParseSum keyVal) <|> fmap R1 (gParseSum keyVal) instance (Constructor c, GFromJSON a) => GFromSum (C1 c a) where gParseSum (key, value) | key == pack (conName (undefined :: t c a p)) = gParseJSON value | otherwise = notFound $ unpack key notFound :: String -> Parser a notFound key = fail $ "The key \"" ++ key ++ "\" was not found" ------------------------------------------------------------------------ when gParseSum determines that the key equals the name of the constructor, it is going to parse the value. However what happens when the parsing of the value fails? Ideally it would immediately terminate. However because of the <|> it will try all other branches. This is unnecessary computation. Fortunately parsing those other branches will immediately fail because non will have a constructor that equals the key. So maybe I'm just worrying to much :-) Regards, Bas

Hi José, I have another related question: (Excuse me for the big email, I had trouble making it smaller) I discovered a bug in my code that converts a product into a JSON value. I would like to convert products without field selectors into Arrays (type Array = Vector Value) and products with field selectors (records) into Objects (type Object = Map Text Value). Currently my code makes the wrong assumption that product types are build in a right associative way so that I can simply do this: --------------------------------------------------------------------- -- Products without field selectors: instance (GToJSON a, Flatten b) => GToJSON (S1 NoSelector a :*: b) where gToJSON = toJSON . flatten -- Other products, so products with field selectors (records): instance (GObject a, GObject b) => GToJSON (a :*: b) where gToJSON = Object . gObject --------------------------------------------------------------------- Note that flatten converts the product into a list of Values: --------------------------------------------------------------------- class Flatten f where flatten :: f a -> [Value] instance (GToJSON a, Flatten b) => Flatten (S1 NoSelector a :*: b) where flatten (m1 :*: r) = gToJSON m1 : flatten r instance (GToJSON a) => Flatten (S1 NoSelector a) where flatten m1 = [gToJSON $ unM1 m1] --------------------------------------------------------------------- and gObject convert the product into an Object: --------------------------------------------------------------------- class GObject f where gObject :: f a -> Object instance (GObject a, GObject b) => GObject (a :*: b) where gObject (a :*: b) = gObject a `M.union` gObject b instance (Selector s, GToJSON a) => GObject (S1 s a) where gObject = M.singleton (pack (selName m1)) (gToJSON (unM1 m1)) --------------------------------------------------------------------- The problem of course is that products have a tree-shape (as in: (a :*: b) :*: (c :*: d)) which causes the wrong instance to be selected. I tried to solve it in the following way: There's only one GToJSON instance for products: --------------------------------------------------------------------- instance (ToValue (ProdRes (a :*: b)), GProductToJSON a, GProductToJSON b) => GToJSON (a :*: b) where gToJSON = toValue . gProductToJSON --------------------------------------------------------------------- It uses the overloaded helper function gProductToJSON which converts a product into a ProdRes. A ProdRes is an associated type family which for products without field selectors equals a difference list of Values and for records equals an Object: --------------------------------------------------------------------- class GProductToJSON f where type ProdRes f :: * gProductToJSON :: f a -> ProdRes f instance GToJSON a => GProductToJSON (S1 NoSelector a) where type ProdRes (S1 NoSelector a) = DList Value gProductToJSON = singleton . gToJSON instance (Selector s, GToJSON a) => GProductToJSON (S1 s a) where type ProdRes (S1 s a) = Object gProductToJSON m1 = M.singleton (pack (selName m1)) (gToJSON (unM1 m1)) --------------------------------------------------------------------- The gProductToJSON for products recursively converts the left and right branches to a ProdRes and unifies them using 'union': --------------------------------------------------------------------- instance (GProductToJSON a, GProductToJSON b, ProdRes a ~ ProdRes b) => GProductToJSON (a :*: b) where type ProdRes (a :*: b) = ProdRes a -- or b gProductToJSON (a :*: b) = gProductToJSON a `union` gProductToJSON b class Union r where union :: r -> r -> r instance Union (DList Value) where union = append instance Union Object where union = M.union --------------------------------------------------------------------- Finally, the overloaded toValue turns the ProdRes into a JSON value. --------------------------------------------------------------------- class ToValue r where toValue :: r -> Value instance ToValue (DList Value) where toValue = toJSON . toList instance ToValue Object where toValue = Object --------------------------------------------------------------------- Difference lists are simply: --------------------------------------------------------------------- type DList a = [a] -> [a] toList :: DList a -> [a] toList = ($ []) singleton :: a -> DList a singleton = (:) append :: DList a -> DList a -> DList a append = (.) --------------------------------------------------------------------- The problem with this code is that I get the following error: Conflicting family instance declarations: type ProdRes (S1 NoSelector a) type ProdRes (S1 s a) I was under the impression that GHC would be able to resolve this simply by choosing the most specific type (just as it does with type classes). Unfortunately it doesn't. So I'm a bit stuck now. How would you solve it? What would make all this much easier is if the meta-information of constructors had a flag which indicated if it was a record or not. Could this be added? Regards, Bas

2011/9/22 Bas van Dijk
What would make all this much easier is if the meta-information of constructors had a flag which indicated if it was a record or not. Could this be added?
I just discovered the predicate: -- | Marks if this constructor is a record conIsRecord :: t c (f :: * -> *) a -> Bool I think this can solve my problem. Bas

2011/9/22 Bas van Dijk
I just discovered the predicate:
-- | Marks if this constructor is a record conIsRecord :: t c (f :: * -> *) a -> Bool
I think this can solve my problem.
I think I have solved the bug now using conIsRecord. This is the new implementation: https://github.com/basvandijk/aeson/blob/newGenerics/Data/Aeson/Types/Intern... However, I would still very much like to have the information, whether a constructor is a record or not, statically available. This has two advantages: * More efficient: programs can make a static instead of a dynamic choice. * No more ugly undefined instances: because the information is not statically available I need to add several "undefined" instances like: instance GFromRecord (a :+: b) where gParseRecord = undefined instance GFromRecord U1 where gParseRecord = undefined instance GFromRecord (K1 i c) where gParseRecord = undefined instance GFromRecord (M1 i c f) where gParseRecord = undefined These instances will never be evaluated at runtime. They only exist to satisfy the type-checker. So I propose making the following changes to GHC.Generics: Add a phantom type to C that specifies whether it's a record or not using the (empty) datatypes: data Record data Product Maybe it's also nice to have the type synonyms: type R1 = M1 (C Record) type P1 = M1 (C Product) I will make an official ticket for this. Regards, Bas

2011/9/22 Bas van Dijk
I will make an official ticket for this.

Hi Bas,
Thanks for all the effort you put into this. Let's move the discussion to
trac, then.
Cheers,
Pedro
2011/9/22 Bas van Dijk
2011/9/22 Bas van Dijk
: I will make an official ticket for this.
participants (2)
-
Bas van Dijk
-
José Pedro Magalhães