Chuch encoding of data structures in Haskell

Hi all, I'm exploring the use of church encodings of algebraic data types in Haskell. Since it's hard to imagine being the first to do so I wonder if folks here could point me to some references on the subject. I'm looking for examples of church encodings in Haskell a little bit beyond Church Booleans and Church Numerals. Günther

2010/5/27 Günther Schmidt
I'm exploring the use of church encodings of algebraic data types in Haskell. Since it's hard to imagine being the first to do so I wonder if folks here could point me to some references on the subject.
I'm looking for examples of church encodings in Haskell a little bit beyond Church Booleans and Church Numerals.
From the above, a general description of Church encoding can be deduced: The encoding of a value is a function that replaces each data constructor with an arbitrary function. The Church encoding represents, in a way, the most generalized means of using values of
The fully general description of Church encoding is rather simple, but I've rarely seen it described explicitly. Consider the type of Church encodings for Bool, Either, and the 2-tuple (written here as "Pair" for clarity): churchedBool :: t -> t -> t churchedEither :: (a -> t) -> (b -> t) -> t churchedPair :: (a -> b -> t) -> t And compare the signatures for the constructors of the non-Church encoded types: True :: Bool False :: Bool Left :: a -> Either a b Right :: b -> Either a b Pair :: a -> b -> Pair a b We can observe two patterns: 1) The Church encodings take as many arguments as the type has constructors 2) The type of each argument is the same as the signature of a constructor, except returning an arbitrary type. As this suggests, "Church decoding" is as simple as applying the Church encoded type to each of the constructors. that type--which is why Haskell already includes variations of "Church encode" functions for a few standard types, like so: encodeEither x = \f g -> either f g x encodeMaybe x = \z f -> maybe z f x encodeTuple x = \f -> uncurry f x encodeBool x = \t e -> if x then t else e But what of Church numerals? First, we must consider the Church-encoding of recursive data types. Given arbitrary nested data types, there's nothing else that can be done--the outer types know nothing of the types they contain. But if an inner type is known to be the same as the outer type, there are two options for the encoding: Work only with the outermost value, as with non-recursive types, or work with the recursive value as a whole, by having the outermost value pass its arguments inward. Now, consider the signature of a Church numeral: churchedNumeral :: (t -> t) -> t -> t Given the above, what can we say about the equivalent "decoded" data type? It takes two arguments, so we have two constructors. The second argument is a single value, so the associated constructor is nullary. The first argument must be associated with a unary constructor, but look at the parameter it takes: the same type as the result! This is how we can tell that Church numerals are the encoding of a recursive type. Since the only way a recursive constructor can do anything with the values it contains is to pass its arguments inward, the value it has to work with is the result of doing so. Thus, the other constructor must take a single argument of its own type. What does this look like as a standard data type? data Nat = S Nat | Z Good old inefficient Peano numbers, of course! Keeping all that in mind, consider a List type, and both ways of encoding it: data List a = Nil | Cons a (List a) churchedListOuter :: t -> (a -> ___ -> t) -> t churchedListRecursive :: t -> (a -> t -> t) -> t For the "outermost" method, what type belongs in place of the ___? The second argument of Cons is itself a List, so naively we would like to simply put the type of "churchedListOuter" itself in place of ___, but that won't work. In fact, nothing will work here, because recursion on an "outermost encoded" list is impossible in a typed λ-calculus without some means of using general recursion provided as a primitive operation. Nested tuples can be used, but the length of the list will be reflected in the type, preventing polymorphism over arbitrary lists. This is also why Church encoding is most often seen in the setting of the untyped λ-calculus. The implicitly recursive encoding, however, presents no such problems. So, perhaps a function to Church encode lists would be useful? Indeed it would, and as before, it already exists: encodeList x = \z f -> foldr f z x Recall the earlier observation that "decoding" involves applying the encoded type to its equivalent constructors; the same holds true for recursive types, as demonstrated by right-folding a list with (:) and [], or applying a Church numeral to 0 and (+ 1). Regarding recursive data types as the least fixed point of a non-recursive data type is thus tied to replacing the "outermost" encoding with the "recursive" encoding, and the "Church encode" function for a recursive type is simply a generalized right fold, partially applied. Now, the descriptions above are rather informal, and ignore the possibility of infinite lazy recursive data structures, among other gaps; but perhaps will help to get you started regardless. - C.

Hello C, thank you for explaining. The funny thing is that I have never seen anybody take this even a single step further than you have in your email. In particular I have not found anything where someone might use church encoding to solve a quite practical problem, namely for implementing extensible records. For instance suppose we'd have data Person = Person { firstName :: String, lastName :: String } person1 :: Person person1 = Person "John" "Doe" alternatively, and I'm not 100% if this what is commonly understood as "church encoding" I'm just guessing, we could write this: person1' :: (String -> String -> x) -> x person1' = \fn -> fn "John" "Doe" Now from a practical standpoint, up to here anyway, the first encoding is far more useful. But there is a problem when we want to extend the Person type by a new field, age for instance. Person as such is *not* extensible, modifying it would require us to rewrite existing code. But person1' is easily extended: person1'' :: (String -> String -> Int -> x) -> x person1'' = \fn -> person1' fn 36 or more generally: (.+.) :: (t1 -> t -> t2) -> t -> t1 -> t2 rec .+. v = \fn -> rec fn v person1''' :: (String -> String -> Int -> x) -> x person1''' = person1'' .+. 36 There is also a way to shrink the "record" from the left: drp :: ((t1 -> t) -> b) -> t -> b drp rec = \fn -> rec $ \_ -> fn I can think of a number of applications for this and especially I believe it should be possible to create more "combinators". The approach is so simple and trivial that it must have occurred to people a hundred times over. Yet I do not find any other examples of this. Whenever I google for church encoding the examples don't go beyond church numerals. Am I googling for the wrong keywords? Best regards Günther Am 27.05.10 19:10, schrieb C. McCann:
2010/5/27 Günther Schmidt
: I'm exploring the use of church encodings of algebraic data types in Haskell. Since it's hard to imagine being the first to do so I wonder if folks here could point me to some references on the subject.
I'm looking for examples of church encodings in Haskell a little bit beyond Church Booleans and Church Numerals.
The fully general description of Church encoding is rather simple, but I've rarely seen it described explicitly. Consider the type of Church encodings for Bool, Either, and the 2-tuple (written here as "Pair" for clarity):
churchedBool :: t -> t -> t churchedEither :: (a -> t) -> (b -> t) -> t churchedPair :: (a -> b -> t) -> t
And compare the signatures for the constructors of the non-Church encoded types:
True :: Bool False :: Bool Left :: a -> Either a b Right :: b -> Either a b Pair :: a -> b -> Pair a b
We can observe two patterns: 1) The Church encodings take as many arguments as the type has constructors 2) The type of each argument is the same as the signature of a constructor, except returning an arbitrary type. As this suggests, "Church decoding" is as simple as applying the Church encoded type to each of the constructors.
From the above, a general description of Church encoding can be deduced: The encoding of a value is a function that replaces each data constructor with an arbitrary function. The Church encoding represents, in a way, the most generalized means of using values of that type--which is why Haskell already includes variations of "Church encode" functions for a few standard types, like so:
encodeEither x = \f g -> either f g x encodeMaybe x = \z f -> maybe z f x encodeTuple x = \f -> uncurry f x encodeBool x = \t e -> if x then t else e
But what of Church numerals? First, we must consider the Church-encoding of recursive data types. Given arbitrary nested data types, there's nothing else that can be done--the outer types know nothing of the types they contain. But if an inner type is known to be the same as the outer type, there are two options for the encoding: Work only with the outermost value, as with non-recursive types, or work with the recursive value as a whole, by having the outermost value pass its arguments inward.
Now, consider the signature of a Church numeral:
churchedNumeral :: (t -> t) -> t -> t
Given the above, what can we say about the equivalent "decoded" data type? It takes two arguments, so we have two constructors. The second argument is a single value, so the associated constructor is nullary. The first argument must be associated with a unary constructor, but look at the parameter it takes: the same type as the result! This is how we can tell that Church numerals are the encoding of a recursive type. Since the only way a recursive constructor can do anything with the values it contains is to pass its arguments inward, the value it has to work with is the result of doing so. Thus, the other constructor must take a single argument of its own type. What does this look like as a standard data type?
data Nat = S Nat | Z
Good old inefficient Peano numbers, of course!
Keeping all that in mind, consider a List type, and both ways of encoding it:
data List a = Nil | Cons a (List a)
churchedListOuter :: t -> (a -> ___ -> t) -> t churchedListRecursive :: t -> (a -> t -> t) -> t
For the "outermost" method, what type belongs in place of the ___? The second argument of Cons is itself a List, so naively we would like to simply put the type of "churchedListOuter" itself in place of ___, but that won't work. In fact, nothing will work here, because recursion on an "outermost encoded" list is impossible in a typed λ-calculus without some means of using general recursion provided as a primitive operation. Nested tuples can be used, but the length of the list will be reflected in the type, preventing polymorphism over arbitrary lists. This is also why Church encoding is most often seen in the setting of the untyped λ-calculus.
The implicitly recursive encoding, however, presents no such problems. So, perhaps a function to Church encode lists would be useful? Indeed it would, and as before, it already exists:
encodeList x = \z f -> foldr f z x
Recall the earlier observation that "decoding" involves applying the encoded type to its equivalent constructors; the same holds true for recursive types, as demonstrated by right-folding a list with (:) and [], or applying a Church numeral to 0 and (+ 1).
Regarding recursive data types as the least fixed point of a non-recursive data type is thus tied to replacing the "outermost" encoding with the "recursive" encoding, and the "Church encode" function for a recursive type is simply a generalized right fold, partially applied.
Now, the descriptions above are rather informal, and ignore the possibility of infinite lazy recursive data structures, among other gaps; but perhaps will help to get you started regardless.
- C.

Of interest, (.+.) is the T combinator - called (##) in Peter Thiemann's Wash and the queer bird in Raymond Smullyan's To Mock a Mockingbird. Your technique might well relate to the 'element transforming style' of Wash, see the Modelling HTML in Haskell paper. Best wishes Stephen

The approach is so simple and trivial that it must have occurred to people a hundred times over. Yet I do not find any other examples of this. Whenever I google for church encoding the examples don't go beyond church numerals.
Am I googling for the wrong keywords?
You might find "Typing Record Concatenation for Free" interesting: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.401 Claus

On May 27, 2010, at 13:44 , Günther Schmidt wrote:
The approach is so simple and trivial that it must have occurred to people a hundred times over. Yet I do not find any other examples of this. Whenever I google for church encoding the examples don't go beyond church numerals.
Hm. If I reorder your (.+.) slightly and reparenthesize: reordered_cons :: (t -> (t1 -> t2)) -> t -> (t1 -> t2) churchedNumeral :: (t -> t ) -> t -> t t unifies with (t1 -> t2), giving us a Church numeral made up of (t1,t2). (I think.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On May 27, 2010, at 19:07 , Brandon S. Allbery KF8NH wrote:
reordered_cons :: (t -> (t1 -> t2)) -> t -> (t1 -> t2) churchedNumeral :: (t -> t ) -> t -> t
t unifies with (t1 -> t2), giving us a Church numeral made up of (t1,t2). (I think.)
Which also explains why that record representation isn't used: it's as inefficient as Peano numbers are. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

On Thursday 27 May 2010 7:15:15 pm Brandon S. Allbery KF8NH wrote:
On May 27, 2010, at 19:07 , Brandon S. Allbery KF8NH wrote:
reordered_cons :: (t -> (t1 -> t2)) -> t -> (t1 -> t2) churchedNumeral :: (t -> t ) -> t -> t
t unifies with (t1 -> t2), giving us a Church numeral made up of (t1,t2). (I think.)
Which also explains why that record representation isn't used: it's as inefficient as Peano numbers are.
Church encoding has additional efficiency problems, too. If we look at Peano numerals for instance, the Church encoding is: forall r. r -> (r -> r) -> r This means that our only method of computation with them is the structurally recursive fold. So, suppose we want to implement a predecessor operation. With Haskell-like data, we do: pred :: Nat -> Nat pred Zero = Zero pred (Suc n) = n This operation is expected to be O(1). What if we want to write this using only fold? Well, we have to do tricks (this same trick works to define tail with foldr): pred n = snd $ fold (Zero, Zero) (\ ~(m, _) -> (Suc m, m)) So, we've done it, but if you look closely, you'll see that we've actually rebuilt the entire 'tail' of the number. So this definition is O(n) (at least, if it all gets evaluated). So, Church encodings are less efficient than a direct implementation is capable of being. There is a dual problem with codata. You can encode the extended natural numbers as: exists s. (s, s -> Either () s) but this only gives you the anamorphism. So, if you want to write the successor operation, it's: succ n = (Nothing, step) where step Nothing = Right (Just n) step (Just m) = case observe m of Left () -> Left () Right m' -> Right (Just m') so we end up re-unfolding the entire number. Taking the paramorphism/apomorphism as a primitive operation fixes this, I believe, but you cannot, to my knowledge, encode types that way in, say, System F, because it would involve: Nat = forall r. r -> (Nat -> r -> r) -> r which is still a recursive equation. -- Dan

2010/5/27 Günther Schmidt
Hello C,
thank you for explaining.
The funny thing is that I have never seen anybody take this even a single step further than you have in your email.
In particular I have not found anything where someone might use church encoding to solve a quite practical problem, namely for implementing extensible records.
I think this article uses Church encoded lists, I could be wrong: http://okmij.org/ftp/Algorithms.html#zip-folds Jason

churchedBool :: t -> t -> t
Important detail: the precise type is "∀t. t → t → t".
encodeBool x = \t e -> if x then t else e
So the type of encodeBool should be: Bool → ∀t. t → t → t whereas Haskell will infer it to be ∀t. Bool → t → t → t which means that a given object can only be eliminated to one type. I.e. to make such an encoding really usable, you need "deep polymorphism" (which GHC supports just fine, but which is not part of the Haskell standard). Stefan

On Thu, May 27, 2010 at 9:11 PM, Stefan Monnier
I.e. to make such an encoding really usable, you need "deep polymorphism" (which GHC supports just fine, but which is not part of the Haskell standard).
Ah, yes, and thank you for pointing that out. My message involved a great deal of hand-waving that I neglected to clearly identify as such. The same caveat of course applies to most Church encodings. For instance, the proper type of a (recursive) encoded list would be: type ChurchList a = ∀t. t -> (a -> t -> t) -> t ...where "a" is fixed as the type of the list elements. Thus, "cons" ought to look something like this charming type: cons :: ∀a. a -> (∀t. t -> (a -> t -> t) -> t) -> (∀t. t -> (a -> t -> t) -> t) For extra fun, try writing an instance such as: instance (Show a) => Show (ChurchList a) where [...etc.] ...at which point, perhaps, we remind ourselves that the language is named "Haskell", not "Alonzo", and drop the whole matter. - C.

Stefan Monnier wrote:
churchedBool :: t -> t -> t
Important detail: the precise type is "∀t. t → t → t".
encodeBool x = \t e -> if x then t else e
So the type of encodeBool should be:
Bool → ∀t. t → t → t
whereas Haskell will infer it to be
∀t. Bool → t → t → t
Those are the same type.
which means that a given object can only be eliminated to one type.
No, the t is universally quantified just fine. Perhaps you're thinking of the need for rank-2 polymorphism in functions which take Church-encoded arguments? -- Live well, ~wren

On 28 May 2010 15:18, wren ng thornton
Stefan Monnier wrote:
churchedBool :: t -> t -> t
Important detail: the precise type is "∀t. t → t → t".
encodeBool x = \t e -> if x then t else e
So the type of encodeBool should be:
Bool → ∀t. t → t → t
whereas Haskell will infer it to be
∀t. Bool → t → t → t
Those are the same type.
I can see a slight distinction between them, based upon when the quantification occurs (the former returns a function that work on all t, the latter will do that encoding for all t). For most purposes there is no difference, but IIUC the former will let you do hlist style stuff post-encoding whilst the latter doesn't. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Ivan Miljenovic wrote:
On 28 May 2010 15:18, wren ng thornton
wrote: Stefan Monnier wrote:
churchedBool :: t -> t -> t Important detail: the precise type is "∀t. t → t → t".
encodeBool x = \t e -> if x then t else e So the type of encodeBool should be:
Bool → ∀t. t → t → t
whereas Haskell will infer it to be
∀t. Bool → t → t → t
Those are the same type.
I can see a slight distinction between them, based upon when the quantification occurs (the former returns a function that work on all t, the latter will do that encoding for all t).
The isomorphism in System F: in : (∀t. Bool → t → t → t) → (Bool → ∀t. t → t → t) in f = \b. /\t. f @t b out : (Bool → ∀t. t → t → t) → (∀t. Bool → t → t → t) out g = /\t. \b. g b @t If you're using the raw System F, then there's a slight difference ---as there always is between isomorphic things which are non-identical--- but that difference is not theoretically significant. If you're using a higher-level language like Haskell, then the conversions are done for you, so there's even less difference. The expression 1+1 is distinct from 2: it has a different representation in memory, it takes a different number of steps to normalize, etc. Sometimes these differences are important for engineering (e.g., dealing with space leaks), but they're rarely significant for theory. -- Live well, ~wren

Günther Schmidt
Hi all,
I'm exploring the use of church encodings of algebraic data types in Haskell. Since it's hard to imagine being the first to do so I wonder if folks here could point me to some references on the subject.
I'm looking for examples of church encodings in Haskell a little bit beyond Church Booleans and Church Numerals.
It's not quite practical, but have a look at Paul Bailes' Totally Functional Programming papers: http://www.itee.uq.edu.au/~paul/tfp-papers/ Also, Types and Programming Languages by Benjamin Pierce (who keeps reminding me of M*A*S*H, even though his middle name starts with a C...) has a decent section on church encodings. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com
participants (11)
-
Brandon S. Allbery KF8NH
-
C. McCann
-
Claus Reinke
-
Dan Doel
-
Günther Schmidt
-
Ivan Lazar Miljenovic
-
Ivan Miljenovic
-
Jason Dagit
-
Stefan Monnier
-
Stephen Tetley
-
wren ng thornton