Help understanding John Hughes' "The Design of a Pretty-printing Library"

Getting stuck in *Real World Haskell*'s chapter on pretty printing (chap 5), I decided to take a look at one of the suggested sources, i.e., John Hughes' *The Design of a Pretty-printing Library https://www.researchgate.net/publication/226525714_The_Design_of_a_Pretty-pr...*. Not that I fully got the first three section, still, let me skip ahead to section 4, *Designing a Sequence Type *where we're designing a supposedly generic sequence type, I'm guessing in the same universal, fundamental way that Peano's Axioms give a generic underpinning of the natural numbers. Good. I get it. This is why Haskell is unique. So he says we take the following *signature* as our starting point nil :: Seq a unit :: a -> Seq a cat :: Seq a -> Seq a -> Seq a list :: Seq a -> [a] where nil , unit , and cat give us ways to build sequences, and list is an *observation *[explained before]. The correspondence with the usual list operations is nil = [] unit x = [x] cat = (++) These operations are to satisfy the following laws: xs `cat` ( ys `cat` zs ) = ( xs `cat` ys ) `cat` zs nil `cat` xs = xs xs `cat` nil = xs list nil = [] list ( unit x `cat` xs ) = x : list xs My first question is why must we satisfy these laws? Where did these laws come from? Is this a universal truth thing? I've heard the word *monoid *and *semigroup* tossed around. Is this germane here? Next in *4.1 Term Representation* I'm immediately lost with the concept of *term. *On a hunch, I borrowed a book *Term Rewriting and All That *from Baader and Nipkow which I'm sure goes very deep into the concept of terms, but I'm not there yet. What is meant by *term*? As 4.1 continues, confusing is this passage The most direct way to represent values of [the?] sequence type is just as terms of the algebra [huh?], for example using data Seq a = Nil j Unit a j Seq a `Cat` Seq a But this trivial representation does not exploit the algebraic laws that we know to hold [huh?], and moreover the list observation will be a little tricky to de ne (ideally we would like to implement observations by very simple, non-recursive functions: the real work should be done in the implementations of the Seq operators themselves). Instead, we may choose a restricted subset of terms -- call them simplified forms -- into which every term can be put using the algebraic laws. Then we can represent sequences using a datatype that represents the syntax of simplified forms. In this case, there is an obvious candidate for simplified forms: terms of the form nil and unit x `cat` xs , where xs is also in simplified form. Simplified forms can be represented using the type data Seq a = Nil j a `UnitCat` Seq a with the interpretation Nil = nil x `UnitCat` xs = unit x `cat` xs All of this presupposes much math lore that I'm hereby asking someone on this mailing list to point me towards. I threw in a few [huh?] as specific points, but in general, I'm not there yet. Any explanations appreciated. ⨽ Lawrence Bottorff Grand Marais, MN, USA borgauf@gmail.com

On Fri, Jul 9, 2021 at 10:11 AM Galaxy Being
Getting stuck in *Real World Haskell*'s chapter on pretty printing (chap 5), I decided to take a look at one of the suggested sources, i.e., John Hughes' *The Design of a Pretty-printing Library https://www.researchgate.net/publication/226525714_The_Design_of_a_Pretty-pr...*. Not that I fully got the first three section, still, let me skip ahead to section 4, *Designing a Sequence Type *where we're designing a supposedly generic sequence type, I'm guessing in the same universal, fundamental way that Peano's Axioms give a generic underpinning of the natural numbers. Good. I get it. This is why Haskell is unique. So he says we take the following *signature* as our starting point
nil :: Seq a unit :: a -> Seq a cat :: Seq a -> Seq a -> Seq a list :: Seq a -> [a]
where nil , unit , and cat give us ways to build sequences, and list is an *observation *[explained before]. The correspondence with the usual list operations is
nil = [] unit x = [x] cat = (++)
These operations are to satisfy the following laws:
xs `cat` ( ys `cat` zs ) = ( xs `cat` ys ) `cat` zs nil `cat` xs = xs xs `cat` nil = xs list nil = [] list ( unit x `cat` xs ) = x : list xs
My first question is why must we satisfy these laws? Where did these laws come from? Is this a universal truth thing? I've heard the word *monoid *and *semigroup* tossed around. Is this germane here?
This is how the paper defines what the word Sequence means in a mathematical way, otherwise there might be some ambiguity for what a Sequence should be. The laws are the design for the Sequence type. Some of them also happen to be the laws that define Monoid and Semigroup. The first law is associativity, which means Sequence can implement Semigroup. The next two laws are the identity laws, which means Sequence can implement Monoid. https://wiki.haskell.org/Monoid#The_basics The last two laws say that you can take any Sequence and make a list out of it without losing any information (all elements in the same order). This is also related to Foldable but it's stronger because of the ordering. https://wiki.haskell.org/Foldable_and_Traversable Reading all of the laws together (Monoid + Foldable with ordering) you can infer that Sequence is isomorphic to list. With mconcat you can turn a list into a Sequence and with list you can get the original list back. This also means that Sequence should be able to lawfully (but perhaps not as efficiently) implement any other typeclass or operation that is implemented for list. Next in *4.1 Term Representation* I'm immediately lost with the concept of *term.
*On a hunch, I borrowed a book *Term Rewriting and All That *from Baader and Nipkow which I'm sure goes very deep into the concept of terms, but I'm not there yet. What is meant by *term*? As 4.1 continues, confusing is this passage
The most direct way to represent values of [the?] sequence type is just as terms of the algebra [huh?], for example using
data Seq a = Nil j Unit a j Seq a `Cat` Seq a
But this trivial representation does not exploit the algebraic laws that we know to hold [huh?], and moreover the list observation will be a little tricky to de ne (ideally we would like to implement observations by very simple, non-recursive functions: the real work should be done in the implementations of the Seq operators themselves). Instead, we may choose a restricted subset of terms -- call them simplified forms -- into which every term can be put using the algebraic laws. Then we can represent sequences using a datatype that represents the syntax of simplified forms.
This essentially states that the Unit a constructor is redundant because it's equivalent to Seq a Nil. Having fewer possible representations greatly simplifies how easy it is to reason about a data type and usually also simplifies its implementation.
In this case, there is an obvious candidate for simplified forms: terms of the form nil and unit x `cat` xs , where xs is also in simplified form. Simplified forms can be represented using the type
data Seq a = Nil j a `UnitCat` Seq a
with the interpretation
Nil = nil x `UnitCat` xs = unit x `cat` xs
All of this presupposes much math lore that I'm hereby asking someone on this mailing list to point me towards. I threw in a few [huh?] as specific points, but in general, I'm not there yet. Any explanations appreciated.
This section of the paper explains how a first attempt at implementing a Sequence type from first principles could end up with a representation that is indistinguishable from the list type other than the names (nil = []; cat = (:)). The important part is the end of *4.1* where we learn why this representation doesn't have the desired performance characteristics for this use case and then in *4.2* a more suitable representation is derived. -bob

On Jul 9, 2021, at 10:10 AM, Galaxy Being
wrote: Getting stuck in Real World Haskell's chapter on pretty printing (chap 5), I decided to take a look at one of the suggested sources, i.e., John Hughes' The Design of a Pretty-printing Library. Not that I fully got the first three section, still, let me skip ahead to section 4, Designing a Sequence Type where we're designing a supposedly generic sequence type, I'm guessing in the same universal, fundamental way that Peano's Axioms give a generic underpinning of the natural numbers.
Sort of. But he explicitly says he’s not trying to define a new sequence type (more general than lists or something), but rather see if we could start from some requirements and from there deduce the list type. This is in the context of “deriving functional programs from specifications”, as an example case. I didn't read the whole paper so I'm not sure if this is directly relevant to designing a pretty printing library, or if it's just warm up. I suspect that the paper is more about deriving programs from specification than it is about pretty printing.
Good. I get it. This is why Haskell is unique.
BTW this isn’t how everyone approaches writing Haskell programs on a day-to-day basis.
So he says we take the following signature as our starting point
nil :: Seq a unit :: a -> Seq a cat :: Seq a -> Seq a -> Seq a list :: Seq a -> [a]
where nil , unit , and cat give us ways to build sequences, and list is an observation [explained before]. The correspondence with the usual list operations is
nil = [] unit x = [x] cat = (++)
These operations are to satisfy the following laws:
xs `cat` ( ys `cat` zs ) = ( xs `cat` ys ) `cat` zs nil `cat` xs = xs xs `cat` nil = xs list nil = [] list ( unit x `cat` xs ) = x : list xs
My first question is why must we satisfy these laws? Where did these laws come from? Is this a universal truth thing?
These are just his requirements for how you’d expect a sequence to behave. (Concatenation is associative and appending an empty sequence does nothing, etc.)
I've heard the word monoid and semigroup tossed around. Is this germane here?
The first three are the criteria for being a monoid, but that doesn’t explain anything because it doesn’t tell you why you’d want your sequence type to be a monoid. So it’s something to notice but it doesn’t explain the requirements. The requirements he just made up, from thinking about the concept of a sequence.
Next in 4.1 Term Representation I'm immediately lost with the concept of term. On a hunch, I borrowed a book Term Rewriting and All That from Baader and Nipkow which I'm sure goes very deep into the concept of terms, but I'm not there yet. What is meant by term?
Oh no it’s much simpler. “Term” is just the math/logic version of the concept “value”. In programming, you have types (like Int) and values (like 3). In logic-speak 3 is a term (to distinguish it from “3 + 5”, which is an expression or a formula). A term is a mathematical object.
As 4.1 continues, confusing is this passage
The most direct way to represent values of [the?] sequence type is just as terms of the algebra [huh?], for example using
He’s talking about how you’d map the logical concepts into Haskell--how you would "represent the terms as values" or how you would decide what Haskell value would faithfully represent the conceptual term. (Basically, “value of the type” is the Haskell side, and “terms of the algebra” is the math/logic side.)
data Seq a = Nil | Unit a | Seq a `Cat` Seq a
The nil/unit/cat from above are functions for constructing a sequence, so you could take that literally and represent a sequence by just making each of those functions be constructors. That’s sort of the simplest thing you could do, so he’s trying that first. So in this approach: nil = Nil unit x = Unit x cat x y = Cat x y
But this trivial representation does not exploit the algebraic laws that we know to hold [huh?]
He’s just saying that doing it this simple way doesn’t satisfy the equalities we want above. For instance: nil `cat` xs results in: Cat Nil xs which is not the same as xs (but rather, it's a new data structure with xs as part of it). And then he goes on to build toward a different representation that satisfies the requirements. I haven't looked at the Real World Haskell book in a while, but this paper is assuredly much more conceptual/abstract than the book, so you are probably getting help from something harder than where you started. It's interesting but keep in mind that research papers are often not simple to understand (and a book is at least trying to be). I hope that helps a bit. Jeff
participants (3)
-
Bob Ippolito
-
Galaxy Being
-
Jeff Clites