On Fri, Jul 9, 2021 at 10:11 AM Galaxy Being <borgauf@gmail.com> wrote:

Getting stuck in Real World Haskell's chapter on pretty printing (chap 5), I decided to take a look at one of the suggested sources, i.e., John Hughes' The Design of a Pretty-printing Library. Not that I fully got the first three section, still, let me skip ahead to section 4, Designing a Sequence Type where we're designing a supposedly generic sequence type, I'm guessing in the same universal, fundamental way that Peano's Axioms give a generic underpinning of the natural numbers. Good. I get it. This is why Haskell is unique. So he says we take the following signature as our starting point

nil :: Seq a
unit :: a -> Seq a
cat :: Seq a -> Seq a -> Seq a
list :: Seq a -> [a]

where nil , unit , and cat give us ways to build sequences, and list is an observation [explained before]. The correspondence with the usual list operations is

nil = []
unit x = [x]
cat = (++)

These operations are to satisfy the following laws:

xs `cat` ( ys `cat` zs ) = ( xs `cat` ys ) `cat` zs
nil `cat` xs = xs
xs `cat` nil = xs
list nil = []
list ( unit x `cat` xs ) = x : list xs

My first question is why must we satisfy these laws? Where did these laws come from? Is this a universal truth thing? I've heard the word monoid and semigroup tossed around. Is this germane here?

This is how the paper defines what the word Sequence means in a mathematical way, otherwise there might be some ambiguity for what a Sequence should be. The laws are the design for the Sequence type. Some of them also happen to be the laws that define Monoid and Semigroup.

The first law is associativity, which means Sequence can implement Semigroup. The next two laws are the identity laws, which means Sequence can implement Monoid. https://wiki.haskell.org/Monoid#The_basics

The last two laws say that you can take any Sequence and make a list out of it without losing any information (all elements in the same order). This is also related to Foldable but it's stronger because of the ordering. https://wiki.haskell.org/Foldable_and_Traversable

Reading all of the laws together (Monoid + Foldable with ordering) you can infer that Sequence is isomorphic to list. With mconcat you can turn a list into a Sequence and with list you can get the original list back. This also means that Sequence should be able to lawfully (but perhaps not as efficiently) implement any other typeclass or operation that is implemented for list.

Next in 4.1 Term Representation I'm immediately lost with the concept of term. On a hunch, I borrowed a book Term Rewriting and All That from Baader and Nipkow which I'm sure goes very deep into the concept of terms, but I'm not there yet. What is meant by term? As 4.1 continues, confusing is this passage

The most direct way to represent values of [the?] sequence type is just as terms of the
algebra [huh?], for example using

data Seq a = Nil j Unit a j Seq a `Cat` Seq a

But this trivial representation does not exploit the algebraic laws that we know to
hold [huh?], and moreover the list observation will be a little tricky to de ne (ideally we would like to implement observations by very simple, non-recursive functions: the real work should be done in the implementations of the Seq operators themselves). Instead, we may choose a restricted subset of terms -- call them simplified forms -- into which every term can be put using the algebraic laws. Then we can represent sequences using a datatype that represents the syntax of simplified forms.

This essentially states that the Unit a constructor is redundant because it's equivalent to Seq a Nil. Having fewer possible representations greatly simplifies how easy it is to reason about a data type and usually also simplifies its implementation.

In this case, there is an obvious candidate for simplified forms: terms of the form nil and unit x `cat` xs , where xs is also in simplified form. Simplified forms can be represented using the type

data Seq a = Nil j a `UnitCat` Seq a

with the interpretation

Nil = nil
x `UnitCat` xs = unit x `cat` xs

All of this presupposes much math lore that I'm hereby asking someone on this mailing list to point me towards. I threw in a few [huh?] as specific points, but in general, I'm not there yet. Any explanations appreciated.

This section of the paper explains how a first attempt at implementing a Sequence type from first principles could end up with a representation that is indistinguishable from the list type other than the names (nil = []; cat = (:)). The important part is the end of 4.1 where we learn why this representation doesn't have the desired performance characteristics for this use case and then in 4.2 a more suitable representation is derived.

-bob