Replacing and improving pattern guards with PMC syntax

One of the problems of pattern matching in Haskell is that it might well be regarded as not very declarative: function definition lines cease to be generalisable equations, and the definition of the rewriting strategy involved in Haskell pattern matching extends over about three pages, no matter whether you look in the Haskell report or for a definition of the ``functional rewriting strategy'' in the theoretical literature.

<shameless plug> Fixing exactly this problem was my original goal when I started developing a family of Pattern Matching Calculi, see http://www.cas.mcmaster.ca/~kahl/PMC/ . (Other pattern matching calculi in the literature have different goals.) The original calculus, which I shall call ``PMC'' for the remainder of this message, is a confluent rewriting system with a normalising strategy that incorporates Haskell pattern matching --- the confluence proof has been mechanised using Isabelle.

So far I have never considered it important to devise a concrete syntax for PMC, but triggered by the current pattern guards thread on haskell-prime, I now try to give a PMC syntax that blends well with Haskell. PMC ``automatically'' includes the ``expression patterns'' of pattern guards, but organised differently, see below.

I now present a series of syntactic extensions to Haskell, without assuming that the reader is familiar with PMC, but I will also explain how these extensions relate to PMC.

One of my gripes with Haskell pattern matching is that the anonymous variant, namely case expressions, has an application to an argument built in, and therefore does not by itself produce a function. However, lambda abstraction already allows single-alternative pattern matching, so I propose to relax this to multi-alternative pattern matching:
   length = \ []     -> 0
              x : xs -> 1 + length xs
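(For comparison, the closest we can get today is a lambda that names its argument only to hand it straight to a case expression, which is exactly the application built into case that this proposal removes. A minimal sketch, with the primed name chosen only to avoid the Prelude clash:

   length' :: [a] -> Int
   length' = \xs -> case xs of
                      []     -> 0
                      _ : ys -> 1 + length' ys
)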
(Lists of alternatives correspond to the ``matchings'' of PMC, and concatenation of those lists (using ``;'' or layout) corresponds to the alternative of matchings in PMC; ``\ m'' then corresponds to ``{| m |}'' in PMC.)

For multi-argument pattern matching, I propose to allow nested matchings:
   zip = \ x : xs -> y : ys -> (x,y) : zip xs ys
           _      -> _      -> []
   take' = \ 0 -> _ -> []
             n -> { []     -> []
                  ; x : xs -> x : take' (pred n) xs }
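(For the zip example, a rendering in today's Haskell makes the cost of missing fall-through visible: with nested case expressions the fallback has to be repeated in every branch. A sketch, with zip' again named only to avoid the Prelude clash:

   zip' :: [a] -> [b] -> [(a, b)]
   zip' = \as -> \bs ->
     case as of
       x : xs -> case bs of
                   y : ys -> (x, y) : zip' xs ys
                   _      -> []      -- fallback repeated here
       _      -> []                  -- and here
)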
This means that the arrow ``->'' of an alternative can now have either an expression or a list of alternatives as its right argument. (Having an expression there corresponds to ``lifting'' $\leftharpoonup \_ \rightharpoonup$ in PMC.)

Pattern guards can now be seen as a special case of alternatives-level argument application (``argument supply'' $\rhd$ in PMC). I find it useful to write the argument first, as in PMC. I see two options for writing argument supply in Haskell: either using new keywords, giving rise to the syntax production

   alt -> "match" exp "with" alts

or using a keyword infix operator:

   alt -> exp "|>" alts

(Using this kind of argument supply instead of pattern guards also has the advantage that multiple alternatives become possible.)

I start with Simon Peyton Jones' standard example:
   clunky env v1 v2
     | Just r1 <- lookup env v1
     , Just r2 <- lookup env v2
     = r1 + r2
     | otherwise = v1 + v2
This could now be written using either ``match _ with _'':
   clunky env v1 v2 = \ match lookup env v1 with
                          Just r1 -> match lookup env v2 with
                                       Just r2 -> r1 + r2
                        v1 + v2
or infix ``_ |> _'':
   clunky env v1 v2 = \ lookup env v1 |> Just r1 -> lookup env v2 |> Just r2 -> r1 + r2
                        v1 + v2
Note that the syntactic structure is different from pattern guards; here is the fully parenthesised version:
   clunky env v1 v2 = \ { lookup env v1 |> { Just r1 -> { lookup env v2 |>
                                                            { Just r2 -> r1 + r2 }}}
                        ; v1 + v2
                        }
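(To see the difference in nesting in today's terms: the whole double match forms one alternative, and v1 + v2 is the fallback for all of it. A rough, hand-written rendering in current Haskell, assuming env is a plain association list and noting that Prelude lookup takes the key first:

   import Data.Maybe (fromMaybe)

   clunky :: [(Int, Int)] -> Int -> Int -> Int
   clunky env v1 v2 = fromMaybe (v1 + v2) $
     case lookup v1 env of
       Just r1 -> case lookup v2 env of
                    Just r2 -> Just (r1 + r2)
                    _       -> Nothing   -- an inner mismatch falls out to v1 + v2
       _       -> Nothing                -- and so does the outer one
)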
Boolean guards are matches against True:
   take = \ 0 -> _ -> []
            n -> match n > 0 with
                   True -> { []     -> []
                           ; x : xs -> x : take (pred n) xs }
                   _    -> error "take: negative argument"
But Boolean guards can also be kept as syntactic sugar:
   take = \ 0 -> _ -> []
            n | n > 0     -> { []     -> []
                             ; x : xs -> x : take (pred n) xs }
              | otherwise -> error "take: negative argument"
Now we have all the syntax in place --- PMC's ``failure'' is of course the empty alternative (allowed in Haskell 98).

For the grammar, the relevant productions in Haskell 98 are:

   exp10 -> \ apat1 ... apatn -> exp      (lambda abstraction, n >= 1)
          | "case" exp "of" { alts }      (case expression)

   alts  -> alt1 ; ... ; altn             (n >= 1)

   alt   -> pat "->" exp [where decls]
          | pat gdpat [where decls]
          |                               (empty alternative)

   gdpat -> gd "->" exp [ gdpat ]

   gd    -> | exp

This would change to the following, where I propose to rename ``alt'' to ``match'' for ``matching'' as in PMC, since this is now a full-fledged syntactic category of importance similar to that of expressions (PMC works essentially because of this separation of syntactic categories):

   exp10  -> \ { matchs }                          (lambda abstraction)
           | "case" exp "of" { matchs }            (case expression)

   matchs -> match1 ; ... ; matchn                 (alternative, n >= 1)

   match  -> exp                                   (result)
           | pat "->" { matchs } [where decls]     (pattern matching)
           | pat gdpat [where decls]               (pat. mat. w. Boolean guards)
           |                                       (empty alternative)
           | "match" exp "with" { matchs }         (pattern argument supply)

   gdpat  -> gd "->" { matchs } [ gdpat ]

   gd     -> | exp

(Case expressions would then be syntactic sugar:
   case exp of { matchs }  =  \ match exp with { matchs } )
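For instance, in the proposed syntax (so purely hypothetical at this point), a concrete case expression such as

   case lookup env v1 of { Just r -> r ; _ -> 0 }

would then abbreviate

   \ match lookup env v1 with { Just r -> r ; _ -> 0 }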
These productions imply that we would lose multiple abstraction:

   \ x y z -> 3      becomes      \ x -> y -> z -> 3

For reasons of declarativity, I am also tempted to argue that ``='' should only be allowed for one-line definitions, so we would write:
   f 0 -> 1
   f n -> n * f (n - 1)
An easy-to-remember motto for this could be ``if it can fall through, use ->''. This would of course break every single textbook and tutorial, so I won't insist... No matter whether with ``->'' or with ``='', the above would of course be syntactic sugar for:
   f = \ 0 -> 1
         n -> n * f (n - 1)
LAWS
====

The most important nice thing is that we can now use equational reasoning for expressions involving pattern matching. The PMC laws (see the papers for more explanation) translate into the syntax presented above as follows (note that most of these equations are between matchings, i.e., between (lists of) alternatives, and only some are between expressions):

---------------------------------------------------------------------
Failure as unit:
{- empty alternative -} ; alts = alts
Failure lifting:
\ { } = undefined
Result lifting:
\ { exp } = exp
Argument propagation into matching:
(\ { alts }) exp = \ { match exp with { alts } }
Argument propagation into expression:
match arg with { exp } = exp arg
First match wins (no backtracking after first success):
exp ; alts = exp
Failure-strictness of argument supply:
match arg with { } = {- empty alternative -}
Distributivity of argument supply over alternative:
match arg with { alts1 ; alts2 } = match arg with { alts1 } ; match arg with { alts2 }
beta (v is a variable):
match arg with { v -> { alts } } = alts[v \ arg] -- substitution
Match when matching the same constructor:
match c(e1,...,en) with {c(p1,...,pn) -> {alts}}
  = match e1 with {p1 -> { ... match en with {pn -> {alts}} ... }}
Mismatch when matching different constructors:
match c(e1,...,em) with {d(p1,...,pn) -> {alts}} = {- empty alternative -}
Strictness of matching against constructors:
match undefined with {c(p1,...,pn) -> {alts}} = undefined
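To see these laws at work, here is a small derivation (written in the proposed syntax, so an illustration rather than something a current compiler accepts), reducing an application of the multi-alternative lambda from the length example above:

     (\ { [] -> 0 ; x : xs -> 1 + length xs }) (1 : [])
   = \ { match 1 : [] with { [] -> 0 ; x : xs -> 1 + length xs } }      -- argument propagation
   = \ { match 1 : [] with { [] -> 0 }
       ; match 1 : [] with { x : xs -> 1 + length xs } }                -- distributivity
   = \ { {- empty alternative -}
       ; match 1 : [] with { x : xs -> 1 + length xs } }                -- mismatch
   = \ { match 1 : [] with { x : xs -> 1 + length xs } }                -- failure as unit
   = \ { match 1 with { x -> { match [] with { xs -> 1 + length xs } } } }   -- match
   = \ { match [] with { xs -> 1 + length xs } }                        -- beta (x not free)
   = \ { 1 + length [] }                                                -- beta
   = 1 + length []                                                      -- result lifting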
---------------------------------------------------------------------

Together with strictness of function application in the function:
undefined arg = undefined -- already in Haskell
, these laws, when oriented as rules, produce a confluent rewriting system, and the standard definition of strong head normal form induces a normalising strategy.

Besides extending syntactic convenience even beyond pattern guards, adopting the abstract syntax of PMC would therefore also simplify the definition of the semantics of pattern matching in Haskell, and extend the scope of equational reasoning to pattern matching.

For the concrete syntax I have no strong feelings, but hope that I provided a starting point for discussion.

Wolfram

I'm not sure I follow all the details, but I think I agree ;) with Wolfram on several points, even if I've arrived there via a different route.

Some of this may sound strange to those who equate "declarative" with "equational", so let me state in advance that I do not agree with that particular notion - expression-level programs, such as parsers built from parser combinators, can be just as declarative as equations.

However, I do agree that pattern matching has become a problem in functional languages (I once knew an fpl where that part of the language and implementation was of roughly the same complexity as the whole rest of it), and Haskell is unfortunately no exception. The problem is not that there is syntactic sugar for pattern matching, but that this isn't sugar coating at all - there is functionality hidden in there that cannot be provided by the remainder of the language. In other words, pattern matching and associated "sugar" become part of Haskell's core, which thus becomes more complex, without offering sufficient compensation in terms of expressiveness.

The particular issue at hand is not that case is modeled after let rather than after lambda, but that pattern match failure and fall-through are built-in concepts that cannot be programmed in the language but have to be dealt with in specific syntactic forms.

Following the old adage of "simplicity through generality", I would prefer instead if we had available language constructs for matchings, composition of matchings, and "splicing" of such rule sets back into expressions ("\" in Wolfram's description). Then pattern-match fall-through would be programmable, pattern matching constructs could be composed as easily as parser combinators (as an added bonus, getting closer to first-class patterns), case, multi-equations, and do-notation would become true syntactic sugar, and Haskell's core would finally start to become simpler, for once.

My own awkward-looking translations were driven by having found two tools in Haskell that come close to this ideal, even if the syntax is awkward: the mapping of pattern-match failure to "fail" in do-notation, and the use of "fail msg = mzero" in MonadPlus. By these means, matchings (lambdas with patterns and explicit match failure) can be represented as do-blocks:

   lhs -> rhs   ==>   \x -> do { lhs <- return x; return rhs }        (1)

and then be composed (the examples I gave moved the \s to the front instead and used "mplus" to compose matchings, but we could also lift the compositions to function level instead), and finally "spliced" back (turning explicit into implicit match failure) using fromJust or suchlike. Adding pattern guards into this translation was straightforward - they simply go between the two returns.

Now, instead of repeating Wolfram's arguments about case and multi-equations, let's have a look at do-notation, as used in (1): it is obvious that this part of the translation is the source of the awkwardness I mentioned, as the right-hand side of (1) has a lot more syntax noise than the left-hand side. What we really want here is some way of "lifting" lambda abstractions so as to make potential pattern-match failures explicit and handleable. Perhaps we can get rid of some of that syntax noise?

   \x -> do { lhs <- return x; return rhs }
   --> \x -> (return x >>= \lhs -> return rhs)
   --> \x -> (>>= (\lhs -> return rhs)) (return x)
   --> (>>= (return . (\lhs -> rhs))) . return

hey, that's great! so the lifting we are looking for is simply

   lift match = (>>= (return . match)) . return

right? wrong, unfortunately.
Looking more closely at the translation of do-notation in 3.14 of the Report, we see that it actually creates a new function by syntactic rather than semantic manipulation (in fact mapping the problem of fall-through back to a multi-equation first, then to "fail"), so we have no hope of reproducing the nice behaviour wrt pattern-match failure without using the do-notation, and all the syntax noise that implies.

I'm not sure whether Wolfram suggested only a simplification of the specification of pattern matching or an actual reification of the concepts of matching, composition of matchings, and splicing of matchings into multi-alternative lambdas. Personally, I'd very much like to see the latter, as this issue frequently confronts me, e.g., when embedding domain-specific languages and their patterns in Haskell.

When Haskell came into being, it started from the lambda calculus, but nowadays a substantial portion of Haskell programs use various monads to capture program patterns. If Haskell were designed today, monads would not be a late add-on with some syntax to be translated away; they would be at the core of the language, with other features translated away into that generalised core. And one of the things that would make possible is to replace some previously built-in and non-composable notions, like pattern-match fall-through and composition of rule alternatives, by a smaller, yet more expressive core. And that is always a worthwhile goal for language redesign, isn't it?-)

Cheers,
Claus

PS. Outside of Haskell 98, we can come closer to defining that lift function, but it gets rather ugly. Here is a first approximation:

   lift match = \x -> either (fail . show) return $! unsafePerformIO $
                  (return $! Right $! match x) `Control.Exception.catch` (return . Left)

   Main> lift head ([False]::[Bool]) >>= print
   False
   Main> lift head ([]::[Bool]) >>= print
   Program error: user error (pattern match failure: head [])
   Main> lift head ([]::[Bool]) `mplus` (Just True)
   Just True
   Main> lift head ([False]::[Bool]) `mplus` (Just True)
   Just False
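(For concreteness, here is a minimal, hand-written sketch of the encoding described above: matchings as functions into Maybe, built from do-blocks whose pattern-match failure is routed through fail to Nothing, composed with mplus, and spliced back with fromJust. The names Matching, orElse and splice are made up for this sketch, and env is assumed to be a plain association list:

   import Control.Monad (mplus)
   import Data.Maybe (fromJust)

   -- a matching from a to b, with match failure made explicit as Nothing
   type Matching a b = a -> Maybe b

   -- composition of matchings: first success wins
   orElse :: Matching a b -> Matching a b -> Matching a b
   orElse m n x = m x `mplus` n x

   -- splicing a matching back into an ordinary (partial) function
   splice :: Matching a b -> a -> b
   splice m = fromJust . m

   -- clunky, with its two guarded alternatives written as matchings on ()
   clunky :: [(Int, Int)] -> Int -> Int -> Int
   clunky env v1 v2 = splice (both `orElse` fallback) ()
     where
       both () = do Just r1 <- return (lookup v1 env)  -- match failure -> fail -> Nothing
                    Just r2 <- return (lookup v2 env)
                    return (r1 + r2)
       fallback () = return (v1 + v2)
)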

> The problem is not that there is syntactic sugar for pattern matching,
> but that this isn't sugar coating at all - there is functionality hidden
> in there that cannot be provided by the remainder of the language.
> In other words, pattern matching and associated "sugar" become part of
> Haskell's core, which thus becomes more complex, without offering
> sufficient compensation in terms of expressiveness.
I agree. The pattern matching problem is best solved by supplying sugar which is either compositional or fundamental. The compositional structure behind pattern guards is of course (MonadPlus Maybe). So an idea could be to give the plus in MonadPlus suitable syntactic sugar (which it currently lacks) and get pattern guards for free. This may be too much, but at least there might be some kind of sugar for the specific (MonadPlus Maybe) which actually yields pattern guards.

I think of sugar along the lines of the following example

   f (Right (Right p)) = p
   f (Left p)          = p
   f p                 = p

and its sugared version

   f p = fromJust $
           | Right q <= p
             | Right r <= q
               = r
           | Left q <= p
             = q
           | = p

where the nested pattern is split in two parts for purposes of demonstration. I don't know whether this can be parsed, but it's intended to be parenthesized as follows:

   {| {Right q <= p; {| { Right r <= q; = r;}}; }; {| {Left q <= p; = q;}}; {| {= p;}};

The intention is that | behaves like do, with the extra feature that adjacent | are collected together by `mplus`. So the desugaring of a list of | statements is like

   data | a = | a

   desugarBar :: [| (Maybe a)] -> Maybe a
   desugarBar xs = foldr1 mplus [ expr | {| expr} <- xs ]

Further, pat <= expr is equivalent to pat <- return (expr), and that's why we add <= as distinct from <-. Note that <= is not equivalent to {let pat = expr;}, and this is actually the whole point of the story. The {= p;} should of course desugar to {return p;} and can somehow end a | scope. It might be difficult to parse, but it looks much better than return p.

Inside the |, things are like in do notation. This means that the delimiter is (;) and not (,), and we have full (<-) access to monadic actions of type (Maybe a):

   | Right q <= p; Right r <= q
     = r
                                 <==>    do Right q <- return p
                                            Right r <- return q
                                            return r

   | val <- lookup key xs; val2 <- lookup key2 xs
     = val + val2
                                 <==>    do val  <- lookup key xs
                                            val2 <- lookup key2 xs
                                            return (val + val2)

It's possible to nest | as we all know it from do:

   | Right q <= p;
     {| Left r <= p
          = r
      | Right r <= p
          = r }

with curly braces added only for clarity; layout should eliminate them. Note how this works nicely with the fact that the last statement in do notation implicitly determines the returned value.

Another thing to consider are Boolean guards, which could be automatically enclosed by a corresponding (guard):

   | Right q <= p; p > 5
     = p-5
                                 <==>    do Right q <- return p
                                            guard (p > 5)
                                            return (p-5)

One last thing is to eliminate fromJust:

   f x | (interesting things here)

should be syntactic sugar for

   f x = fromJust $ | (interesting things here)

Regards,
apfelmus
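(To make the intended desugaring concrete, here is a hand-written version of what a small | block would become in today's Haskell, under the translation sketched above: each | group is a do-block in Maybe, adjacent groups are folded together with mplus as in desugarBar, Boolean guards become guard, and fromJust splices the result back. The name lookupBoth, the keys, and the default in the last alternative are made up for this sketch:

   import Control.Monad (mplus, guard)
   import Data.Maybe (fromJust)

   lookupBoth :: String -> String -> [(String, Int)] -> Int
   lookupBoth key key2 xs = fromJust $
         do { val  <- lookup key xs          -- | val <- lookup key xs; val2 <- lookup key2 xs
            ; val2 <- lookup key2 xs         --     = val + val2
            ; return (val + val2) }
     `mplus`
         do { val <- lookup key xs           -- | val <- lookup key xs; val > 5
            ; guard (val > 5)                --     = val - 5
            ; return (val - 5) }
     `mplus`
         return 0                            -- | = 0
)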

Claus Reinke wrote:
> My own awkward-looking translations were driven by having found two
> tools in Haskell that come close to this ideal, even if the syntax is
> awkward: the mapping of pattern-match failure to "fail" in do-notation,
> and the use of "fail msg = mzero" in MonadPlus. By these means,
> matchings (lambdas with patterns and explicit match failure) can be
> represented as do-blocks:
>
>    lhs -> rhs   ==>   \x -> do { lhs <- return x; return rhs }        (1)
>
> and then be composed (the examples I gave moved the \s to the front
> instead and used "mplus" to compose matchings, but we could also lift
> the compositions to function level instead), and finally "spliced" back
> (turning explicit into implicit match failure) using fromJust or
> suchlike. Adding pattern guards into this translation was
> straightforward - they simply go between the two returns.

Note that it is very much this issue of 'monadic choice' in the case of pattern-match failure which is dealt with (in gory, assembly-level categorical terms) in the MPC2006 paper you can find at http://www.cas.mcmaster.ca/~kahl/PMC/ (disclaimer: I am a co-author of that particular paper). An early draft of this paper relied heavily on `mplus` with an extra condition -- until it was realized that very few monads satisfy this! This is where 'function lifting' came in, where we essentially add enough arrows "on the left" (in a completely deterministic and typed manner, see the boxed-; and boxed-+ definitions) to pull failure up to the right type, so that failure could be dealt with at the 'right time'. It is only function types that introduce complications -- all other types are quite straightforward to deal with.

[Deleted: Claus' derivation of a monadic lifting of pattern-match failure, which sure looks equivalent / very close to what was in our MPC2006 paper.]

> hey, that's great! so the lifting we are looking for is simply
>
>    lift match = (>>= (return . match)) . return
>
> right? wrong, unfortunately. Looking more closely at the translation of
> do-notation in 3.14 of the Report, we see that it actually creates a new
> function by syntactic rather than semantic manipulation (in fact mapping
> the problem of fall-through back to a multi-equation first, then to
> "fail"), so we have no hope of reproducing the nice behaviour wrt
> pattern-match failure without using the do-notation, and all the syntax
> noise that implies.

The MPC2006 paper has a section describing *which* monad Haskell 98 "bakes in" to its syntactic-level translation. So we agree with your observation that there is a whole 'design space' of what to do on match failure, but Haskell 98 bakes a particular choice in. See that section for how other published papers "bake in" other choices, and how large the design space can really be [for example, using the List monad leads to quite interesting ideas, as does using LogicT].
> I'm not sure whether Wolfram suggested only a simplification of the
> specification of pattern matching or an actual reification of the
> concepts of matching, composition of matchings, and splicing of
> matchings into multi-alternative lambdas.

Wolfram's PMC [1] is an all-out reification of the concepts of matching, composition, splicing, etc. The biggest thing it doesn't handle is some of Barry Jay's excellent ideas on generic pattern matching (see http://www-staff.it.uts.edu.au/~cbj/patterns/). The issue, as far as I see it, is that although Barry's latest ideas are first-rate, they seem extremely difficult to 'type' [and getting PMC to 'type' was really non-trivial].
> And one of the things that would make possible is to replace some
> previously built-in and non-composable notions, like pattern-match
> fall-through and composition of rule alternatives, by a smaller, yet
> more expressive core. And that is always a worthwhile goal for language
> redesign, isn't it?-)

Agreed!
Jacques

[1] Proper attribution: the PMC is Wolfram's, I just thought it was so cool that I insisted on 'helping' him with the type system...
participants (4)
- apfelmus@quantentunnel.de
- Claus Reinke
- Jacques Carette
- kahl@cas.mcmaster.ca