
Very interesting Conor. Do you know Xi et al's APLAS'03 paper? (Hongwei, I'm not sure whether you're on this list). Xi et al. use GRDTs (aka GADTs aka first-class phantom types) to represent XML documents. There're may be some connections between what you're doing and Xi et al's work. I believe that there's an awful lot you can do with GADTs (in the context of regular expressions). But there're two things you can't accomplish: semantic subtyping and regular expression pattern matching a la XDuce/CDuce. Unless somebody out there proves me wrong. Martin Hi folks, Inspired by Ralf's post, I thought I'd just GADTize a dependently typed=20 program I wrote in 2001. Wolfgang Jeltsch wrote:
Now lets consider using an algebraic datatype for regexps:
data RegExp =3D Empty | Single Char | RegExp :+: RegExp | RegExp :|: RegExpt | Ite= r RegExp
Manipulating regular expressions now becomes easy and safe =E2=80=93 you= are just not=20 able to create "syntactically incorrect regular expressions" since durin= g=20 runtime you don't deal with syntax at all. =20
A fancier variation on the same theme...
data RegExp :: * -> * -> * where Zero :: RegExp tok Empty One :: RegExp tok () Check :: (tok -> Bool) -> RegExp tok tok Plus :: RegExp tok a -> RegExp tok b -> RegExp tok (Either a b) Mult :: RegExp tok a -> RegExp tok b -> RegExp tok (a, b) Star :: RegExp tok a -> RegExp tok [a]
data Empty
The intuition is that a RegExp tok output is a regular expression=20 explaining how to parse a list of tok as an output. Here, Zero is the=20 regexp which does not accept anything, One accepts just the empty=20 string, Plus is choice and Mult is sequential composition; Check lets=20 you decide whether you like a single token. Regular expressions may be seen as an extended language of polynomials=20 with tokens for variables; this parser works by repeated application of=20 the remainder theorem.
parse :: RegExp tok x -> [tok] -> Maybe x parse r [] =3D empty r parse r (t : ts) =3D case divide t r of Div q f -> return f `ap` parse q ts
Example *RegExp> parse (Star (Mult (Star (Check (=3D=3D 'a'))) (Star (Check (=3D=3D= =20 'b'))))) "abaabaaabbbb" Just [("a","b"),("aa","b"),("aaa","bbbb")] The 'remainder' explains if a regular expression accepts the empty=20 string, and if so, how. The Star case is a convenient=20 underapproximation, ruling out repeated empty values. =20
empty :: RegExp tok a -> Maybe a empty Zero =3D mzero empty One =3D return () empty (Check _) =3D mzero empty (Plus r1 r2) =3D (return Left `ap` empty r1) `mplus` (return Right `ap` empty r2) empty (Mult r1 r2) =3D return (,) `ap` empty r1 `ap` empty r2 empty (Star _) =3D return []
The 'quotient' explains how to parse the tail of the list, and how to=20 recover the meaning of the whole list from the meaning of the tail.
data Division tok x =3D forall y. Div (RegExp tok y) (y -> x)
Here's how it's done. I didn't expect to need scoped type variables, but=20 I did...
divide :: tok -> RegExp tok x -> Division tok x divide t Zero =3D Div Zero naughtE divide t One =3D Div Zero naughtE divide t (Check p) | p t =3D Div One (const t) | otherwise =3D Div Zero naughtE divide t (Plus (r1 :: RegExp tok a) (r2 :: RegExp tok b)) =3D case (divide t r1, divide t r2) of (Div (q1 :: RegExp tok a') (f1 :: a' -> a), Div (q2 :: RegExp tok b') (f2 :: b' -> b)) -> Div (Plus q1 q2) (f1 +++ f2) divide t (Mult r1 r2) =3D case (empty r1, divide t r1, divide t r2) of (Nothing, Div q1 f1, _) -> Div (Mult q1 r2) (f1 *** id) (Just x1, Div q1 f1, Div q2 f2) -> Div (Plus (Mult q1 r2) q2) (either (f1 *** id) (((,) x1) . f2)) divide t (Star r) =3D case (divide t r) of Div q f -> Div (Mult q (Star r)) (\ (y, xs) -> (f y : xs))
Bureaucracy.
(***) :: (a -> b) -> (c -> d) -> (a, c) -> (b, d) (f *** g) (a, c) =3D (f a, g c)
(+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d (f +++ g) (Left a) =3D Left (f a) (f +++ g) (Right c) =3D Right (g c)
naughtE :: Empty -> x naughtE =3D undefined
It's not the most efficient parser in the world (doing some algebraic=20 simplification on the fly wouldn't hurt), but it shows the sort of stuff=20 you can do. Have fun Conor _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe