New subject: Regular Expressions without GADTs

19 Oct 2005


      Conor McBride wrote:
...
Neither Oleg nor Bruno translated my code; they threw away my
structurally recursive on-the-fly automaton and wrote combinator parsers
instead. That's why there's no existential, etc. The suggestion that
removing the GADT simplifies the code would be better substantiated if
like was compared with like.
...
...
I'm sure the program I actually wrote can be expressed with the type
class trick, just by cutting up my functions and pasting the bits into
individual instances; the use of the existential is still available. I
don't immediately see how to code this up in Bruno's style, but that
doesn't mean it can't be done. Still, it might be worth comparing like
with like.
Please see the enclosed code. It is still in Haskell98 -- and works in
Hugs.
...
I suspect that once you start producing values with the
relevant properties (as I do) rather than just consuming them (as Oleg
and Bruno do), the GADT method might work out a little neater.
Actually, the code is pretty much your original code, with downcased
identifiers. It faithfully implements that parser division
approach. Still, there are no existentials. I wouldn't say that GADT
code is so much different. Perhaps the code below is a bit neater due
to the absence of existentials, `case' statements, and local type
annotations.

{-- Haskell98! --}

module RegExps where

import Monad

newtype Zero = Zero Zero -- Zero type in Haskell 98

-- Bruno.Oliveira's type class
class RegExp g where
    zero   :: g tok Zero
    one    :: g tok ()
    check  :: (tok -> Bool) -> g tok tok
    plus   :: g tok a -> g tok b -> g tok (Either a b)
    mult   :: g tok a -> g tok b -> g tok (a,b)
    star   :: g tok a -> g tok [a]

data Parse tok t = Parse { empty :: Maybe t
			 , divide :: tok -> Parse tok t}


parse :: Parse tok a -> [tok] -> Maybe a
parse p [] = empty p
parse p (t:ts) = parse (divide p t) ts

liftP f p = Parse{empty  = liftM f (empty p),
		  divide = \tok -> liftP f (divide p tok)}

liftP2 mf p1 p2 = Parse{empty = mf (empty p1) (empty p2),
			divide = \tok -> liftP2 mf (divide p1 tok) 
		                                   (divide p2 tok)}

lsum x y  = (liftM Left x) `mplus` (liftM Right y)
lprod x y = liftM2 (,) x y

-- Conor McBride's parser division algorithm

instance RegExp Parse  where
    zero        = Parse mzero (\_ -> zero)
    one         = Parse (return ()) (\_ -> liftP (const ()) zero)
    check p     = Parse mzero (\t -> if p t 
			                then liftP (const t) one
			                else liftP (const t) zero)

    plus r1 r2  = Parse (lsum (empty r1) (empty r2))
		        (\t -> plus (divide r1 t) (divide r2 t))


    mult r1 r2  = Parse (lprod (empty r1) (empty r2))
                        (\t -> let (q1,q2) = (divide r1 t, divide r2 t)
			           lp x1 = liftP (either id ((,) x1))
			       in maybe 
			           (mult q1 r2)
			           (\x1 -> lp x1 (plus (mult q1 r2) q2))
				   (empty r1))

    star r      = Parse (return [])
		        (\t-> liftP (uncurry (:)) (mult (divide r t) (star r)))

p1 :: RegExp g => g Char ([Char], [Char])
p1 = mult (star (check (== 'a'))) (star (check (== 'b')))

testp = parse (star (mult (star (check (== 'a'))) (star (check (== 'b')))))
        "abaabaaabbbb"
{-
*RX> testp
Just [("a","b"),("aa","b"),("aaa","bbbb")]

-}

testc = parse (star one) "abracadabra"


-- Parsing the list of integers

ieven = even :: Int->Bool
iodd  = odd  :: Int->Bool

p2 :: RegExp g => g Int (Either (Int, (Int, [Int])) (Int, [Int]))
p2 = plus (mult (check iodd) (mult (check iodd) (star (check ieven))))
	    (mult (check ieven) (star (check iodd)))

test2 = parse (star p2) [1::Int,1,2,3,3,4]

Re: [Haskell-cafe] Regular Expressions without GADTs

oleg＠pobox.com

Conor McBride

Ross Paterson

Ross Paterson

Conor McBride

tags

participants (4)