
I've been using the creation of a regular expression engine as an
ongoing project to learn Haskell. Last night I created the newest
iteration.
My love for the irrefutable pattern can be found in the definition of
the rexn ( apply repetition to extracted nodes ) function below.
/snip
rexn ns pps = let ( ~( xs , rps ) ,
~( ~( nxs ) ,
~( rxs , rrps ) ) ) = ( exn nxs pps ,
case rps of
('?':'?':rr)
-> ( ( ns ) ,
( ns ++ xs , rr ) )
('?':rr)
-> ( ( ns ) ,
( xs ++ ns , rr ) )
/snip
Beautiful, beautiful lazy execution and irrefutable pattern matching.
The data flow is able to easily twist in and out of functions, even as
an argument to the function call that creates it.
Is this kind of usage normal?
Any and all comments, suggestions, refutations, and criticisms are welcome.
-- regular expression engine -- (c) 2008 michael speer
import Char ( isSpace )
-- import Debug.Trace ( trace )
xor :: Bool -> Bool -> Bool
xor True a = not a
xor False a = a
data RxToken = RxStart -- matchable start of target string
| RxChar Char -- a literal character to match
| RxBound -- inserted wherever alphanums touch whitespace
| RxEnd -- matchable end of target string
| RxEOF -- an additional token to push through to
catch anything trying for RxEnd
-- RxEOF is never matched.
deriving ( Show )
rxTokenize tts = RxStart : case tts of
[] -> RxEnd : RxEOF : []
tts@(t:_) -> case not $ isSpace t of
True -> RxBound : rxt tts
False -> rxt tts
where
rxt (t:[]) | not $ isSpace t = RxChar t : RxBound : RxEnd : RxEOF : []
| otherwise = RxChar t : RxEnd : RxEOF : []
rxt (t:ts@(t':_)) | isSpace t `xor` isSpace t' = RxChar t :
RxBound : rxt ts
| otherwise = RxChar t : rxt ts
data RxTransform = RxTransform ( RxNode -> RxToken -> [ RxNode ] )
| RxNullTransform
data RxNode = RxActive { rxTransforms :: [RxTransform] ,
rxMatched :: String ,
rxNumSubs :: Integer ,
rxSubExprs :: [ String ] }
| RxComplete { rxMatched :: String ,
rxNumSubs :: Integer ,
rxSubExprs :: [ String ] }
instance Show RxNode where
show (RxComplete matched _ _) = "