Irrefutable pattern love and new regex engine.

I've been using the creation of a regular expression engine as an
ongoing project to learn Haskell. Last night I created the newest
iteration.
My love for the irrefutable pattern can be found in the definition of
the rexn ( apply repetition to extracted nodes ) function below.
/snip
rexn ns pps = let ( ~( xs , rps ) ,
~( ~( nxs ) ,
~( rxs , rrps ) ) ) = ( exn nxs pps ,
case rps of
('?':'?':rr)
-> ( ( ns ) ,
( ns ++ xs , rr ) )
('?':rr)
-> ( ( ns ) ,
( xs ++ ns , rr ) )
/snip
Beautiful, beautiful lazy execution and irrefutable pattern matching.
The data flow is able to easily twist in and out of functions, even as
an argument to the function call that creates it.
Is this kind of usage normal?
Any and all comments, suggestions, refutations, and criticisms are welcome.
-- regular expression engine -- (c) 2008 michael speer
import Char ( isSpace )
-- import Debug.Trace ( trace )
xor :: Bool -> Bool -> Bool
xor True a = not a
xor False a = a
data RxToken = RxStart -- matchable start of target string
| RxChar Char -- a literal character to match
| RxBound -- inserted wherever alphanums touch whitespace
| RxEnd -- matchable end of target string
| RxEOF -- an additional token to push through to
catch anything trying for RxEnd
-- RxEOF is never matched.
deriving ( Show )
rxTokenize tts = RxStart : case tts of
[] -> RxEnd : RxEOF : []
tts@(t:_) -> case not $ isSpace t of
True -> RxBound : rxt tts
False -> rxt tts
where
rxt (t:[]) | not $ isSpace t = RxChar t : RxBound : RxEnd : RxEOF : []
| otherwise = RxChar t : RxEnd : RxEOF : []
rxt (t:ts@(t':_)) | isSpace t `xor` isSpace t' = RxChar t :
RxBound : rxt ts
| otherwise = RxChar t : rxt ts
data RxTransform = RxTransform ( RxNode -> RxToken -> [ RxNode ] )
| RxNullTransform
data RxNode = RxActive { rxTransforms :: [RxTransform] ,
rxMatched :: String ,
rxNumSubs :: Integer ,
rxSubExprs :: [ String ] }
| RxComplete { rxMatched :: String ,
rxNumSubs :: Integer ,
rxSubExprs :: [ String ] }
instance Show RxNode where
show (RxComplete matched _ _) = "

On Tue, 2008-01-22 at 11:55 -0500, Michael Speer wrote:
I've been using the creation of a regular expression engine as an ongoing project to learn Haskell. Last night I created the newest iteration.
My love for the irrefutable pattern can be found in the definition of the rexn ( apply repetition to extracted nodes ) function below. /snip rexn ns pps = let ( ~( xs , rps ) , ~( ~( nxs ) , ~( rxs , rrps ) ) ) = ( exn nxs pps ,
~nxs is redundant.
Beautiful, beautiful lazy execution and irrefutable pattern matching. The data flow is able to easily twist in and out of functions, even as an argument to the function call that creates it.
Is this kind of usage normal?
No. Irrefutable patterns are usually the right thing to use in circular programs, but usually you only need a few well-placed irrefutable patterns. I suspect most (all?) of yours are unnecessary. The use of tuples also feels a bit excessive as an impression. I'd have to really look to decide, but it gives me the feeling that a data structure should be being used somewhere. Your many unnecessary parentheses bothers me, but I tend to dislike any unnecessary ones except where there might reasonably be some ambiguity while reading. I also recommend trying a different way of laying out your code as most of it is wrapped below and most of -that- is due to it being excessively indented. Perhaps pull some of the lambdas into functions.
Any and all comments, suggestions, refutations, and criticisms are welcome.
-- regular expression engine -- (c) 2008 michael speer
import Char ( isSpace ) -- import Debug.Trace ( trace )
xor :: Bool -> Bool -> Bool xor True a = not a xor False a = a
data RxToken = RxStart -- matchable start of target string | RxChar Char -- a literal character to match | RxBound -- inserted wherever alphanums touch whitespace | RxEnd -- matchable end of target string | RxEOF -- an additional token to push through to catch anything trying for RxEnd -- RxEOF is never matched. deriving ( Show )
rxTokenize tts = RxStart : case tts of [] -> RxEnd : RxEOF : [] tts@(t:_) -> case not $ isSpace t of True -> RxBound : rxt tts False -> rxt tts where rxt (t:[]) | not $ isSpace t = RxChar t : RxBound : RxEnd : RxEOF : [] | otherwise = RxChar t : RxEnd : RxEOF : [] rxt (t:ts@(t':_)) | isSpace t `xor` isSpace t' = RxChar t : RxBound : rxt ts | otherwise = RxChar t : rxt ts
data RxTransform = RxTransform ( RxNode -> RxToken -> [ RxNode ] ) | RxNullTransform
data RxNode = RxActive { rxTransforms :: [RxTransform] , rxMatched :: String , rxNumSubs :: Integer , rxSubExprs :: [ String ] } | RxComplete { rxMatched :: String , rxNumSubs :: Integer , rxSubExprs :: [ String ] }
instance Show RxNode where show (RxComplete matched _ _) = "
" data RxDepth = RxTop | RxSub deriving ( Show )
rxCompile pps = let ( xs , rps ) = oexn [success] RxTop pps in case length rps of 0 -> RxActive { rxTransforms = xs , rxMatched = [] , rxNumSubs = 0 , rxSubExprs = [] } _ -> error $ "Not all of pattern consumed : remains : " ++ rps where -- or together different expression sections -- (a|b|c) oexn ns RxTop [] = ( ns , [] ) oexn _ RxSub [] = error "Pattern ended while still in sub expression" oexn ns d pps = let ( ~( xs , rps ) , ~( nxs , nrps ) ) = ( aexn ns pps , case rps of ('|':rr) -> let ( inxs , irps ) = oexn ns d rr in ( xs ++ inxs , irps ) (')':rr) -> case d of
RxTop -> error "Erroneous close parenthesis in pattern "
RxSub -> ( xs , rr ) [] -> case d of
RxTop -> ( xs , [] )
RxSub -> error "End of pattern while still in sub expression" ) in ( nxs , nrps ) -- and together extracted nodes in a given expression segment -- abd?dfs aexn ns pps = let ( ~( xs , rps ) , ~( nxs , nrps ) ) = ( rexn nxs pps , case rps of ('|':_) -> ( ns , rps ) (')':_) -> ( ns , rps ) [] -> ( ns , rps ) _ -> aexn ns rps ) in ( xs , nrps ) -- replication application - weee! rexn ns pps = let ( ~( xs , rps ) , ~( ~( nxs ) , ~( rxs , rrps ) ) ) = ( exn nxs pps , case rps of ('?':'?':rr) -> ( ( ns ) ,
( ns ++ xs , rr ) ) ('?':rr) -> ( ( ns ) ,
( xs ++ ns , rr ) ) ('*':'?':rr) -> ( ( ns ++ xs ) ,
( ns ++ xs , rr ) ) ('*':rr) -> ( ( xs ++ ns ) ,
( xs ++ ns , rr ) ) ('+':'?':rr) -> ( ( ns ++ xs ) ,
( xs , rr ) ) ('+':rr) -> ( ( xs ++ ns ) ,
( xs , rr ) ) _ -> ( ( ns ) ,
( xs , rps ) ) ) in ( rxs , rrps ) -- extract node ( including an entire subexpression as a single node ) exn _ ('?':_) = error "Bad question mark operator" exn _ ('*':_) = error "Bad splat operator" exn _ ('+':_) = error "Bad plus sign operator" exn ns ('(':ps) = oexn ns RxSub ps exn ns (p:ps) = ( [ RxTransform ( \rxn k -> case k of (RxChar c) -> if c == p then
[ RxActive { rxTransforms = ns ,
rxMatched = c : rxMatched rxn ,
rxNumSubs = rxNumSubs rxn ,
rxSubExprs = rxSubExprs rxn } ] else [] (RxStart) -> [rxn] (RxBound) -> [rxn] _ -> [] ) ] , ps )
exn ns [] = error "can this be reached?"
success = RxTransform ( \ rxn k -> [ RxComplete { rxMatched = reverse $ rxMatched rxn , rxNumSubs = rxNumSubs rxn , rxSubExprs = map reverse $ rxSubExprs rxn } ] )
rxExec n tts = iexec [n] (rxTokenize tts) where iexec (win@(RxComplete _ _ _ ):_) _ = Just win iexec [] _ = Nothing iexec nns (k:ks) = iexec ( concatMap ( \n -> case n of
(RxComplete _ _ _) -> [n]
a@(RxActive _ _ _ _) -> concatMap (\xf -> case xf of
(RxTransform f) -> f a k
(RxNullTransform) -> []
) (rxTransforms a) ) nns ) ks
main = do print $ rxTokenize "this is a test" print $ rxExec (rxCompile "hello|world") "hello" print $ rxExec (rxCompile "hello|world") "world" print $ rxExec (rxCompile "abcde|ab") "abcd" print $ rxExec (rxCompile "ab?c") "ac" print $ rxExec (rxCompile "ab?c") "abc" print $ rxExec (rxCompile "ab*c") "ac" print $ rxExec (rxCompile "ab*c") "abc" print $ rxExec (rxCompile "ab*c") "abbbc" print $ rxExec (rxCompile "ab+c") "ac" print $ rxExec (rxCompile "ab+c") "abc" print $ rxExec (rxCompile "ab+c") "abbbbbc" print $ rxExec (rxCompile "(a|b)+") "aaabbbabaaababaaabbbabbaba" print $ rxExec (rxCompile "abc|") "zyx" _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Jan 22, 2008 11:59 AM, Derek Elkins
On Tue, 2008-01-22 at 11:55 -0500, Michael Speer wrote:
I've been using the creation of a regular expression engine as an ongoing project to learn Haskell. Last night I created the newest iteration.
My love for the irrefutable pattern can be found in the definition of the rexn ( apply repetition to extracted nodes ) function below. /snip rexn ns pps = let ( ~( xs , rps ) , ~( ~( nxs ) , ~( rxs , rrps ) ) ) = ( exn nxs pps ,
~nxs is redundant.
Beautiful, beautiful lazy execution and irrefutable pattern matching. The data flow is able to easily twist in and out of functions, even as an argument to the function call that creates it.
Is this kind of usage normal?
No. Irrefutable patterns are usually the right thing to use in circular programs, but usually you only need a few well-placed irrefutable patterns. I suspect most (all?) of yours are unnecessary. The use of
You're absolutely right. Thank you. I am learning this on my own and had originally figured out how to use them by nesting like that. I had written something like this without them and got it to work on discovering and adding them. Not one of the lazy marks was required in the current version. The implicit lazy bindings carried everything through fine. Perhaps it was the nested tuples that required them in the other version. It would smash the stack without them.
tuples also feels a bit excessive as an impression. I'd have to really look to decide, but it gives me the feeling that a data structure should be being used somewhere. Your many unnecessary parentheses bothers me, but I tend to dislike any unnecessary ones except where there might reasonably be some ambiguity while reading. I also recommend trying a different way of laying out your code as most of it is wrapped below and most of -that- is due to it being excessively indented. Perhaps pull some of the lambdas into functions.
I didn't consider the wrapping when I posted. It is easier read where I posted it in my blog : http://michaelspeer.blogspot.com/2008/01/irrefutable-pattern-love.html Thanks.

On Tue, 2008-01-22 at 11:55 -0500, Michael Speer wrote:
rexn ns pps = let ( ~( xs , rps ) , ~( ~( nxs ) , ~( rxs , rrps ) ) ) = ( exn nxs pps ,
Not one of the lazy marks was required in the current version.
Pattern bindings via let are always irrefutable; the ~s here are all redundant. This took me a little while to figure out; I was browsing Control.Monad.State.Lazy and trying to figure out how it was different than Control.Monad.State.Strict; they look almost exactly the same. But then I noticed code that looked like the following: (strict) m >>= f = State $ \s -> case runState m s of (a, s') -> runState (f a) s' (lazy) m >>= f = State $ \s -> let (a, s') = runState m s in runState (f a) s' The strict code deconstructs the pair in a case expression, so "runState m s" will always get evaluated at least enough to know that it is going to return a pair and not _|_. The lazy code immediately calls (f a) and m may not get evaluated at all if f is lazy and doesn't access the state. -- ryan

On Tue, Jan 22, 2008 at 02:09:05PM -0800, Ryan Ingram wrote:
On Tue, 2008-01-22 at 11:55 -0500, Michael Speer wrote:
rexn ns pps = let ( ~( xs , rps ) , ~( ~( nxs ) , ~( rxs , rrps ) ) ) = ( exn nxs pps ,
Not one of the lazy marks was required in the current version.
Pattern bindings via let are always irrefutable; the ~s here are all redundant.
False; let's irrefutability only applies at the topmost level. Prelude> let (a,b) = undefined in 2 2 Prelude> let (a,(b,c)) = (2,undefined) in a *** Exception: Prelude.undefined Prelude> let (a,~(b,c)) = (2,undefined) in a 2 Stefan
participants (4)
-
Derek Elkins
-
Michael Speer
-
Ryan Ingram
-
Stefan O'Rear