
At 20:40 21/10/03 -0400, ajb@spamcop.net wrote:
I've done it for regular expressions (e.g. lex, alex etc), but not for regexps (e.g. Text.Regex). This particular terminology overload annoys me no end, by the way.
I checked the code into haskell-libs. It hasn't propagated through to the web view of the repository yet, but you should eventually be able to find it here:
http://cvs.sourceforge.net/viewcvs.py/haskell-libs/libs/text/
Look for Dfa.lhs.
Thanks for this... it turns out I have an immediate use for it. I've downloaded it but I'm having a little trouble figuring out how to drive it. Is there any description anywhere? Absent that, this is what I am figuring (am I getting this right?): + The supplied regex is a value of type Re t, which describes an expression to match a sequence of tokens of type t. In common use, t might be Char. + The constructors for Re are: ReOr [Re t] -- matches any one of the Re's in the list | ReCat [Re t] -- matches a sequence of the RE's in the list | ReStar (Re t) -- matches zero or more of the given Re | RePlus (Re t) -- matches one or more of the given Re | ReOpt (Re t) -- matches zero or one of the given Re | ReTerm [t] -- matches a sequence of tokens exactly matching the given list. That last is very much guesswork: is it true that ReTerm ts = ReCat $ map (\t ReTerm [t]) ts ? A function to construct an (Re Char) from a simple textual representation would be handy. Maybe I'll tackle that. ... Having got an Re value, matchRe is a function that applies it (after compilation) to a sequence of 't', returning True if the expression is matched, otherwise False. Is this about right? There's another function matchRe2, which seems to do something recursive with the regular expression but I can't figure out what. Is it just a different implementation strategy? It does seem to give the same answers. ... I also noticed this comment in the code: [[ Utility typeclasses for enforcing all the constraints we need on our monad's free type variables. Note that this requires both -fglasgow-exts and -fallow-undecidable-instances in GHC to work properly. ]] I was wondering if the use of -fallow-undecidable-instances might cause problems with Hugs, though it does seem to work. I'm also wondering if there's a way to use this to match a leading subsequence (rather than the entire sequence of tokens supplied), and to discover which parts of the regexp have been matched. ... [later] I've also looked at the Haskell Dynamic Lexer Engine (http://www.nondot.org/sabre/Projects/HaskellLexer/) that Derek pointed out ... it looks a closer fit to my current goals, though I do like the "purity" of your approach (i.e. its focus on the core engine). #g ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact