
John Meacham wrote:
On Fri, Jul 14, 2006 at 05:05:50PM +0100, Chris Kuklewicz wrote:
As a user, the JRegex API can also only support a single Regex type and a single backend. But it would be really handy to be able to use different types of regular expressions. Mainly there are going to be different regex syntax possibilities:
This isn't true, the API is a class, you can create as many instances as you like for it. In fact, it comes with at least 2 back ends, and at least a couple different instances for the regex syntax. It was specifically designed as a framework for many regular expression backends to be used via a common and useful interface.
John
JRegex does require the source too be a list [x]:
class RegexContext x a where (=~) :: RegexLike r x => [x] -> r -> a (=~~) :: (Monad m, RegexLike r x) => [x] -> r -> m a
class RegexLike r a | r -> a where matchTest :: r -> [a] -> Bool matchCount :: r -> [a] -> Int matchAll :: r -> [a] -> [(Array Int (Int,Int))] matchOnce :: r -> [a] -> Bool -> Maybe (Array Int (Int,Int))
The List requirement precludes a ByteString instance. The functional dependency "r->a" also prevents mixing different backends with different data source types. The Bool parameter to matchOnce is there so matchAll can be implemented in terms of matchOnce, exploiting the fact that the source data type is a list. (Though this is not very optimal compared to a specialized matchAll). I am done rewriting the Posix regex and PCRE code with both String and ByteString as instances. The latest type classes (from today) look like:
type MatchArray = Array Int (Int,Int) -- (starting index,length)
class (RegexOptions regex compOpt execOpt) => RegexMaker regex source where makeRegex :: source -> regex makeRegexOpts :: compOpt -> execOpt -> source -> regex
class RegexLike regex source where matchAll :: regex -> source -> [MatchArray] matchCount :: regex -> source -> Int matchOnce :: regex -> source -> Maybe MatchArray matchTest :: regex -> source -> Bool matchTest regex source = isJust (matchOnce r s) matchCount regex source = length (matchAll r s)
I have omitted the RegexOptions class for space ( The job of the "Bool" to matchOnce is subsumed by the more general execOpt handling ). Clearly I have taken the names and most of the types from JRegex. I don't have the cool polymorphic RegexContext yet, but that is the next step. Once the code stabilizes at all, I will post a link to the development darcs address. The flexibility of source data type and backend is provided by making WrapPosix.hsc and WrapPRCE.hsc modules that expect a source type of CString/CStringLen and are comprehensive enough so that the four files (Byte)?String(Posix|PCRE) are just .hs files (no -cpp needed) instead of a .hsc files. And these four use optimized routines for match(All|Count|Once|Test) instead of using either of the defaults. The next backend to make instances for will be my "Text.Regex.Lazy" one based on Parsec. Then I will have 3 backends and two data source types, making 6 combinations. For example: I can compile a String regex as a PCRE and match that against a ByteString and against a String. The type of the regex source and the type of the data are separate. And I can make a new Regex from an old one with different execution options. -- Chris