
Mitar wrote:
Hi!
On Dec 28, 2007 5:51 PM, Lihn, Steve
wrote: Since regex is involved, it is specific to (Byte)String, not a generic list.
Oh, this gives me an interesting idea: making regular expressions more generic.
The new regex-base API is fairly generic. If you look at the classes in regex-base's Text.Regex.RegexLike: class Extract source => RegexLike regex source where matchAll :: regex -> source -> [MatchArray] matchOnce :: regex -> source -> Maybe MatchArray matchCount :: regex -> source -> Int matchTest :: regex -> source -> Bool matchAllText :: regex -> source -> [MatchText source] matchOnceText :: regex -> source -> Maybe (source, MatchText source, source) you can see that the "regex" type parameter is fully abstract, and that the "source" being searched is also fully abstract. The reason for having those specific class methods is to allow for the instance to expose the most efficient way to do each operation. You could make an instance for string seaching (e.g. KMP or BM searching). Pretty much any "search" or "find" operation could be made into an instance of RegexLike. The main constraint is that the MatchArray/MatchText use Int indexing and the Extract instance wants to be able to do lookup with this: type MatchOffset = Int type MatchLength = Int type MatchArray = Array Int (MatchOffset, MatchLength) type MatchText source = Array Int (source, (MatchOffset, MatchLength)) class Extract source where before :: Int -> source -> source after :: Int -> source -> source empty :: source extract :: (Int, Int) -> source -> source One benefit is that all the RegexContext instances are implemented by using just the above class methods, so all the polymorphic match/matchM will immediately work. If there is ever a strong need for going beyond the range of Int indexing, then one could either make new variants of the classes, or add methods to the existing ones. But if you are searching over 2GB of something, then perhaps have this generic type class API is not the top priority.
Would not it be interesting and useful (but not really efficient) to have patterns something like:
foo :: Eq a => a -> ... foo (_{4}'b') = ...
which would match a list with four elements ending with an element 'b'. Or:
foo (_+';'_+';'_) = ...
which would match a list with embedded two ';' elements. (Last _ matched the remaining of the list.)
OK, maybe guards are not the proper place to implement this as would add a possibility to make a really messy Haskell programs. But extending regular expressions to work on any list of elements with type implementing Eq would be realy powerfull. And then we could use split in many other than just text processing contexts.
Of course, the problem in both cases is implementing something like regular expressions efficiently, especially on lists, but this is why there are smart people around. :-)
Mitar