
Bill Wood wrote:
It would seem that for a regular expression facility to constitute a parser it would have to be able to work on token streams. So my question is, does either the the perl6 generalization or any of the Haskell regex facilities support regular expressions over any free monoid other than finite character sequences?
-- Bill Wood
Currently: The regular expressions themselves are finite, but some Haskell regex-* backend packages can run searches on "infinite" [Char], since they are lazy. This is true of regex-parsec and regex-dfa. In particular they take care not to consume extra Char's from the input stream, though Parsec insists on looking at the next character. The regex-base type class API in Text.Regex.Base.RegexLike does not limit you to any particular source of the regex or any particular type of thing to be searched. There is a limit that comes from using Int as the index of choice. If you are searching something more that 2 billion units long then you would run into API problems. (One could just make a parallel BigRegex type class API and make instances for it for the backends that can handle it. ). Or I may expand it to take any (Num). Hmmm..... Specifically, the RegerMaker is parameterized over "source" and "regex" and so is completely generic. This "source" is what specifies how to build the the compiled "regex" opaque type. Specifically, the Extract is parameterized over "source" (but limits the index to Int). This "source" is the type being searched. Specifically, the RegexLike class is parameterized over "regex" and "source", where "regex" is the supposed to be the opaque compiled type from RegexMaker and "source" is the type being searched. Currently the RegexMaker source can be [Char] or ByteString and the RegexLike/Extract "source" can be [Char] or ByteString. Adding (Data.Sequence Char) would make sense, and perhaps (Data.Sequence Word8) as ASCII. If you write a very generic backend then you may be able to make more generic instances of the API. Note that the instances should be obvious because your generic backend uses a unique opaque "regex" type. Also not that the API is named Regex in several places but there is no need to use anything like a Regex syntax. In fact you could use something other than RegexMaker to create the "regex" type used for specifying the searching/matching. I have not considered it until now, but maybe one could create an instance of RegexLike based around Parsec's GenParser. -- Chris