
Hi Bryan, I wrote the current regex API, so your suggestions are interesting to me. The same goes for anyone else's opinions on the regex API, of course.
Bryan O'Sullivan wrote:
Ketil Malde wrote:
Python used to do pretty well here compared to Haskell, with rather efficient hashes and text parsing, although I suspect ByteString IO and other optimizations may have changed that now.
It still does just fine. For typical "munge a file with regexps, lists, and maps" tasks, Python and Perl remain on par with comparably written Haskell. This is because the scripting-level code acts as a thin layer of glue around I/O, regexps, lists, and dicts, all of which are written in native code.
The Haskell regexp libraries actually give us something of a leg down with respect to Python and Perl.
True, the pure Haskell library is not as fast as a C library. In particular, the current regex-tdfa handles lazy ByteStrings suboptimally. This may eventually be fixed. But the native code libraries have also been wrapped in the same API, and they are quite fast when combined with strict ByteStrings.
The aggressive use of polymorphism in the return type of (=~) makes it hard to remember which of the possible return types gives me what information. Not only did I write a regexp tutorial to understand the API in the first place, I have to reread it every time I want to match a regexp.
The (=~) operator supports many return types, one for each instance of RegexContext. These instances are all thin wrappers around the non-polymorphic return types of the RegexLike class. So (=~) could be avoided altogether, or another API could be created on top of RegexLike.
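To make the return-type polymorphism concrete, here is a minimal sketch, assuming the regex-tdfa backend (Text.Regex.TDFA) and plain String input. The top-level names (hasMatch, howMany, and so on) are made up for this example; only (=~) and getAllTextMatches come from the library. The type signature on each binding is what selects the RegexContext instance.

```haskell
-- Sketch only: assumes the regex-tdfa package is installed.
import Text.Regex.TDFA ((=~), getAllTextMatches)

input, pat :: String
input = "foo 123 bar 456"
pat   = "[0-9]+"

-- Bool: did the pattern match anywhere?
hasMatch :: Bool
hasMatch = input =~ pat

-- Int: how many non-overlapping matches are there?
howMany :: Int
howMany = input =~ pat

-- String: text of the first match (empty string if there is none).
firstMatch :: String
firstMatch = input =~ pat

-- Triple: text before the first match, the match itself, text after it.
pieces :: (String, String, String)
pieces = input =~ pat

-- All matches, unwrapped from the AllTextMatches newtype.
allMatches :: [String]
allMatches = getAllTextMatches (input =~ pat)

main :: IO ()
main = do
  print hasMatch
  print howMany
  print firstMatch
  print pieces
  print allMatches
```

The same expression, input =~ pat, appears in every binding; only the annotation differs, which is exactly why the result is hard to remember without looking it up.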
A suitable solution would be a return type of RegexpMatch a => Maybe a (to live alongside the existing types, but aiming to become the one that's easy to remember), with appropriate methods on a, but I don't have time to write up a patch.
The (=~~) operator is the monadic counterpart of (=~), which allows for different failure behaviors. So using (=~~) with Maybe is already possible, and it gives Nothing whenever there are zero matches. But more interesting to me is learning what API you would like to see. What would you like the code that uses the API to look like? Could you sketch either the definition or the usage of your RegexpMatch class suggestion? I don't use my own regex API much, so external feedback and ideas would be wonderful.
-- Chris
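
For reference, the Maybe usage of (=~~) mentioned above might look roughly like this. It is a sketch, again assuming regex-tdfa and String input; firstNumber is a hypothetical helper, not part of any library.

```haskell
-- Sketch only: assumes the regex-tdfa package is installed.
import Text.Regex.TDFA ((=~~))

-- Hypothetical helper: the first run of digits, or Nothing if there is none.
-- Choosing Maybe as the monad turns "no match" into Nothing.
firstNumber :: String -> Maybe String
firstNumber s = s =~~ "[0-9]+"

main :: IO ()
main = do
  print (firstNumber "foo 123 bar")
  print (firstNumber "no digits here")
```

With a recent regex-base, (=~~) reports a failed match through MonadFail, so any monad with a MonadFail instance can stand in for Maybe here.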