regex and Regular Expressions Libraries

Dear Haskell Cafe,
With the regex announcement, I wanted to try to get other folks' perspective on what has been happening with the Haskell regular expression libraries.
I have been by turns impressed by how good the engineering of the Regex packages is and gobsmacked by how difficult the traditional Text.Regex API is to use.
In this blog post,
http://engineers.irisconnect.net/posts/2017-03-07-regex.html
I rather cheekily speculate that Haskellers have perhaps been a bit disdainful of regular expressions (not important in a language capable of doing proper parsing, etc.).
What do you think?
Chris

On Thu, Mar 9, 2017 at 2:06 PM, Chris Dornan wrote:
In this blog post,
http://engineers.irisconnect.net/posts/2017-03-07-regex.html
I rather cheekily speculate that the Haskellers perhaps have been a bit disdainful of regular expressions (not important in a language capable of doing proper parsing, etc.).
What do you think?
I've voiced that opinion in #haskell a few times, that the API's designed to scare people toward parsers.
--
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

That's always been my canonical example of how you can overuse
typeclasses to make a simple job into an impenetrable documentation
hunt. Long ago I wrapped it in a simple API and have always used
that. Much later, pcre-heavy showed up and I just switched my wrapper
to use that, since it was still a little too typeclass happy for my
taste.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Hi Evan,
By the sounds of it regex should help with this – each match operator being
available in an un-overloaded format. Does this API work for you?
Chris
On 2017-03-10, 05:12, "Evan Laforge"

On Fri, Mar 10, 2017 at 2:24 AM, Chris Dornan wrote:
By the sounds of it regex should help with this – each match operator being available in an un-overloaded format. Does this API work for you?
I looked at the tutorial and.... maybe not so much? I hardcode to Text + PCRE since that's all I need, but that combination seems to be unsupported. As a light user of regexes, I won't remember much of the API between uses, so I'm just looking to find the 'Regex -> Text -> Bool' function as fast as possible, and a bunch of polymorphic operators I'll never remember would just get in the way. Also for the same reason I'd be worried about any deviation from "standard" PCRE, e.g. $(..) for groups.

However, I'm a lightweight user, so don't take me too seriously, and I made my own tiny little bikeshed anyway. Which is to say don't let me rain on your parade :)

For what it's worth, I mostly used regexes in python, and it gets along fine with hardcoded Text + PCRE, no operators, and basically three functions: match, get groups, and substitute groups. So it's no surprise my wrapper basically looks like that:

    compileOptions :: [Option] -> String -> Either String Regex

    matches :: Regex -> Text -> Bool

    -- | Return (complete_match, [group_match]).
    groups :: Regex -> Text -> [(Text, [Text])]

    -- | Half-open ranges of where the regex matches.
    groupRanges :: Regex -> Text -> [((Int, Int), [(Int, Int)])]
        -- ^ (entire, [group])

    substitute :: Regex -> (Text -> [Text] -> Text)
        -- ^ (complete_match -> groups -> replacement)
        -> Text -> Text

I also added a Show instance that shows the regex rather than hex and the mysteriously missing:

    -- | Escape a string so the regex matches it literally.
    escape :: String -> String

The QuasiQuote stuff seems neat, but I'm sort of scared of TH, and if the regex gets complicated enough that would make it worth it, I probably already switched to a parser. Or I get regexes from user input because of how succinct they are and that's runtime anyway.
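
For illustration only, here is a minimal sketch of what an escape function with the signature given above might look like. It is not the code from the wrapper described in this message; the metacharacter list is an assumption that covers the usual PCRE metacharacters outside a character class, and it ignores extended mode, where whitespace and '#' are also significant.

```haskell
-- | Escape a string so that a PCRE-style regex matches it literally.
-- Sketch only: 'metaChars' is an assumed list, not taken from any library.
escape :: String -> String
escape = concatMap escapeChar
  where
    -- Prefix each metacharacter with a backslash; pass everything else through.
    escapeChar c
      | c `elem` metaChars = ['\\', c]
      | otherwise          = [c]

-- | Characters with special meaning outside a character class.
metaChars :: String
metaChars = "\\^$.|?*+()[]{}"
```

With this definition, escaping the input 1+1 gives the pattern 1\+1, which matches only the literal text.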

Thanks Evan,
That feedback is really valuable and I understand why you would have no reason
to switch to regex.
On the use of ‘$’, as far as I know this extension will not clash with any of
the PCRE extensions (if anybody knows of any problems please give me a shout),
though for sure you will have to fill out the numbers when converting between
the two text-replacement schemes.
As for Text and PCRE – that is at the top of my list but it will need some
coordination with the upstream regex-pcre maintainers.
I do have escape functions though I haven’t included them in the tutorial yet.
Being able to recover the text of the REs would be great and I would like to include
it in a future release, but again that will need some coordination with the regex-base
maintainers.
I will raise those issues.
Fantastic feedback!
Cheers,
Chris
On 10/03/2017, 17:08, "Evan Laforge"

On Fri, Mar 10, 2017 at 9:30 AM, Chris Dornan wrote:
On the use of ‘$’, as far as I know this extension will not clash with any of the PCRE extensions (if anybody knows of any problems please give me a shout), though for sure you will have to fill out the numbers when converting between the two text-replacement schemes.
Oh ok, I was worried that ()s would become non-capturing and you'd have to use $() to capture. I think your scheme with $() for groups and replacement is actually nicer than the traditional (xyz) and (?:xyz) and \# for replacement, but you know tradition hangs heavy on the minds of us regex cargo-culters :)
As for Text and PCRE – that is at the top of my list but it will need some coordination with the upstream regex-pcre maintainers.
Nowadays I inherit that from pcre-heavy, but of course if you're already on another backend then maybe not so simple. libpcre takes bytestrings, so I'll bet the "Text interface" amounts to sticking ". encodeUtf8" on the front and turning on the UTF8 flag.
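
As a rough sketch of the "stick encodeUtf8 on the front" idea, the following lifts any byte-level matcher to Text. It deliberately calls no regex library: ByteMatcher, onText and containsBytes are hypothetical names, and containsBytes is a toy substring search standing in for a compiled regex. A real PCRE binding would also need its UTF8 option turned on at compile time so that the pattern is interpreted as code points rather than bytes.

```haskell
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- | Hypothetical stand-in for a byte-level matcher, e.g. a compiled regex
-- applied to a ByteString subject.
type ByteMatcher = B.ByteString -> Bool

-- | Lift a byte-level matcher to Text by encoding the subject as UTF-8.
onText :: ByteMatcher -> Text -> Bool
onText matcher = matcher . encodeUtf8

-- | Toy matcher: does the subject contain the UTF-8 bytes of the needle?
containsBytes :: Text -> ByteMatcher
containsBytes needle = B.isInfixOf (encodeUtf8 needle)

-- | Example use with non-ASCII text ('\233' is e-acute).
example :: Bool
example = onText (containsBytes (T.pack "caf\233")) (T.pack "un caf\233, merci")
```

The adapter itself is a single function composition, which is why adding Text on top of a ByteString backend is cheap once the encoding and the pattern flags agree.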

I must correct myself. I said:
Being able to recover the text of the REs would be great and I would like to include it in a future release, but again that will need some coordination with the regex-base maintainers.
Sorry, that isn’t right at all. Regex already allows you to recover the text from a compiled RE via the reSource function:
    reSource :: RE -> String
Evan said:
Nowadays I inherit that from pcre-heavy, but of course if you're already on another backend then maybe not so simple.
Yes, regex is built on top of regex-base and the regex-tdfa + regex-pcre back ends.
but you know tradition hangs heavy
Indeed so!
Chris

Hi Chris,
The combination of parser combinators and the old regex libraries has
meant I've avoided regexes as much as possible.
regex looks very promising so I'm sure to try it out next time I need
something similar!
Will you add regex to stackage? Having to muck around with extra-deps gets
tiring. I would also like to see Text support, but having to pack/unpack
wouldn't be a dealbreaker for me.
Have you considered doing anything fancy to make capture groups safer to
use? If I could get a compile error when I'm using the wrong number of groups
or wrongly named groups I'd be very excited.
Cheers,
Adam

Adam Bergmark sez:
Will you add regex to stackage?
Absolutely – on my list for this weekend.
Have you considered doing anything fancy to make capture groups safer to use? If I could get a compile error when I'm using the wrong number/wrongly named groups I'd be very excited.
I totally agree! The only reason this has not been done is that it is not easy to do with the current structure of regex and the way it fits into regex-base. I am open to suggestions though – I have just opened an issue for it: https://github.com/iconnect/regex/issues/60
Cheers,
Chris
participants (4)
- Adam Bergmark
- Brandon Allbery
- Chris Dornan
- Evan Laforge