
Hi,
I don't see how fallback to NFA simulation is really a failure wrt DoS
attacks. It's not exponential in time or memory, just linear memory
(in size of regex) instead of constant, and slower than DFA.
On Wed, Sep 14, 2011 at 1:29 AM, Roman Cheplyaka
* Chris Kuklewicz
[2011-09-13 08:21:55+0100] I wrote regex-tdfa, the efficient (though not yet lightning fast) Posix-like engine.
You are not the first to want an efficient Perl-like engine. The answer you seek flows from Russ Cox (though in C++):
http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-reg...
Quoting relevant bit:
It also finds the leftmost-first match, the same match that Perl would, and can return submatch information. The one significant exception is that RE2 drops support for backreferences¹ and generalized zero-width assertions, because they cannot be implemented efficiently.
Hi Chris, thanks for the response.
There's one thing about Cox's work that I don't understand. On the page [1] he mentions that the algorithm falls back to direct NFA simulation (search for "If all else fails" on that page). This defeats his goal of defending against DoS attacks (the second paragraph of that page).
And I wouldn't be comfortable with an algorithm that is worst-case exponential either.
Then there's another issue: I specifically want a combinator library, and not every automaton-based algorithm can serve this purpose in a statically typed language (unless there's some clever trick I don't know).
[1]: http://swtch.com/~rsc/regexp/regexp3.html
-- Roman I. Cheplyaka :: http://ro-che.info/
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/