
Hello. What is the fastest regex package of the multitude of packages present on Hackage (namely, PCRE, PCRE-light, DFA, TDFA and many more)? My benchmark (parsing a huge logfile with a regex like "GET /foo.xml.*fooid=([0-9]++).*barid=([0-9]++)") shows that plain PCRE is the fastest one (I tried PCRE, PCRE-light and TDFA; DFA can't do capturing groups at all, TDFA was abysmally slow (about 20x slower than PCRE), and it doesn't support ++), but maybe have I missed any blazing-fast package? Even PCRE seems somewhat too slow to me - matching this regex eats 50% of the program time; considering that the only things it does is ungzipping the log and searching for the regex, this seems like far too much, the regex is not that complex to my mind. I tried to replace this regex by Data.ByteString.Search.KnuthMorrisPratt and BoyerMoore from stringsearch, but they work even twice slower. I also noticed that PCRE does about 2x fewer allocations than PCRE-Light, but is not much faster. However, PCRE, strangely, has a non-pure interface, I had to use it with unsafePerformIO. It's a pity I can't put these findings on haskellwiki to http://www.haskell.org/haskellwiki/Regular_expressions since I don't have an account and new accounts can't be created right now. Could anyone put them there? All in all, my question remains: what is the fastest way to do this kind of parsing on a lazy bytestring?