
Simon Marlow wrote:
On 09 August 2006 15:14, Chris Kuklewicz wrote:
For 10^5 characters on String: PCRE 0.077s DFA 0.131s TRE 0.206s PosixRE 0.445s Parsec 0.825s Old Posix 43.760s (Text.Regex using splitRegex)
Old Text.Regex took 43.76 seconds on 10^5 characters to do a task comparable to the one the others did (it ran splitRegex). The new PosixRE wrapping took 0.445 seconds instead. Yes it is two orders of magnitude faster, and this is because my wrapping only marshals the String to CString once. Laziness cannot be worth 2 orders of magnitude of runtime. This is why we needed a new wrapping, which has grown into the new library.
Right, I see the problem with Text.Regex.splitRegex, it repeatedly packs the String into a CString. But then why this result:
BenchPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"]) total is 1.294s .. etc. ... BenchPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"]) total is 91.435s
Was this the old Posix, or your new one? If the new one, why is it so slow compared to the others?
Cheers, Simon
Your question has prompted me to go back into my PosixRE wrapping code and compare it to the PCRE code. I have made some changes which ought to enhance the performance of the PosixRE code. Let us see the new bechmarks on 10^6 bytes: PosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"]) real 1m35.429s user 1m17.862s sys 0m1.455s total is 79.317s PCRE (102363,["bcdcd","cdc"],["bbccd","bcc"]) real 0m2.570s user 0m1.702s sys 0m0.219s total is 1.921s BenchBSPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"]) real 1m32.267s user 1m16.494s sys 0m1.374s total is 77.868s BenchBSPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"]) real 0m1.245s user 0m0.809s sys 0m0.110s total is 0.919s So there was only a little improvement to the previous PosixRE speed. If you want to look at the code, it is in the three Wrap.hsc for regex-posix and regex-tre and regex-pcre for the wrapMatchAll functions. But it appears to be a library issue, not a Haskell issue. I will tend to the Haddock cleanup next. -- Chris