Re: Announcing regex-tre-0.66 and benchmarks

9 Aug 2006

      Simon Marlow wrote:
...
On 09 August 2006 15:14, Chris Kuklewicz wrote:
...
For 10^5 characters on String:
PCRE       0.077s
DFA        0.131s
TRE        0.206s
PosixRE    0.445s
Parsec     0.825s
Old Posix 43.760s  (Text.Regex using splitRegex)
Old Text.Regex took 43.76 seconds on 10^5 characters to do a task
comparable to the one the others did (it ran splitRegex).  The new
PosixRE wrapping took 0.445 seconds instead.  Yes it is two orders of
magnitude faster, and this is because my wrapping only marshals the
String to CString once.  Laziness cannot be worth 2 orders of
magnitude of runtime.  This is why we needed a new wrapping, which has
grown into the new library.
Right, I see the problem with Text.Regex.splitRegex, it repeatedly packs
the String into a CString.  But then why this result:
...
BenchPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
total is 1.294s
.. etc. ... 
BenchPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
total is 91.435s
Was this the old Posix, or your new one?  If the new one, why is it so
slow compared to the others?
Cheers,
  Simon
Your question has prompted me to go back into my PosixRE wrapping code and 
compare it to the PCRE code.  I have made some changes which ought to enhance 
the performance of the PosixRE code.  Let us see the new bechmarks on 10^6 bytes:

PosixRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])

real    1m35.429s
user    1m17.862s
sys     0m1.455s

total is 79.317s

PCRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])

real    0m2.570s
user    0m1.702s
sys     0m0.219s

total is 1.921s

BenchBSPosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])

real    1m32.267s
user    1m16.494s
sys     0m1.374s

total is 77.868s

BenchBSPCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])

real    0m1.245s
user    0m0.809s
sys     0m0.110s

total is 0.919s

So there was only a little improvement to the previous PosixRE speed.  If you 
want to look at the code, it is in the three Wrap.hsc for regex-posix and 
regex-tre and regex-pcre for the wrapMatchAll functions.  But it appears to be a 
library issue, not a Haskell issue.

I will tend to the Haddock cleanup next.

-- 
Chris