Re: Announcing regex-tre-0.66 and benchmarks

10 Aug 2006


      Donald Bruce Stewart wrote:
...
simonmarhaskell:
...
Chris Kuklewicz wrote:
...
Your question has prompted me to go back into my PosixRE wrapping code 
and compare it to the PCRE code.  I have made some changes which ought 
to enhance the performance of the PosixRE code.  Let us see the new 
benchmarks on 10^6 bytes:
PosixRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])
real    1m35.429s
user    1m17.862s
sys     0m1.455s
total is 79.317s
PCRE
(102363,["bcdcd","cdc"],["bbccd","bcc"])
real    0m2.570s
user    0m1.702s
sys     0m0.219s
total is 1.921s
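The "total is" lines appear to be user plus sys CPU time (for PosixRE, 77.862 s + 1.455 s = 79.317 s). The small Haskell sketch below hard-codes the timings from the two runs above. It checks that sum and the ratio behind the "40 times" figure. The `Timing` record and function names are hypothetical and exist only for this illustration.

```haskell
-- Hypothetical record for one run's CPU times, in seconds.
data Timing = Timing { userSecs :: Double, sysSecs :: Double }

-- Timings copied from the PosixRE and PCRE runs on 10^6 bytes.
posixRE, pcre :: Timing
posixRE = Timing { userSecs = 77.862, sysSecs = 1.455 }  -- user 1m17.862s, sys 0m1.455s
pcre    = Timing { userSecs = 1.702,  sysSecs = 0.219 }  -- user 0m1.702s,  sys 0m0.219s

-- CPU time charged to the process: user + sys
-- (assumed to be what the "total is" line reports).
totalSeconds :: Timing -> Double
totalSeconds t = userSecs t + sysSecs t

-- How many times faster PCRE ran, measured on CPU time.
speedup :: Double
speedup = totalSeconds posixRE / totalSeconds pcre

main :: IO ()
main = do
  putStrLn ("PosixRE total (s): " ++ show (totalSeconds posixRE))
  putStrLn ("PCRE total (s):    " ++ show (totalSeconds pcre))
  putStrLn ("CPU-time speedup:  " ++ show speedup)
```

On CPU time the ratio comes out at roughly 41. On the real (wall-clock) figures, 95.429 s against 2.570 s, it is about 37.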
So I still don't understand why PCRE should be 40 times faster than 
PosixRE. Surely this can't be just due to differences in the underlying C 
library?
It could be. The C regex.h is pretty slow.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all
-- Don
And I notice that the C++ (g++) entry gets away with using a third-party library from Boost:
...
// This implementation of regexdna does not use the POSIX regex
// included with the GNU libc. Instead it uses the Boost C++ libraries
//
// http://www.boost.org/libs/regex/doc/index.html
//
// (On Debian: apt-get install libboost-regex-dev before compiling,
//  and then "g++ -O3 -lboost_regex regexdna.cc -o regexdna".
//  Gentoo seems to package boost as, well, 'boost')
That sets a strange precedent.

-- 
Chris