
Donald Bruce Stewart wrote:
simonmarhaskell:
Chris Kuklewicz wrote:
Your question has prompted me to go back into my PosixRE wrapping code and compare it to the PCRE code. I have made some changes which ought to enhance the performance of the PosixRE code. Let us see the new benchmarks on 10^6 bytes:
PosixRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
real 1m35.429s user 1m17.862s sys 0m1.455s
total is 79.317s
PCRE (102363,["bcdcd","cdc"],["bbccd","bcc"])
real 0m2.570s user 0m1.702s sys 0m0.219s
total is 1.921s
So I still don't understand why PCRE should be 40 times faster than PosixRE. Surely this can't be just due to differences in the underlying C library?
It could be. The C regex.h is pretty slow.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all
-- Don
And I notice C++ (g++) gets away with a third-party library from Boost:
// This implementation of regexdna does not use the POSIX regex
// included with the GNU libc. Instead it uses the Boost C++ libraries
//
// http://www.boost.org/libs/regex/doc/index.html
//
// (On Debian: apt-get install libboost-regex-dev before compiling,
// and then "g++ -O3 -lboost_regex regexdna.cc -o regexdna
// Gentoo seems to package boost as, well, 'boost')
Which is a strange precedent.
-- Chris
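
As a quick sanity check on the figures quoted in this thread, here is a minimal sketch that recomputes the "total is" lines as user + sys and the resulting ratio. The names Timing, cpuTotal and slowdown are made up for illustration and do not come from the PosixRE or PCRE wrapper code; the numbers are copied from the `time` output above.

```haskell
-- Timings copied from the `time` output quoted in the thread, in seconds.
-- The record and function names here are hypothetical, for illustration only.
data Timing = Timing { user :: Double, sys :: Double }

posixRE, pcre :: Timing
posixRE = Timing { user = 77.862, sys = 1.455 }
pcre    = Timing { user = 1.702,  sys = 0.219 }

-- CPU time as reported in the "total is" lines: user + sys.
cpuTotal :: Timing -> Double
cpuTotal t = user t + sys t

-- How many times more CPU time the PosixRE run used than the PCRE run.
slowdown :: Double
slowdown = cpuTotal posixRE / cpuTotal pcre
```

Loaded into GHCi, slowdown comes out at about 41, which matches the "40 times faster" in the question. It says nothing about where the time goes, only that the quoted totals are consistent with each other.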