Re: [Haskell-cafe] Efficient parallel regular expressions

6 Nov 2008

      Mitchell, Neil wrote:
...
Hi Martijn,
It's not that tricky if you do a regular expression state machine
yourself, but that's probably a bit too much work. One way to get
a speed up might be to take the regular expressions a,b,c,d and
generate a regex a+b+c+d, and one a+b. You can then check any string
s against a+b+c+d, if that matches check a+b, if that matches check
a. At each stage you eliminate half the regular expressions, which
means a match will take log n, where n is the number of regular
expressions.
This assumes the underlying regular expression engine constructs a
finite state machine, making it O(m) where m is the length of the
string to match.
If you're implementing the machine yourself, you can implement it as 
trie automaton which has some "value" associated with each final state. 
That is, rather than just accepting or rejecting, when the automaton 
accepts it returns the value associated with the particular final state 
that it accepted by. This is a trivial extension on DFA/NFA 
implementations and is perfectly suited to the problem of combining 
multiple linear tries (sic: regexes) into a single machine. The 
minimization algorithms are a bit more complex than for DFA/NFAs, but 
still fairly straightforward if you're only doing prefix merging.

And this gets rid of the O(log n) factor.

-- 
Live well,
~wren

Re: [Haskell-cafe] Efficient parallel regular expressions

wren ng thornton