
I like the approach of Russ Cox[2]. One of the great ideas there (which I think he didn't emphasize enough) is to have a queue which allows O(1) testing whether an element is already there [3]. This solves the problem with priorities -- the states are guaranteed to be enqueued in the order of their priorities, and there are no repetitions.
Hm, this sounds great! So then the number of states is just the size of the NFA, so the memory does not depend on the input string length? Have you tried this approach yet? I wouldn't vouch for my code in that gist, I kind of remember trying to eliminate the duplicates while preserving order buy not sure if I did it correctly there.
To be able to process the states in the proper order I'll have to give up Glushkov automaton and probably use something similar to your construction [4].
[2] http://swtch.com/~rsc/regexp/regexp2.html [3] http://code.google.com/p/re2/source/browse/util/sparse_array.h [4] http://t0yv0.blogspot.com/2011/07/combinatory-regular-expressions-in.html
-- Roman I. Cheplyaka :: http://ro-che.info/
-- Kind Regards, Anton Tayanovskyy WebSharperâ„¢ - type-safe JavaScript in F# http://intellifactory.com