regex-tdfa issue with large regexps

Hi and thanks again Christopher.. I just used regex-tdfa for the first time. I'm looking for a native haskell regex lib that's installable and reliable on all platforms. Perl-style regexps would be ideal. I know regex-tdfa doesn't support these (yet ?) but it did install easily on windows which is promising. I noticed it's quite sensitive to regexp size; it will eat all ram on my 1G mac with regexps larger than 500 chars or so. With pcre-light, I could get up to about 32K before a runtime error. Is there an issue tracker for the regex packages ? Best, -Simon

On Mon, Aug 23, 2010 at 11:42 PM, Simon Michael
Hi and thanks again Christopher.. I just used regex-tdfa for the first time. I'm looking for a native haskell regex lib that's installable and reliable on all platforms. Perl-style regexps would be ideal. I know regex-tdfa doesn't support these (yet ?) but it did install easily on windows which is promising.
You could take a look at weighted-regex [1]. It's pure Haskell, takes O(n) memory and O(mn) time where n is the length of the regular expression and m is the length of the query string. [1] http://hackage.haskell.org/package/weighted-regexp Cheers! =) -- Felipe.

Do you have an example of one of the long patterns? I wrote regex-tdfa. The lazy-DFA was educational, but can take up lots of space. I have been toying with a new version as a way to lean some OCaml. The new version does not make a big DFA, and should not take up as much space. Fundamentally an efficient automaton may take lots of space while matching but will run in linear time. PCRE is a backtracking engine that may take exponential time but use negligible space. If the experiment with OCaml ends happily there may eventually be a new major version of regex-tdfa with a port the space-efficient result. On 24/08/2010 03:42, Simon Michael wrote:
Hi and thanks again Christopher.. I just used regex-tdfa for the first time. I'm looking for a native haskell regex lib that's installable and reliable on all platforms. Perl-style regexps would be ideal. I know regex-tdfa doesn't support these (yet ?) but it did install easily on windows which is promising.
I noticed it's quite sensitive to regexp size; it will eat all ram on my 1G mac with regexps larger than 500 chars or so. With pcre-light, I could get up to about 32K before a runtime error.
Is there an issue tracker for the regex packages ?
Best, -Simon
participants (3)
-
Chris Kuklewicz
-
Felipe Lessa
-
Simon Michael