David House wrote:
Hi all. I need a decent regex library and JRegex seems the perfect choice: simple API, yet well-featured, as well as PCRE support.
I "maintain" Text.Regex.Lazy ( http://sourceforge.net/projects/lazy-regex ), so I should mention that it does not have full PCRE support. The module's documentation ( summarized here: http://sourceforge.net/forum/forum.php?forum_id=554104 ) explains what it does have. To summarize the summary:

For simple regex usage (with capture), the Text.Regex.Lazy.Compat module replaces Text.Regex with a better implementation. For simple expressions where a DFA works, CompatDFA is fastest. For fancier regexes (such as lazy patterns using ??, *? and +?), Text.Regex.Lazy.Full extends Text.Regex.Lazy.Compat.

For much fancier regular expressions (e.g. PCRE) you would need to add three hopefully simple pieces:
(1) Extend the Parsec code used to comprehend the meaning of the regex string.
(2) Extend the code that produces the Parsec parser that implements the desired matching semantics.
(3) Add test cases for the expanded syntax and semantics.

Note that Text.Regex.Lazy is an all-Haskell solution. There are other Haskell projects that wrap the standard regex/PCRE libraries. The problem is that marshaling [Char] to C strings is quite slow and cannot be lazy, so you may want to use the new Fast Packed String (now ByteString) library with foreign functions to call the PCRE C library.
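To make the marshaling point concrete, here is a minimal sketch, not taken from any of the libraries above. It passes the same 230000-character input to a C function, once as a String and once as a ByteString. The C function strlen is only a stand-in for a real PCRE call, and the names lengthViaString and lengthViaByteString are made up for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import qualified Data.ByteString.Char8 as B
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- strlen stands in for a C regex entry point; it only needs a C string.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- [Char] route: withCString must walk the entire list, encode every
-- character and write it into a fresh buffer before C sees anything,
-- so the input is fully forced and cannot be consumed lazily.
lengthViaString :: String -> IO Int
lengthViaString s = withCString s (fmap fromIntegral . c_strlen)

-- ByteString route: the data is already a packed byte array, so
-- useAsCString only copies the block once and appends a NUL terminator.
lengthViaByteString :: B.ByteString -> IO Int
lengthViaByteString bs = B.useAsCString bs (fmap fromIntegral . c_strlen)

main :: IO ()
main = do
  let n = 230000 -- roughly the input size from the question
  a <- lengthViaString (replicate n 'a')
  b <- lengthViaByteString (B.replicate n 'a')
  print (a, b)
```

Both routes give the same answer; the difference is in how much work happens before the C call. If even the single copy matters, Data.ByteString.Unsafe offers unsafeUseAsCStringLen, which hands C the existing buffer without copying, provided the C side does not modify it or expect a NUL terminator.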
I want to use it on a simple project which involves input files a little larger than typical -- between 100KB and 500KB -- but still small enough so as to not present a problem.
However, and I'm fairly sure JRegex is at fault here, my program segfaults on an input of ~230KB. Has anyone used JRegex successfully in this way before? If so, what tactics did you use?
Thanks in advance.