Re: [Haskell-cafe] JRegex on "large" input sizes

1 Jul 2006

      David House wrote:
...
> Hi all. I need a decent regex library and JRegex seems the perfect
> choice: simple API, yet well-featured, as well as PCRE support.

I "maintain" Text.Regex.Lazy ( http://sourceforge.net/projects/lazy-regex ), so I
should mention that it does not have full PCRE support.  The module's
documentation (summarized at http://sourceforge.net/forum/forum.php?forum_id=554104 )
explains what it does have.  In brief:

For simple Regex usage (with capture) the Text.Regex.Lazy.Compat module replaces 
Text.Regex with a better implementation.

For simple expressions where a DFA works, the CompatDFA is fastest.

For fancier regexes (such as lazy patterns using ??, *? and +?),
Text.Regex.Lazy.Full extends Text.Regex.Lazy.Compat.
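
To illustrate what "lazy" means for *? compared with a greedy *, here is a
small sketch in plain Haskell.  It does not use Text.Regex.Lazy or any other
regex library: the names spanTo, greedyTag and lazyTag are made up for this
example, and they hard-code the patterns ^<.*> and ^<.*?> under the assumption
that . matches any character.

```haskell
import Data.List (elemIndices)

-- Hypothetical helpers: hand-written stand-ins for the regexes
-- "^<.*>" (greedy) and "^<.*?>" (lazy). No regex library is involved.
spanTo :: ([Int] -> Int) -> String -> Maybe String
spanTo pick ('<' : rest) =
  case elemIndices '>' rest of
    []      -> Nothing                               -- no closing '>' anywhere
    indices -> Just ('<' : take (pick indices + 1) rest)
spanTo _ _ = Nothing                                 -- input does not start with '<'

-- Greedy: extend the match to the LAST '>' in the input.
greedyTag :: String -> Maybe String
greedyTag = spanTo maximum

-- Lazy: stop at the FIRST '>' in the input.
lazyTag :: String -> Maybe String
lazyTag = spanTo minimum

main :: IO ()
main = do
  print (greedyTag "<a><b>")  -- Just "<a><b>"
  print (lazyTag   "<a><b>")  -- Just "<a>"
```

On the same input the greedy version consumes both tags, while the lazy
version stops after the first one.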

For much fancier regular expressions (e.g. PCRE) you would need to add three
hopefully simple pieces:
(1) Extend the Parsec code used to comprehend the meaning of the regex string.
(2) Extend the code that produces the Parsec parser that implements the desired
matching semantics.
(3) Test cases for the expanded syntax and semantics.

Note that Text.Regex.Lazy is an all-Haskell solution.  There are other Haskell
projects that wrap the standard regex/pcre libraries.  The problem is that
marshaling [Char] to C strings is quite slow and cannot be lazy, so you may want
to use the new Fast Packed String (now ByteString) library with foreign
functions to call the PCRE C library.
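
As a rough sketch of that last suggestion, the code below hands a ByteString
to a C function through the FFI.  To stay self-contained it calls strlen from
the standard C library rather than anything from PCRE; the wrapper name
cLength is made up for this example.  Note that useAsCString still copies the
bytes once in order to NUL-terminate them, but it avoids walking a [Char] list
one character at a time.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString.Char8 as B
import Foreign.C.String (CString)
import Foreign.C.Types (CSize (..))

-- Stand-in for a real regex binding: strlen from the C standard library.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical wrapper: pass the packed bytes to C as a NUL-terminated
-- string and convert the result back to a Haskell Int.
cLength :: B.ByteString -> IO Int
cLength bs = B.useAsCString bs (fmap fromIntegral . c_strlen)

main :: IO ()
main = do
  n <- cLength (B.pack "hello, regex")
  print n  -- 12
```

A binding to the PCRE C library would follow the same shape, with the
foreign import pointing at the PCRE functions instead of strlen.
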
...
> I want to use it on a simple project which involves input files a little
> larger than typical -- between 100KB and 500KB -- but still small
> enough so as to not present a problem.
> However, and I'm fairly sure JRegex is at fault here, my program
> segfaults on an input of ~230KB. Has anyone used JRegex successfully
> in this way before? If so, what tactics did you use?
> Thanks in advance.


Chris Kuklewicz