ANN: TextRegexLazy-0.56, (=~) and (=~~) are here

Announcing: TextRegexLazy version 0.56 Where: Tarball from http://sourceforge.net/projects/lazy-regex darcs get --partial [--tag=0.56] http://evenmere.org/~chrisk/trl/stable/ License : BSD, except for DFAEngine.hs which is LGPL (derived from CTK light) Development/unstable version is at: darcs get [--partial] http://evenmere.org/~chrisk/trl/devel/ This is the version that has eaten John Meacham's JRegex library and survived to become strong. Thanks John! It now compiles against the posix regexp provided by the c library and the pcre library, in addition to the "full lazy" and the "DFA" backends. All 4 backends can accept regular expressions given as String and as ByteString. All 4 backends can run regular expressions against String and ByteString. In particular, the PosixRE and PCRE can run very efficiently against ByteString. (Though the input for the PosixRE needs to end in a \NUL character for efficiency). So there are 4*2*2 = 16 ways to use to provide input to this library. And the RegexContext class has at least 11 instances that both (=~) and (=~~) can target. So that is 4*2*2*11*2 = 352 things you can do with this library! Get your copy today! To run with cabal before 1.1.4 you will need to comment out the "Extra-Source-Files:" line in the TextRegexLazy.cabal file. The Example.hs file:
{-# OPTIONS_GHC -fglasgow-exts #-} import Text.Regex.Lazy import Text.Regex.Full((=~),(=~~)) -- or DFA or PCRE or PosixRE
main = let b :: Bool b = ("abaca" =~ "(.)a") c :: [MatchArray] c = ("abaca" =~ "(.)a") d :: Maybe (String,String,String,[String]) d = ("abaca" =~~ "(.)a") in do print b print c print d
This produces:
True [array (0,1) [(0,(1,2)),(1,(1,1))],array (0,1) [(0,(3,2)),(1,(3,1))]] Just ("a","ba","ca",["b"])
You can also use makeRegex and makeRegexOpts to compile and save a regular expression which will be used multiple times. Each of the 4 backends has a separate "Regex" data type with its own option types. For low level access, the WrapPCRE and WrapPosix modules expose a typesafe layer around the c libraries. You can query the "getVersion :: Maybe String" to see if the have been compiled into the library. It may be possible to use WrapPCRE and the UTF8 option flags to do unicode regex matching with PCRE. ( The Full and DFA backends use the Haskell unicode Char already ). Adding new types to String/ByteString is a matter of adding instances to the existing classes. Feedback and comments of any length is welcome. -- Chris Kuklewicz

Ooops. I just patched the efficiency of ByteStringPCRE to agree with the original announcement. Use darcs get --partial http://evenmere.org/~chrisk/trl/stable/ to get the fixed version. A new 0.57 tarball will go to sourceforge soon. Chris Kuklewicz wrote:
Announcing: TextRegexLazy version 0.56 Where: Tarball from http://sourceforge.net/projects/lazy-regex darcs get --partial [--tag=0.56] http://evenmere.org/~chrisk/trl/stable/ License : BSD, except for DFAEngine.hs which is LGPL (derived from CTK light)
Development/unstable version is at: darcs get [--partial] http://evenmere.org/~chrisk/trl/devel/
This is the version that has eaten John Meacham's JRegex library and survived to become strong. Thanks John!
It now compiles against the posix regexp provided by the c library and the pcre library, in addition to the "full lazy" and the "DFA" backends.
All 4 backends can accept regular expressions given as String and as ByteString.
All 4 backends can run regular expressions against String and ByteString.
In particular, the PosixRE and PCRE can run very efficiently against ByteString. (Though the input for the PosixRE needs to end in a \NUL character for efficiency).
So there are 4*2*2 = 16 ways to use to provide input to this library. And the RegexContext class has at least 11 instances that both (=~) and (=~~) can target. So that is 4*2*2*11*2 = 352 things you can do with this library! Get your copy today!
To run with cabal before 1.1.4 you will need to comment out the "Extra-Source-Files:" line in the TextRegexLazy.cabal file.
The Example.hs file:
{-# OPTIONS_GHC -fglasgow-exts #-} import Text.Regex.Lazy import Text.Regex.Full((=~),(=~~)) -- or DFA or PCRE or PosixRE
main = let b :: Bool b = ("abaca" =~ "(.)a") c :: [MatchArray] c = ("abaca" =~ "(.)a") d :: Maybe (String,String,String,[String]) d = ("abaca" =~~ "(.)a") in do print b print c print d
This produces:
True [array (0,1) [(0,(1,2)),(1,(1,1))],array (0,1) [(0,(3,2)),(1,(3,1))]] Just ("a","ba","ca",["b"])
You can also use makeRegex and makeRegexOpts to compile and save a regular expression which will be used multiple times. Each of the 4 backends has a separate "Regex" data type with its own option types.
For low level access, the WrapPCRE and WrapPosix modules expose a typesafe layer around the c libraries. You can query the "getVersion :: Maybe String" to see if the have been compiled into the library.
It may be possible to use WrapPCRE and the UTF8 option flags to do unicode regex matching with PCRE. ( The Full and DFA backends use the Haskell unicode Char already ).
Adding new types to String/ByteString is a matter of adding instances to the existing classes.
Feedback and comments of any length is welcome.

Hello Chris, Wednesday, August 2, 2006, 3:16:58 PM, you wrote:
Announcing: TextRegexLazy version 0.56
your feature list is really strong! it will be great now to make it a part of GHC standard distribution afaiu, selection of regex engine implemented via import statements? as long-standing goals i can point to support of lazy bytestrings, UTF-8, filename wildcards. but that is really more the details than essentials and now testing/bundling with GHC is most important (of course, imho) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Chris,
Wednesday, August 2, 2006, 3:16:58 PM, you wrote:
Announcing: TextRegexLazy version 0.56
your feature list is really strong! it will be great now to make it a part of GHC standard distribution
afaiu, selection of regex engine implemented via import statements?
Well, yes. The "makeRegex" is provided by each of the 4 backends to create its own Regex type. So there are 4 different data types: Text.Regex.Full.Regex Text.Regex.DFA.Regex Text.Regex.PCRE.Regex Text.Regex.PosixRE.Regex I might be able to make a "framework" polymorphic type: data Regex backend = .... hmmm....
as long-standing goals i can point to support of lazy bytestrings,
Lazy bytestrings are very specialized. You could probable "unpack" them and send them to the Full or DFA backends.
UTF-8,
The Posix backend will never understand unicode of any type. The PCRE backend, if compiled properly, does understand UTF8. You need an encoder/decoder between Unicode [Char] to UTF8 CStringLen or ByteString. The Full and DFA backends understand Char and are therefore already Unicode compliant.
filename wildcards.
What is a FilePath? Ooops...wrong thread!
but that is really more the details than essentials and now testing/bundling with GHC is most important (of course, imho)
More testing is a must. But the 352 ways to operate it make this annoying. More documentation is just grunt work.

Bulat Ziganshin wrote:
Hello Chris,
Wednesday, August 2, 2006, 3:16:58 PM, you wrote:
Announcing: TextRegexLazy version 0.56
your feature list is really strong! it will be great now to make it a part of GHC standard distribution
afaiu, selection of regex engine implemented via import statements?
You can actually choose which backend (=~) uses at runtime:
{-# OPTIONS_GHC -fglasgow-exts #-} import Text.Regex.Lazy
import qualified Text.Regex.PCRE as R import qualified Text.Regex.PosixRE as S import qualified Text.Regex.Full as F
-- Choose which library to use depending on presence of PCRE library
(=~) :: (RegexMaker R.Regex R.CompOption R.ExecOption a,RegexContext R.Regex b t ,RegexMaker F.Regex F.CompOption F.ExecOption a,RegexContext F.Regex b t ,RegexMaker S.Regex S.CompOption S.ExecOption a,RegexContext S.Regex b t) => b -> a -> t (=~) = case R.getVersion of Just _ -> (R.=~) Nothing -> case S.getVersion of Just _ -> (S.=~) Nothing -> (F.=~)
main = print (("ba" =~ "(.)a") :: (String,String,String,[String]))
The R.getVersion and S.getVersion detect whether it was compiled against PCRE or PosixRE. The (=~) is then chosen at run time.
participants (2)
-
Bulat Ziganshin
-
Chris Kuklewicz