
I would like to announce new versions of the regex-* packages.

This announcement covers:

Version  Package          Description
0.83     regex-base    -- Type classes and generic instances
0.90     regex-compat  -- Uses regex-posix to provide old API
0.91     regex-dfa     -- backend, pure Haskell, no submatch capture
0.90     regex-parsec  -- backend, pure Haskell
0.91     regex-pcre    -- backend, links against libpcre
0.91     regex-posix   -- backend, links against standard C library
0.92     regex-tdfa    -- backend, pure Haskell (Posix semantics)
0.91     regex-tre     -- backend, links against libtre (currently buggy)

These all compile, install, and run a few tests correctly. Most notably, I consider regex-tdfa to be of useful quality now.

Summary of changes and recommendations:

* all packages:
** import Text.Regex.XXX exposes (getVersion_Text_Regex_XXX :: Data.Version.Version), which allows programs to access the current version number of the package
** LICENSE file provided (all are 3 clause BSD except regex-dfa, which is LGPL)
* regex-base:
** BUGFIX: one of the RegexContext instances used tail unsafely
** RegexMaker now has makeRegexM and makeRegexOptsM for better error handling
** Extract has new instances for (Seq Char) and (ByteString.Lazy), as well as the previous [Char] and ByteString instances
* all backends:
** Now support [Char], (Seq Char), ByteString, and ByteString.Lazy
** CHANGE: The (=~~) monadic match operators now use makeRegexM and will call 'fail' when a regular expression cannot be parsed.
** CHANGE: (import Text.Regex.BACKEND) now re-exports (module Text.Regex.Base)
* regex-dfa:
** BUGFIX: No longer hangs on repeated nullable subpatterns
* regex-tdfa:
** New backend in pure Haskell that provides true Posix semantics
** Runs with excellent memory usage
** I recommend this backend for Posix extended regular expressions (leftmost longest).
* regex-compat: No other changes, still uses regex-posix underneath, not recommended
* regex-parsec: No other changes, I recommend regex-tdfa or regex-pcre instead
* regex-pcre: No other changes, best provider of Perl's left-biased regular expressions
* regex-posix: No other changes, very slow (on OS X the underlying C library is buggy)
* regex-tre: No other changes, underlying libtre version 0.7.5 is still buggy

Dependencies:

All of the above packages have been updated to depend on regex-base>=0.80.

I have only tested with GHC 6.6 on Mac OS X 10.4.8 (PPC, 32bit). Porting the backends to other Haskell compilers should be possible, though they may not support the polymorphic type class API that regex-base provides. Porting to GHC 6.4 should work once the support for (Seq Char) and ByteString[.Lazy] has been edited out or obtained externally. I think only regex-tdfa actually uses bang patterns at the moment, and those could also be removed when porting.

Where to get more information and the packages themselves:

There is a slowly developing wiki page at
  http://haskell.org/haskellwiki/Regular_expressions
for holding more documentation relating to these packages.

I have uploaded tar.gz sources for each of the packages to Hackage:
  http://hackage.haskell.org/packages/hackage.html
They are listed under the "Text" category:
  http://hackage.haskell.org/packages/archive/pkg-list.html#cat:Text

Development and bug fixes continue in the darcs repositories under
  http://darcs.haskell.org/packages/regex-unstable/

To check out one of the above versions with darcs you can use commands like

    darcs get --partial --tag=0.83 regex-base

where the --tag=0.83 may be omitted to get the latest unstable version.

To install the packages once you have the source:

For regex-pcre and regex-tre (and perhaps regex-posix) you might need to edit the end of the cabal file to provide Include and Lib directories for the corresponding C library.
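
As a rough sketch of such an edit, assuming the C library was installed under /opt/local (a hypothetical prefix; substitute wherever the headers and library actually live on your system), the lines added at the end of the package's .cabal file might look like the following. Include-Dirs and Extra-Lib-Dirs are the standard Cabal fields for header and library search paths; the exact layout of each package's cabal file may differ.

```
-- Hypothetical paths, adjust to your own installation
Include-Dirs:   /opt/local/include
Extra-Lib-Dirs: /opt/local/lib
```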

    # Compile Setup.hs for better startup speed
    ghc --make Setup.hs -o setup
    # I use my own path and "--user". I recommend doing this to avoid
    # overwriting the global regex-* from GHC 6.6
    ./setup configure --enable-library-profiling --prefix=YOUR_PATH --user
    ./setup build
    ./setup install

The haddock documentation may fail to build and may be out of date, with the important exception of regex-base.

Future Plans:

* regex-base: add support for generalized indices instead of the current Int
* regex-tdfa: Improve DFA algorithm and further limit memory allocation. Try to improve performance of ByteString.Lazy matching.

Cheers,
  Chris Kuklewicz