ANN: New release of regex packages

I would like to announce new versions of the regex-* packages.

This announcement covers:

  Version  Package          Description
  0.83     regex-base    -- Type classes and generic instances
  0.90     regex-compat  -- Uses regex-posix to provide old API
  0.91     regex-dfa     -- backend, pure Haskell, no submatch capture
  0.90     regex-parsec  -- backend, pure Haskell
  0.91     regex-pcre    -- backend, links against libpcre
  0.91     regex-posix   -- backend, links against standard C library
  0.92     regex-tdfa    -- backend, pure Haskell (Posix semantics)
  0.91     regex-tre     -- backend, links against libtre (currently buggy)

These all compile, install, and run a few tests correctly. Most notably, I consider regex-tdfa to be of useful quality now.

Summary of changes and recommendations:

* all packages:
  ** import Text.Regex.XXX exposes (getVersion_Text_Regex_XXX :: Data.Version.Version), which allows programs to access the current version number of the package
  ** LICENSE file provided (all are 3-clause BSD except regex-dfa, which is LGPL)
* regex-base:
  ** BUGFIX: one of the RegexContext instances used tail unsafely
  ** RegexMaker now has makeRegexM and makeRegexOptsM for better error handling
  ** Extract has new instances for (Seq Char) and (ByteString.Lazy), as well as the previous [Char] and ByteString instances
* all backends:
  ** Now support [Char], (Seq Char), ByteString, and ByteString.Lazy
  ** CHANGE: The (=~~) monadic match operators now use makeRegexM and will call 'fail' when a regular expression cannot be parsed.
  ** CHANGE: (import Text.Regex.BACKEND) now re-exports (module Text.Regex.Base)
* regex-dfa:
  ** BUGFIX: No longer hangs on repeated nullable subpatterns
* regex-tdfa:
  ** New backend in pure Haskell that provides true Posix semantics
  ** Runs with excellent memory usage
  ** I recommend this backend for Posix extended regular expressions (leftmost longest).
* regex-compat: No other changes, still uses regex-posix underneath, not recommended
* regex-parsec: No other changes, I recommend regex-tdfa or regex-pcre instead
* regex-pcre: No other changes, best provider of Perl's left-biased regular expressions
* regex-posix: No other changes, very slow (on OS X the underlying C library is buggy)
* regex-tre: No other changes, underlying libtre version 0.7.5 is still buggy

Dependencies:

All of the above packages have been updated to depend on regex-base>=0.80. I have only tested with GHC 6.6 on Mac OS X 10.4.8 (PPC, 32bit).

Porting the backends to other Haskell compilers should be possible, though they may not support the polymorphic type class API that regex-base provides.

Porting to GHC 6.4 should work once the support for (Seq Char) and ByteString[.Lazy] has been edited out or obtained externally. I think only regex-tdfa actually uses bang patterns at the moment, and those could also be removed when porting.

Where to get more information and the packages themselves:

There is a slowly developing wiki page at
  http://haskell.org/haskellwiki/Regular_expressions
for holding more documentation relating to these packages.

I have uploaded tar.gz sources for each of the packages to hackage:
  http://hackage.haskell.org/packages/hackage.html
They are listed under the "Text" Category:
  http://hackage.haskell.org/packages/archive/pkg-list.html#cat:Text

Development and bug fixes continue in the darcs repositories under
  http://darcs.haskell.org/packages/regex-unstable/

To check out one of the above versions with darcs you can use commands like

  darcs get --partial --tag=0.83 regex-base

where the --tag=0.83 may be omitted to get the latest unstable version.

To install the packages once you have the source:

For regex-pcre and regex-tre (and perhaps regex-posix) you might need to edit the end of the cabal file to provide Include and Lib directories to the corresponding C library.

  # Compile Setup.hs for better startup speed
  ghc --make Setup.hs -o setup
  # I use my own path and "--user". I recommend doing this to avoid
  # overwriting the global regex-* from GHC 6.6
  ./setup configure --enable-library-profiling --prefix=YOUR_PATH --user
  ./setup build
  ./setup install

Producing haddock documentation may not work and may not be up to date, with the important exception of regex-base.

Future Plans:

* regex-base: add support for generalized indices instead of the current Int
* regex-tdfa: Improve DFA algorithm and further limit memory allocation. Try to improve performance of ByteString.Lazy matching.

Cheers,
  Chris Kuklewicz
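
The (=~~) change listed under "all backends" can be sketched as follows. This is only an illustration: it assumes a current regex-tdfa from Hackage rather than the 0.92 release announced here, and the helper names firstWord and findMatch are made up for the example. In the Maybe monad, both "no match" and "pattern could not be parsed" come back as Nothing, because 'fail' in Maybe is Nothing.

```haskell
import Text.Regex.TDFA ((=~), (=~~))

-- Pure match with (=~): returns the leftmost-longest match as a String.
-- (Hypothetical helper; the pattern is fixed and known to parse.)
firstWord :: String -> String
firstWord input = input =~ "[a-z]+"

-- Monadic match with (=~~) in Maybe: the regex is built inside the monad,
-- so a pattern that cannot be parsed is reported through 'fail' (Nothing
-- here), and so is a pattern that simply does not match.
-- (Hypothetical helper; argument order is pattern first, then input.)
findMatch :: String -> String -> Maybe String
findMatch pat input = input =~~ pat
```

Loaded into GHCi, findMatch can be tried with a valid pattern, a non-matching pattern, and a malformed one such as "(unclosed" to see the three cases.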

On Mon, Mar 05, 2007 at 03:51:47PM +0000, Chris Kuklewicz wrote:
> I would like to announce new versions of the regex-* packages. [...] Porting the backends to other Haskell compilers should be possible, though they may not support the polymorphic type class API that regex-base provides.

Some of the instances in regex-base rely on late overlapping resolution, which is a GHC-only feature. That is, instances overlap, with neither subsuming the other, but they're only used outside of the overlap.

The overlap occurs in the following groups of instances:

  instance (RegexLike a b) => RegexContext a b (Array Int b)
  instance (RegexLike a b) => RegexContext a b MatchArray

  instance (RegexLike a b) => RegexContext a b [Array Int b]
  instance (RegexLike a b) => RegexContext a b [MatchArray]

  instance (RegexLike a b) => RegexContext a b [b]
  instance (RegexLike a b) => RegexContext a b [MatchArray]
  instance (RegexLike a b) => RegexContext a b [(MatchOffset,MatchLength)]

(MatchArray is a synonym for Array Int (MatchOffset, MatchLength))

Apart from that, this package (and thus the others, with a few #ifdef's) would work with Hugs. How about eliminating the above overlaps?
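
The first overlapping pair can be modelled in a few lines. This is a minimal sketch, not regex-base itself: the two-parameter class Context and its method label are hypothetical stand-ins for RegexContext with the regex parameter and the RegexLike constraint dropped. The two instance heads unify only when b is (MatchOffset, MatchLength), and neither is more specific than the other. GHC accepts the declarations and only complains if an instance is requested inside the overlap.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Data.Array (Array, listArray)

type MatchOffset = Int
type MatchLength = Int
type MatchArray  = Array Int (MatchOffset, MatchLength)

-- Hypothetical stand-in for RegexContext: 'b' is the text type being
-- searched, 't' is the result type the caller asks for.
class Context b t where
  label :: b -> t -> String

-- These two heads overlap exactly when b == (MatchOffset, MatchLength).
instance Context b (Array Int b) where
  label _ _ = "array of extracted text"

instance Context b MatchArray where
  label _ _ = "array of (offset, length)"

-- Outside the overlap (b is String) each use has exactly one candidate.
textResult :: String
textResult = label "abc" (listArray (0, 0) ["abc"] :: Array Int String)

offsetResult :: String
offsetResult = label "abc" (listArray (0, 0) [(0, 3)] :: MatchArray)

-- Inside the overlap both instances match, so GHC rejects the use site:
-- bad = label (0 :: Int, 3 :: Int) (listArray (0, 0) [(0, 3)] :: MatchArray)
```

A compiler that checks for overlap when the instances are declared, rather than when they are used, would refuse this module outright, which is the portability problem described above.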

Ross Paterson wrote:
> On Mon, Mar 05, 2007 at 03:51:47PM +0000, Chris Kuklewicz wrote:
>> I would like to announce new versions of the regex-* packages. [...] Porting the backends to other Haskell compilers should be possible, though they may not support the polymorphic type class API that regex-base provides.
> Some of the instances in regex-base rely on late overlapping resolution, which is a GHC-only feature. That is, instances overlap, with neither subsuming the other, but they're only used outside of the overlap.

I think I will update the "unstable" regex-base with newtypes to separate the instances.

> The overlap occurs in the following groups of instances:

These problems exist because 'b' in (RegexLike a b) is always an Extract instance and I cannot simply specify that the third parameter to RegexContext is "Not an Extract instance". For reference:

  type MatchArray = Array Int (MatchOffset, MatchLength)

hmmm.... Imaginary new syntax:

  notInstance Extract (MatchOffset, MatchLength)

would stop MatchArray from overlapping with (Extract b => Array Int b).

I will cook up some newtypes and update regex-base.

I only get one "list of _" instance. Since MatchArray is part of the underlying machinery, I think I will choose [MatchArray].

Overlap where b == (MatchOffset, MatchLength)
newtype/update:
>   instance (RegexLike a b) => RegexContext a b (Array Int b)
keep:
>   instance (RegexLike a b) => RegexContext a b MatchArray

Overlap where b == (MatchOffset, MatchLength)
newtype/update:
>   instance (RegexLike a b) => RegexContext a b [Array Int b]
keep:
>   instance (RegexLike a b) => RegexContext a b [MatchArray]

Overlap where b == (MatchOffset, MatchLength)
newtype/update:
>   instance (RegexLike a b) => RegexContext a b [b]
keep:
>   instance (RegexLike a b) => RegexContext a b [(MatchOffset,MatchLength)]

Overlap where b == (MatchArray)
newtype/update:
>   instance (RegexLike a b) => RegexContext a b [b]
keep:
>   instance (RegexLike a b) => RegexContext a b [MatchArray]

> (MatchArray is a synonym for Array Int (MatchOffset, MatchLength))
>
> Apart from that, this package (and thus the others, with a few #ifdef's) would work with Hugs. How about eliminating the above overlaps?

Ross Paterson wrote:
> Apart from that, this package (and thus the others, with a few #ifdef's) would work with Hugs. How about eliminating the above overlaps?

I don't have Hugs installed, so could you try porting the regex-base in the HEAD of http://darcs.haskell.org/packages/regex-unstable/regex-base that I just updated?

This creates 4 newtypes in the RegexLike module that are used in RegexContext to disambiguate the semantics and remove the overlap that Hugs complains about:

  newtype AllSubmatches f b = AllSubmatches {getAllSubmatches :: (f b)}
  newtype AllTextSubmatches f b = AllTextSubmatches (f b)
  newtype AllMatches f b = AllMatches (f b)
  newtype AllTextMatches f b = AllTextMatches (f b)

This has another benefit: I no longer have to choose whether a result of type [b] means the submatches of one match or all of the whole matches, since the newtypes now provide both flavors. Most things are now available as either a list or an (Array Int), and [[b]] is now available in all 4 list/array flavors.

--
Chris
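
How the wrappers remove the overlap can be sketched with a toy class. This is an illustration under assumptions, not the regex-base source: the class Context, its method fromGroups, and the sample data are hypothetical, and only two of the four newtypes are used. Each result type now starts with a different type constructor (or a different container f), so no two instance heads unify for any choice of b.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Data.Array (Array, listArray)

type MatchOffset = Int
type MatchLength = Int
type MatchArray  = Array Int (MatchOffset, MatchLength)

-- Two of the four wrappers, written as in the message above.
newtype AllSubmatches f b     = AllSubmatches { getAllSubmatches :: f b }
newtype AllTextSubmatches f b = AllTextSubmatches (f b)

-- Hypothetical stand-in for RegexContext: build a result of type 't'
-- from the (text, (offset, length)) pairs of one match over text type 'b'.
class Context b t where
  fromGroups :: [(b, (MatchOffset, MatchLength))] -> t

-- The bare MatchArray instance is kept as part of the machinery.
instance Context b MatchArray where
  fromGroups gs = listArray (0, length gs - 1) (map snd gs)

-- Text results are wrapped, in both array and list flavors.
instance Context b (AllTextSubmatches (Array Int) b) where
  fromGroups gs = AllTextSubmatches (listArray (0, length gs - 1) (map fst gs))

instance Context b (AllTextSubmatches [] b) where
  fromGroups = AllTextSubmatches . map fst

-- Offsets as a list get their own wrapper, so they cannot clash with [b].
instance Context b (AllSubmatches [] (MatchOffset, MatchLength)) where
  fromGroups = AllSubmatches . map snd

-- Made-up sample: whole match followed by two subgroups.
sample :: [(String, (MatchOffset, MatchLength))]
sample = [("2007-03", (0, 7)), ("2007", (0, 4)), ("03", (5, 2))]

texts :: [String]
texts = case fromGroups sample of AllTextSubmatches ts -> ts

offsets :: [(MatchOffset, MatchLength)]
offsets = getAllSubmatches (fromGroups sample)

offsetArray :: MatchArray
offsetArray = fromGroups sample
```

The caller selects the flavor by pattern matching on (or unwrapping) the newtype it wants, which is what lets both the submatch and the whole-match readings of [b] coexist.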

On Wed, Mar 07, 2007 at 11:28:06AM +0000, Chris Kuklewicz wrote:
> Ross Paterson wrote:
>> Apart from that, this package (and thus the others, with a few #ifdef's) would work with Hugs. How about eliminating the above overlaps?
> I don't have Hugs installed, so could you try porting the regex-base in the HEAD of http://darcs.haskell.org/packages/regex-unstable/regex-base that I just updated?

It's fine now, thanks.

Ross Paterson wrote:
> On Wed, Mar 07, 2007 at 11:28:06AM +0000, Chris Kuklewicz wrote:
>> Ross Paterson wrote:
>>> Apart from that, this package (and thus the others, with a few #ifdef's) would work with Hugs. How about eliminating the above overlaps?
>> I don't have Hugs installed, so could you try porting the regex-base in the HEAD of http://darcs.haskell.org/packages/regex-unstable/regex-base that I just updated?
> It's fine now, thanks.

Excellent. Now I just have to rewrite the Haddock to describe the new instances. And I may continue to flesh out the types.

I'll also take time this weekend to install Hugs or WinHugs so I can double check any changes.

Meanwhile: "Regular Expressions" themed humor: http://xkcd.com/c208.html (and check out the comic image's tooltip...)

--
Chris
Participants (2): Chris Kuklewicz, Ross Paterson