A weird bug of regex-pcre

18 Dec 2012

The test text file is attached.
I tested my regexp like this:

Prelude> :m + Text.Regex.PCRE
Prelude Text.Regex.PCRE> z <- readFile "test.html"
Prelude Text.Regex.PCRE> let (b, m ,a, ss) = z =~ ".*? b
...
n of the Triumvirate</td>\r\n    David Rapoza</td>\r\n
   \r\n      <i>Return to Ravnica</i>\r\n    </td>\r\n
   10/31/2012</td>\r\n  </tr><tr>\r\n  <"
Prelude Text.Regex.PCRE> m
"a href=\"/magic/magazine/article.aspx?x=mtg/daily/activity/1088\">
...
Judging from the values of b and m, the match is weirdly shifted forward
by 1 char (ss, the submatches, is even worse: shifted by 2 chars).
Rematching against a, and so on, gives correct results. Only the first
match is broken.
Tested with regex-posix (with a modified regexp), everything is OK.
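
One thing that may be worth ruling out, purely as an assumption and not something established in this report, is that the offsets coming back from the C library are byte offsets into an encoded copy of the text, while the String is then split by character count. If the file has non-ASCII characters before the match, the two would disagree by a small amount, similar to the 1 and 2 char shifts seen here. The helper names below (utf8Width, byteDrift, nonAscii) are hypothetical and use only base; this is a rough diagnostic sketch, not a description of how regex-pcre works.

```haskell
import Data.Char (ord)

-- | Bytes needed to encode one character in UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- | How far a UTF-8 byte offset would run ahead of the character
-- offset at the end of the given text (0 for pure ASCII).
-- ASSUMPTION: the text is encoded as UTF-8 before matching.
byteDrift :: String -> Int
byteDrift = sum . map (\c -> utf8Width c - 1)

-- | Zero-based positions and values of the non-ASCII characters.
nonAscii :: String -> [(Int, Char)]
nonAscii s = [(i, c) | (i, c) <- zip [0 ..] s, ord c >= 0x80]
```

In GHCi, after z <- readFile "test.html", nonAscii z lists the candidate characters, and byteDrift applied to the text in front of the place where the match should have started gives the shift this hypothesis would predict. If that number is 0, the idea can be dropped.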
$ ghc-pkg describe regex-pcre
name: regex-pcre
version: 0.94.4
id: regex-pcre-0.94.4-d45e00c9e113c7c9352d0785497e1dca
license: BSD3
copyright: Copyright (c) 2006, Christopher Kuklewicz
maintainer: TextRegexLazy@personal.mightyreason.com
stability: Seems to work, passes a few tests
homepage: http://hackage.haskell.org/package/regex-pcre
package-url: http://code.haskell.org/regex-pcre/
synopsis: Replaces/Enhances Text.Regex
description: The PCRE backend to accompany regex-base, see www.pcre.org
category: Text
author: Christopher Kuklewicz
exposed: True
exposed-modules: Text.Regex.PCRE Text.Regex.PCRE.Wrap
                 Text.Regex.PCRE.String Text.Regex.PCRE.Sequence
                 Text.Regex.PCRE.ByteString Text.Regex.PCRE.ByteString.Lazy
hidden-modules:
trusted: False
import-dirs: /home/magicloud/.cabal/lib/regex-pcre-0.94.4/ghc-7.6.1
library-dirs: /home/magicloud/.cabal/lib/regex-pcre-0.94.4/ghc-7.6.1
hs-libraries: HSregex-pcre-0.94.4
extra-libraries: pcre
extra-ghci-libraries:
include-dirs:
includes:
depends: array-0.4.0.1-cbe8814e07792e8f0d66cac77a2c0b6b
         base-4.6.0.0-9108e251636b0c8499261c52a7809ea1
         bytestring-0.10.0.1-11d4f52c4f4ed9833f768577b77050c5
         containers-0.5.2.1-b183418bc7f43ce98b6916ef296c2669
         regex-base-0.93.2-1ee07f806ad6b0c911226883d15b64f2
hugs-options:
cc-options:
ld-options:
framework-dirs:
frameworks:
haddock-interfaces: /home/magicloud/.cabal/share/doc/regex-pcre-0.94.4/html/regex-pcre.haddock
haddock-html: /home/magicloud/.cabal/share/doc/regex-pcre-0.94.4/html
pkgroot: "/home/magicloud/.ghc/x86_64-linux-7.6.1"

-- 
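Dense bamboo does not hinder the stream from flowing through;
high mountains do not stop the wild clouds from flying.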

And for G+, please use magiclouds#gmail.com.

Magicloud Magiclouds
