user error when using Text.Regex.PCRE

Hi all,

I'm getting a strange user error when working with Data.ByteString.Char8 and Text.Regex.PCRE. The error I get is:

CustomerMaster: user error (Text.Regex.PCRE.ByteString died: (ReturnCode 0,"Ptr parameter was nullPtr in Text.Regex.PCRE.Wrap.wrapMatch cstr"))

The part of the code causing problems is at the end of this e-mail. The complete code is attached.

I'm working on cleansing customer addresses for an ERP system data migration. The failing function is supposed to split some bytestrings using a list of regular expressions (split on words like suite, building, attention, etc...) stored in the "tags" binding. I use a foldl' to try each regular expression on each address (addr is a list of address parts, which can be seen as a list of bytestring values).

The funny and weird thing is that when I have only one regular expression in tags, it works. When I have more than one regular expression in tags, it fails (user error) the first time one of the regular expressions matches.

A match replaces the current address part with 3 new parts: before, match, after. The next regular expression in the tags list will test the new before and after parts. I presume it's the testing of those newly created parts that causes the problem.

Any idea about what could go wrong here?

Thanks,
Olivier.

=== code ===
-- | Split unparsed address parts on keywords (suite, building, doors, etc...)
splitOnTags :: AddressState
splitOnTags = do
    addr <- get
    put $ foldl' f addr tags
  where
    f a t = concatMap (split t) a
    split (T {partType = p, regex = r}) a@(AP X v)
      | BS.null y = [a]
      | otherwise = [AP X x, AE p, AP X z] -- new address parts
      where (x, y, z) = match r v
    split _ a = [a]

-- | Regex and part type representing tags used for splitting addresses
tags :: [Tag]
tags = [T AT (makeRegexOpts compCaseless execBlank
          "\\b(?:ATTN|ATTENTION|C/O)\\b[[:punct:]]?") (BS.pack "ATTN: ")
       ,T PB (makeRegexOpts compCaseless execBlank
          "\\b(?:(?:P\\.?\\s?O\\.?\\s*)BOX|C\\.P\\.|POSTFACH)\
          \[[:punct:]]?\\s+(?:NO.\\s+)?") (BS.pack "PO BOX: ")
       ,T BLDG (makeRegexOpts compCaseless execBlank
          "\\b(?:BLDG|BUILDING|HANGAR)[[:punct:]]?") (BS.pack "BLDG: ")
       ,T DOCK (makeRegexOpts compCaseless execBlank
          "\\bDOCK[[:punct:]]?") (BS.pack "DOCK: ")
       ,T STE (makeRegexOpts compCaseless execBlank
          "\\b(?:SUITE|STE|APT|ROOM)[[:punct:]]?\
          \\\s+(?:NO[[:punct:]]?)?") (BS.pack "STE: ")
       ,T UNIT (makeRegexOpts compCaseless execBlank
          "\\bUNIT[[:punct:]]?") (BS.pack "UNIT: ")
       ,T FLOOR (makeRegexOpts compCaseless execBlank
          "FLOOR") (BS.pack "FLOOR: ")
       ,T DOORS (makeRegexOpts compCaseless execBlank
          "\\bDOORS?[[:punct:]]?") (BS.pack "DOORS: ")]

Thank you very much for the error report. I have tracked down the cause.

You are searching against an empty ByteString. This is now represented by

-- | /O(1)/ The empty 'ByteString'
empty :: ByteString
empty = PS nullForeignPtr 0 0

And while the useAsCString and useAsCStringLen functions never reveal the null pointer, the current library uses unsafeUseAsCStringLen, which returns the null pointer. This is getting caught by a null pointer check, resulting in your crash.

I will post a fix later tonight, and announce it.

Which regex-pcre version are you using? Perhaps from hackage? I want to prioritize the version you need fixed.

The earlier repository holds up to version 0.81 at
http://darcs.haskell.org/packages/regex-pcre/

The newer repository holds up to version 0.92 at
http://darcs.haskell.org/packages/regex-unstable/regex-pcre/regex-pcre.cabal

Out of further curiosity: Which version of the pcre library does it use? And what version of ghc? Which version of Data.ByteString?

Cheers,
Chris Kuklewicz
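To illustrate how an empty part can end up being searched, here is a minimal sketch. It assumes a hypothetical helper, splitOnKeyword, built on BS.breakSubstring as a stand-in for the regex match from the original post, so it needs only the bytestring package. The real code uses Text.Regex.PCRE, and the sample addresses are made up.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical stand-in for the regex 'match': splits on a literal keyword
-- and returns (before, matched, after). When the keyword is absent it
-- returns (input, empty, empty), which is what the BS.null check in the
-- original split function relies on.
splitOnKeyword :: BS.ByteString -> BS.ByteString
               -> (BS.ByteString, BS.ByteString, BS.ByteString)
splitOnKeyword kw s
  | BS.null rest = (s, BS.empty, BS.empty)
  | otherwise    = (before, kw, BS.drop (BS.length kw) rest)
  where (before, rest) = BS.breakSubstring kw s

main :: IO ()
main = do
  -- Keyword at the very start: the "before" part is empty.
  print (splitOnKeyword (BS.pack "SUITE") (BS.pack "SUITE 100"))
  -- prints ("","SUITE"," 100")
  -- Keyword at the very end: the "after" part is empty.
  print (splitOnKeyword (BS.pack "DOCK") (BS.pack "REAR DOCK"))
  -- prints ("REAR ","DOCK","")
```

Either empty part would be handed to the next regular expression in the fold, which matches the observation that the failure only shows up once there is more than one tag.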

On Nov 20, 2007 9:36 AM, ChrisK wrote:
> Thank you very much for the error report. I have tracked down the cause.
>
> You are searching against an empty ByteString. This is now represented by
>
> -- | /O(1)/ The empty 'ByteString'
> empty :: ByteString
> empty = PS nullForeignPtr 0 0
>
> And while the useAsCString and useAsCStringLen functions never reveal the
> null pointer, the current library uses unsafeUseAsCStringLen, which returns
> the null pointer. This is getting caught by a null pointer check, resulting
> in your crash.
>
> I will post a fix later tonight, and announce it.
>
> Which regex-pcre version are you using? Perhaps from hackage? I want to
> prioritize the version you need fixed.
>
> The earlier repository holds up to version 0.81 at
> http://darcs.haskell.org/packages/regex-pcre/
>
> The newer repository holds up to version 0.92 at
> http://darcs.haskell.org/packages/regex-unstable/regex-pcre/regex-pcre.cabal
>
> Out of further curiosity: Which version of the pcre library does it use?
> And what version of ghc? Which version of Data.ByteString?
>
> Cheers,
> Chris Kuklewicz

Hi Chris,

I'm using ghc-6.8.1, regex-pcre-0.92, bytestring-0.9.0.1 and libpcre-7.4.

Using your information, I might be able to work around the problem by filtering out empty bytestrings before applying the next regex.

Thanks for your reply,
Olivier.
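
A rough sketch of that workaround, under stated assumptions: the Part type and the splitOnKeyword helper below are hypothetical simplifications (a literal keyword search via BS.breakSubstring stands in for the regex match, and Part stands in for the address part types of the original code). The point is only the guard that keeps empty bytestrings away from the matcher.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')

-- Hypothetical, simplified address part: either still unparsed or tagged.
data Part = Unparsed BS.ByteString | Tagged BS.ByteString
  deriving (Eq, Show)

-- Hypothetical stand-in for the regex 'match' (literal keyword search).
splitOnKeyword :: BS.ByteString -> BS.ByteString
               -> (BS.ByteString, BS.ByteString, BS.ByteString)
splitOnKeyword kw s
  | BS.null rest = (s, BS.empty, BS.empty)
  | otherwise    = (before, kw, BS.drop (BS.length kw) rest)
  where (before, rest) = BS.breakSubstring kw s

-- Split one part on one keyword, never searching or emitting an empty part.
splitPart :: BS.ByteString -> Part -> [Part]
splitPart kw p@(Unparsed v)
  | BS.null v = []    -- guard: the matcher is never run on an empty input
  | BS.null y = [p]   -- no match, keep the part unchanged
  | otherwise = [Unparsed x | not (BS.null x)]
             ++ [Tagged y]
             ++ [Unparsed z | not (BS.null z)]
  where (x, y, z) = splitOnKeyword kw v   -- lazy, so skipped when v is empty
splitPart _ p = [p]

-- Try every keyword in turn on every part, as the original fold does.
splitAll :: [BS.ByteString] -> [Part] -> [Part]
splitAll kws parts = foldl' (\ps kw -> concatMap (splitPart kw) ps) parts kws

main :: IO ()
main = print (splitAll (map BS.pack ["SUITE", "BLDG"])
                       [Unparsed (BS.pack "SUITE 100 BLDG 4")])
-- prints [Tagged "SUITE",Unparsed " 100 ",Tagged "BLDG",Unparsed " 4"]
```

Dropping empty parts changes the shape of the result slightly (a match no longer always yields exactly three parts), so whether this is acceptable depends on how later stages consume the list.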

haskell:
> Thank you very much for the error report. I have tracked down the cause.
>
> You are searching against an empty ByteString. This is now represented by
>
> -- | /O(1)/ The empty 'ByteString'
> empty :: ByteString
> empty = PS nullForeignPtr 0 0
>
> And while the useAsCString and useAsCStringLen functions never reveal the
> null pointer, the current library uses unsafeUseAsCStringLen, which returns
> the null pointer. This is getting caught by a null pointer check, resulting
> in your crash. I will post a fix later tonight, and announce it.

Right, empty bytestrings are represented as a nullPtr and a 0 length field. When you use unsafeUseAs* operations, this pointer is passed to a C function as is, without copying. The side condition is that the C function should accept NULL as a 0-length string, which the pcre code evidently doesn't, so it's unsafe to use unsafeUseAsCStringLen here.

I'll add some more explanatory text about the side conditions that need to hold for this use to be safe, and advise not using unsafe things :)

-- Don
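
One way to respect that side condition can be sketched as follows. The helper name withCStringLenNonNull is hypothetical and is not part of bytestring or regex-pcre, and this is not necessarily how the library was actually fixed. It relies on the statement earlier in the thread that the copying useAsCStringLen never reveals the null pointer, and keeps the zero-copy path for non-empty input.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.String (CStringLen)
import Foreign.Ptr (nullPtr)

-- Hypothetical helper: use the copying variant for the empty ByteString,
-- whose internal pointer may be null, and the zero-copy variant otherwise.
withCStringLenNonNull :: B.ByteString -> (CStringLen -> IO a) -> IO a
withCStringLenNonNull bs action
  | B.null bs = B.useAsCStringLen bs action       -- copies into a fresh buffer
  | otherwise = unsafeUseAsCStringLen bs action   -- no copy; buffer must not be mutated

main :: IO ()
main =
  -- Report whether the pointer handed to the callback is non-null,
  -- together with the length, for the empty ByteString.
  withCStringLenNonNull B.empty $ \(ptr, len) ->
    print (ptr /= nullPtr, len)
```

The copy only happens for zero-length input, so the cost is negligible, while a C function that rejects NULL never sees it.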

participants (3)
- ChrisK
- Don Stewart
- Olivier Boudry