Hi all,

I'm getting a strange user error when working with Data.ByteString.Char8 and Text.Regex.PCRE.

The error I get is:
CustomerMaster: user error (Text.Regex.PCRE.ByteString died: (ReturnCode 0,"Ptr
parameter was nullPtr in Text.Regex.PCRE.Wrap.wrapMatch cstr"))

The part of the code causing problems is at the end of this e-mail. The complete code is attached.

I'm working on cleansing customer addresses for an ERP system data migration. The failing function is supposed to split some bytestrings using a list of regular expressions (splitting on words like suite, building, attention, etc.) stored in the "tags" binding. I use foldl' to try each regular expression on each address (addr is a list of address parts, which can be seen as a list of bytestring values).

The funny and weird thing is that when I have only one regular expression in tags, it works. When I have more than one regular expression in tags, it fails (user error) the first time one of the regular expressions matches. A match replaces the current address part with three new parts: before, match, and after. The next regular expression in the tags list then tests the new before and after parts. I presume it's the testing of those newly created parts that causes the problem.

Any idea about what could go wrong here?

Thanks,

Olivier.

=== code ===

-- | Split unparsed address parts on keywords (suite, building, doors, etc...)

splitOnTags :: AddressState
splitOnTags = do
  addr <- get
  put $ foldl' f addr tags
  where
    f a t   = concatMap (split t) a
    split (T {partType = p, regex = r}) a@(AP X v)
      | BS.null y = [a]                      -- no match: keep the part as is
      | otherwise = [AP X x, AE p, AP X z]   -- new address parts
      where
        (x, y, z) = match r v
    split _ a = [a]

-- | Regex and part type representing tags used for splitting addresses

tags :: [Tag]
tags = [T AT
          (makeRegexOpts compCaseless execBlank
            "\\b(?:ATTN|ATTENTION|C/O)\\b[[:punct:]]?")
          (BS.pack "ATTN: ")
       ,T PB
          (makeRegexOpts compCaseless execBlank
            "\\b(?:(?:P\\.?\\s?O\\.?\\s*)BOX|C\\.P\\.|POSTFACH)\
            \[[:punct:]]?\\s+(?:NO\\.\\s+)?")
          (BS.pack "PO BOX: ")
       ,T BLDG
          (makeRegexOpts compCaseless execBlank
            "\\b(?:BLDG|BUILDING|HANGAR)[[:punct:]]?")
          (BS.pack "BLDG: ")
       ,T DOCK
          (makeRegexOpts compCaseless execBlank
            "\\bDOCK[[:punct:]]?")
          (BS.pack "DOCK: ")
       ,T STE
          (makeRegexOpts compCaseless execBlank
            "\\b(?:SUITE|STE|APT|ROOM)[[:punct:]]?\
            \\\s+(?:NO[[:punct:]]?)?")
          (BS.pack "STE: ")
       ,T UNIT
          (makeRegexOpts compCaseless execBlank
            "\\bUNIT[[:punct:]]?")
          (BS.pack "UNIT: ")
       ,T FLOOR
          (makeRegexOpts compCaseless execBlank
            "FLOOR")
          (BS.pack "FLOOR: ")
       ,T DOORS
          (makeRegexOpts compCaseless execBlank
            "\\bDOORS?[[:punct:]]?")
          (BS.pack "DOORS: ")]
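
In case it helps to reproduce the behaviour without my data types, here is a reduced sketch of the same fold-and-split logic. It is only an approximation: Part, Unparsed, Tagged, splitOn and splitAll are hypothetical stand-ins for the types in the attached code, and BS.breakSubstring on a literal keyword replaces the PCRE match, so it does not trigger the error by itself. What it does show is that when a keyword sits at the very start of a part, the "before" part handed to the next tag is an empty bytestring. I have not confirmed that this is what upsets the PCRE binding.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')

-- Hypothetical, simplified stand-ins for the address part types.
data Part
  = Unparsed BS.ByteString   -- plays the role of AP X v
  | Tagged BS.ByteString     -- plays the role of AE p
  deriving (Eq, Show)

-- Split one unparsed part on a literal keyword (stand-in for the regex match).
-- An empty remainder from breakSubstring means the keyword was not found.
splitOn :: BS.ByteString -> Part -> [Part]
splitOn kw p@(Unparsed v)
  | BS.null rest = [p]
  | otherwise    = [Unparsed before, Tagged kw, Unparsed after]
  where
    (before, rest) = BS.breakSubstring kw v
    after          = BS.drop (BS.length kw) rest
splitOn _ p = [p]

-- Try every keyword in turn on every part, as splitOnTags does with tags.
splitAll :: [BS.ByteString] -> [Part] -> [Part]
splitAll kws parts = foldl' step parts kws
  where
    step acc kw = concatMap (splitOn kw) acc

main :: IO ()
main = mapM_ print $
  splitAll (map BS.pack ["ATTN", "SUITE"])
           [Unparsed (BS.pack "ATTN JOHN SUITE 5")]
```

With this input, the first element of the result is an Unparsed part holding an empty bytestring, and that empty part is what the second keyword gets tested against.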