Hi all,
I'm getting a strange user error when working with Data.ByteString.Char8 and Text.Regex.PCRE.
The error I get is:
CustomerMaster: user error (Text.Regex.PCRE.ByteString died: (ReturnCode 0,"Ptr
parameter was nullPtr in Text.Regex.PCRE.Wrap.wrapMatch cstr"))
The part of the code causing problems is at the end of this e-mail. The complete code is attached.
I'm working on cleansing customer addresses for an ERP system data migration. The failing function is supposed to split some bytestrings using a list of regular expressions (splitting on words like suite, building, attention, etc.) stored in the "tags" binding. I use foldl' to try each regular expression on each address (addr is a list of address parts, which can be seen as a list of bytestring values).
The odd thing is that it works when I have only one regular expression in tags. With more than one, it fails (user error) the first time one of the regular expressions matches. A match replaces the current address part with three new parts: before, match, and after. The next regular expression in the tags list then tests the new before and after parts. I presume it is the testing of those newly created parts that causes the problem.
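To make the intended splitting behaviour concrete, here is a minimal sketch of the same fold in isolation. It is an illustration only, not the attached code: it substitutes a literal keyword search (BS.breakSubstring) for the PCRE match, and the Part type, splitOn and splitAll are hypothetical names introduced for the example.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')

-- Hypothetical stand-in for an address part:
-- either still-unparsed text or a recognised keyword.
data Part = Unparsed BS.ByteString | Tagged BS.ByteString
  deriving (Eq, Show)

-- Split one unparsed part around the first occurrence of a literal keyword.
-- BS.breakSubstring returns (whole input, empty) when the keyword is absent.
splitOn :: BS.ByteString -> Part -> [Part]
splitOn kw p@(Unparsed v)
  | BS.null rest = [p]
  | otherwise    = [Unparsed before, Tagged kw, Unparsed after]
  where
    (before, rest) = BS.breakSubstring kw v
    after          = BS.drop (BS.length kw) rest
splitOn _ p = [p]

-- Try every keyword in turn; each keyword sees the parts
-- produced by the keywords before it.
splitAll :: [BS.ByteString] -> [Part] -> [Part]
splitAll kws parts = foldl' (\ps kw -> concatMap (splitOn kw) ps) parts kws
```

For example, splitAll (map BS.pack ["SUITE", "BLDG"]) [Unparsed (BS.pack "SUITE 5 BLDG A")] first splits on SUITE and then runs BLDG over the resulting pieces. Because SUITE sits at the very start of the input, the "before" piece is an empty bytestring, and the second keyword is still tried against it.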
Any idea about what could go wrong here?
Thanks,
Olivier.
=== code ===
-- | Split unparsed address parts on keywords (suite, building, doors, etc...)
splitOnTags :: AddressState
splitOnTags = do
    addr <- get
    put $ foldl' f addr tags
  where
    f a t = concatMap (split t) a
    split (T {partType = p, regex = r}) a@(AP X v)
      | BS.null y = [a]
      | otherwise = [AP X x, AE p, AP X z] -- new address parts
      where
        (x, y, z) = match r v
    split _ a = [a]
-- | Regex and part type representing tags used for splitting addresses
tags :: [Tag]
tags = [T AT
          (makeRegexOpts compCaseless execBlank
             "\\b(?:ATTN|ATTENTION|C/O)\\b[[:punct:]]?")
          (BS.pack "ATTN: ")
       ,T PB
          (makeRegexOpts compCaseless execBlank
             "\\b(?:(?:P\\.?\\s?O\\.?\\s*)BOX|C\\.P\\.|POSTFACH)\
             \[[:punct:]]?\\s+(?:NO.\\s+)?")
          (BS.pack "PO BOX: ")
       ,T BLDG
          (makeRegexOpts compCaseless execBlank
             "\\b(?:BLDG|BUILDING|HANGAR)[[:punct:]]?")
          (BS.pack "BLDG: ")
       ,T DOCK
          (makeRegexOpts compCaseless execBlank
             "\\bDOCK[[:punct:]]?")
          (BS.pack "DOCK: ")
       ,T STE
          (makeRegexOpts compCaseless execBlank
             "\\b(?:SUITE|STE|APT|ROOM)[[:punct:]]?\
             \\\s+(?:NO[[:punct:]]?)?")
          (BS.pack "STE: ")
       ,T UNIT
          (makeRegexOpts compCaseless execBlank
             "\\bUNIT[[:punct:]]?")
          (BS.pack "UNIT: ")
       ,T FLOOR
          (makeRegexOpts compCaseless execBlank
             "FLOOR")
          (BS.pack "FLOOR: ")
       ,T DOORS
          (makeRegexOpts compCaseless execBlank
             "\\bDOORS?[[:punct:]]?")
          (BS.pack "DOORS: ")]