regex and Unicode

I tried to write a program using Text.Regex.PCRE to search through a UTF-8-encoded document. It appears that the presence of non-breaking-space characters (code point 160, U+00A0) triggers some weird behavior in my program. This is using the Debian stable (Jessie) packages of ghc 7.6.3 and libraries.

Now I find myself at a fork in the road, not sure which direction to head in. Do I:
1) Continue looking (or get help with looking) for bugs in my code? (I have this reduced to a pretty small test case.)
2) Assemble a bug report against Debian?
3) Assemble a bug report against Text.Regex.PCRE (or Text.Regex.Base) for "upstream"?
4) Uninstall Text.Regex.PCRE (and/or some other packages) and switch to something that works with Unicode/UTF-8?

Any ideas?

On Sun, 28 Aug 2016 15:11:24 -0400, Brian Sammon wrote:
> I tried to write a program using Text.Regex.PCRE to search through a UTF-8-encoded document. It appears that the presence of non-breaking-space characters (code point 160) triggers some weird behavior in my program.
Well, switching my code to use Text.RegexPR-based searches rather than Text.Regex.PCRE made the problem go away. Text.Regex.PCRE seems to be unmaintained, so I guess I shouldn't be surprised that I had problems with it.
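
For anyone who lands here with similar symptoms: the thread does not establish the root cause, but one plausible source of trouble (an assumption, not something confirmed above) is that code point 160 takes two bytes in UTF-8, so byte offsets and character offsets stop agreeing after the first non-breaking space. The sketch below uses only base and does not touch any regex library; utf8Encode and byteOffset are hypothetical helper names made up for this illustration.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hand-rolled UTF-8 encoder for a single Char (illustration only;
-- real code would normally use a library encoder).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi, cont :: Int -> Word8
    -- Leading-byte payload: the top bits of the code point.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Byte offset in the UTF-8 encoding of the character at index i.
byteOffset :: String -> Int -> Int
byteOffset s i = length (concatMap utf8Encode (take i s))

main :: IO ()
main = do
  let s = "a\160b"             -- 'a', NO-BREAK SPACE, 'b'
  print (map utf8Encode s)     -- [[97],[194,160],[98]]
  print (byteOffset s 2)       -- 3: 'b' is character 2 but byte 3
```

If a regex binding reports match positions in bytes while the caller indexes a String by characters, every match after a non-breaking space would be shifted by one. Whether that is what Text.Regex.PCRE did in this case is untested here.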