
Hi Brian,
I tried to write a program using Text.Regex.PCRE to search through a UTF8-encoded document. It appears that the presence of non-breaking-space characters (code point 160) triggers some weird behavior in my program.
I seem to recall that regex-pcre simply binds to the system's pcre library and effectively lets that library do all the work. Now, libpcre has full Unicode support, but that support needs to be enabled at compile time to be available. I believe "--enable-unicode-properties" is the appropriate configure flag, but I don't know for sure. Anyway, my point is that your system's libpcre may or may not have that feature enabled.

If it does not, then regex-pcre won't be able to deal with Unicode characters properly, and that issue should be reported to Debian. If your system library *has* Unicode support, then this issue might be caused by a bug in regex-pcre (unlikely) or in your code that uses it (more likely).

I hope this helps,
Peter
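
P.S. Here is a rough sketch of how one might check this from Haskell instead of guessing. It is untested against your setup and rests on a few assumptions: that the document is handled as a strict ByteString holding UTF-8 bytes, that Text.Regex.PCRE.ByteString exports compile, execute, compUTF8, execBlank and getVersion as I remember them, and that a libpcre built without UTF-8 support rejects the compUTF8 option at compile time. The pattern "a.b" and the subject are made up for illustration. In UTF-8 mode the "." should consume the two-byte non-breaking space as one character; in byte mode it would consume only one byte and the match should fail.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Maybe (fromMaybe)
import Text.Regex.PCRE.ByteString
  (compile, execute, compUTF8, execBlank, getVersion)

main :: IO ()
main = do
  -- Version string reported by the libpcre that regex-pcre is linked against.
  putStrLn ("libpcre version: " ++ fromMaybe "unknown" getVersion)

  -- U+00A0 (code point 160) is the two bytes 0xC2 0xA0 in UTF-8.
  -- The subject is "a", a non-breaking space, then "b".
  let subject = B.concat [BC.pack "a", B.pack [0xC2, 0xA0], BC.pack "b"]

  -- Ask libpcre to treat both pattern and subject as UTF-8.
  compiled <- compile compUTF8 execBlank (BC.pack "a.b")
  case compiled of
    Left (_, msg) ->
      -- Assumption: this is where a build without UTF-8 support shows up.
      putStrLn ("compiling with compUTF8 failed: " ++ msg)
    Right regex -> do
      result <- execute regex subject
      case result of
        Left (_, msg)  -> putStrLn ("matching failed: " ++ msg)
        Right Nothing  -> putStrLn "no match: '.' did not span the two-byte character"
        Right (Just _) -> putStrLn "match: '.' consumed U+00A0 as a single character"
```

If the compile step fails, the system library is the likely culprit. If it succeeds and the match works, I would look at how your program decodes the file and which compile options it passes.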