
Hello.

It seems that the regex-pcre has a bug dealing with utf-8:

Prelude> :m + Text.Regex.PCRE
Prelude Text.Regex.PCRE> "país:Brasil" =~ "país:(.*)" :: (String,String,String,[String])
("","pa\237s:Brasil","",["rasil"])

Notice the missing 'B' in the result of the regex matching.

With regex-posix this does not happen:

Prelude> :m + Text.Regex.Posix
Prelude Text.Regex.Posix> "país:Brasil" =~ "país:(.*)" :: (String,String,String,[String])
("","pa\237s:Brasil","",["Brasil"])

I hope this bug can be fixed soon.

Is there a bug tracker to report the bug? If so, what is it?

Romildo

On 08/18/2012 06:16 PM, José Romildo Malaquias wrote:
> Hello.
> It seems that the regex-pcre has a bug dealing with utf-8:
> I hope this bug can be fixed soon.
> Is there a bug tracker to report the bug? If so, what is it?

You need something like this:

let pat = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt ("@'(.+?)'@" :: B.ByteString)

and then pat will match correctly.
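
For reference, the suggestion can be written out as a complete program. This is only a sketch under some assumptions: the pattern and subject are taken from the original report rather than from the snippet in this reply, the `text` package is used for UTF-8 encoding, and `utf8` is a hypothetical helper name. Note that a string literal annotated as B.ByteString, as in the snippet, needs the OverloadedStrings extension; the sketch avoids that by encoding explicitly.

```haskell
import Data.Bits ((.|.))
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Regex.PCRE

-- Hypothetical helper: encode a String as UTF-8 bytes.
-- Data.ByteString.Char8.pack would not do here: it truncates each Char
-- to 8 bits, so 'í' would not become valid UTF-8.
utf8 :: String -> B.ByteString
utf8 = TE.encodeUtf8 . T.pack

-- Compile the pattern from the original report with PCRE's UTF-8 mode on.
pat :: Regex
pat = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt (utf8 "país:(.*)")

main :: IO ()
main = do
  -- Same result shape as in the report: (before, matched, after, groups).
  let (_, _, _, groups) =
        match pat (utf8 "país:Brasil")
          :: (B.ByteString, B.ByteString, B.ByteString, [B.ByteString])
  -- Decode the captured groups back to text before printing.
  mapM_ (putStrLn . T.unpack . TE.decodeUtf8) groups
```

Because both the pattern and the subject are ByteStrings, the offsets PCRE reports are applied to the same bytes it matched against, so no character/byte mismatch can creep in.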

On Tue, Aug 21, 2012 at 10:25:53PM +0300, Konstantin Litvinenko wrote:
> On 08/18/2012 06:16 PM, José Romildo Malaquias wrote:
>> Hello.
>> It seems that the regex-pcre has a bug dealing with utf-8:
>> I hope this bug can be fixed soon.
>> Is there a bug tracker to report the bug? If so, what is it?
>
> You need something like this:
>
> let pat = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt ("@'(.+?)'@" :: B.ByteString)
>
> and then pat will match correctly.

The bug is related to String (not ByteString) in a UTF-8 locale. Until it is fixed, I am using the workaround of converting the regular expression and the text to ByteString, doing the matching, and then converting the results back to String.

Romildo
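
The workaround described above might look roughly like the following. It is a sketch, not the code actually used: `toUtf8`, `fromUtf8` and `firstMatchGroups` are hypothetical names, the `text` package is assumed for the UTF-8 conversion, and enabling compUTF8 is carried over from the earlier suggestion in this thread. A plausible reading of the symptom is that byte offsets reported by PCRE end up being applied to a String as character offsets ("país:" is 5 characters but 6 bytes in UTF-8); matching on ByteString sidesteps that.

```haskell
import Data.Bits ((.|.))
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Regex.PCRE

-- Hypothetical helpers: String <-> UTF-8 encoded ByteString.
toUtf8 :: String -> B.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

-- decodeUtf8 throws on invalid UTF-8; in UTF-8 mode the captured groups
-- are expected to fall on character boundaries.
fromUtf8 :: B.ByteString -> String
fromUtf8 = T.unpack . TE.decodeUtf8

-- Hypothetical wrapper: take pattern and subject as String, match on
-- ByteString, and return the capture groups of the first match as String.
firstMatchGroups :: String -> String -> Maybe [String]
firstMatchGroups re txt =
  case match regex (toUtf8 txt) :: [[B.ByteString]] of
    -- Each match is the whole matched text followed by its groups.
    (_whole : groups) : _ -> Just (map fromUtf8 groups)
    _                     -> Nothing
  where
    regex :: Regex
    regex = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt (toUtf8 re)

main :: IO ()
main = print (firstMatchGroups "país:(.*)" "país:Brasil")
```

The pattern is recompiled on every call, which keeps the sketch short; code that matches the same pattern many times would compile it once and reuse the Regex value.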