
On Saturday 17 July 2010 05:39:00, gate03@landcroft.co.uk wrote:
On Sat 17/07/10 04:17 , Alexander Solla ajs@2piix.com sent:
Why are you performing unsafe IO actions? They don't play nice with laziness.
OK, fair cop, but without the unsafe IO action, it still misbehaves.
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=27650
Michael.
Source-diving reveals: it's a bug. Text.Regex.Posix.ByteString.Lazy is just a thin wrapper around the strict variant: lazy ByteStrings are transformed into strict ones before the functions of Text.Regex.Posix.ByteString are called. To avoid copying twice, if the lazy ByteString does not end with a '\0', a '\0' is snoc'ed to the end before transforming to a strict ByteString. Thus the regexec of Text.Regex.Posix.ByteString takes slices of a longer ByteString than it should, and no measures are taken to chop the trailing '\0' off again.

A related problem is that ByteStrings (and Strings) may legitimately contain '\0's, but regex-posix (and probably [almost] all other regex packages) treats them as CStrings, so the regex functions stop processing at the first '\0' (naturally, since they call C), while on the Haskell side that may be only a small part of the string.
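
To make both effects concrete, here is a minimal sketch that uses only the bytestring package. It does not call regex-posix at all: toNulTerminated, afterMatch and cView are hypothetical helpers that merely model the behaviour described above (the snoc'ed '\0', the slice taken from the longer strict ByteString, and the C side's view of the data), and the match offset 3 is assumed rather than computed by a regex.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical model of the wrapper's conversion step: produce a strict
-- copy that ends in '\0', snoc'ing one on if the input lacks it.
toNulTerminated :: L.ByteString -> B.ByteString
toNulTerminated l
  | not (L.null l) && L.last l == '\0' = L.toStrict l
  | otherwise                          = L.toStrict (L.snoc l '\0')

-- Hypothetical model of the "text after the match" slice, given the
-- offset at which the match ends.
afterMatch :: Int -> B.ByteString -> B.ByteString
afterMatch = B.drop

-- What C code treating the buffer as a CString would see:
-- everything up to, but not including, the first '\0'.
cView :: B.ByteString -> B.ByteString
cView = B.takeWhile (/= '\0')

main :: IO ()
main = do
  let input    = L.pack "foo bar"
      matchEnd = 3  -- assumed: a match on "foo" ends at offset 3
      buggy    = afterMatch matchEnd (toNulTerminated input)
      -- Possible repair: clip the slice to the original input's length.
      fixed    = B.take (fromIntegral (L.length input) - matchEnd) buggy
  print buggy                      -- slice still carries the added '\0'
  print fixed                      -- slice without the added '\0'
  print (cView (B.pack "ab\0cd"))  -- only the part before the embedded '\0'
```

Clipping the slice to the original length is just one way to undo the snoc; whether the input already ended in '\0' would have to be tracked as well, since in that case nothing was added and nothing should be removed.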