Re: [Haskell-cafe] is this a bug ?

17 Jul 2010

On Saturday 17 July 2010 05:39:00, gate03@landcroft.co.uk wrote:
> ...
> On Sat 17/07/10 04:17 , Alexander Solla ajs@2piix.com sent:
> > ...
> > Why are you performing unsafe IO actions?  They don't play nice
> > with laziness.
> OK, fair cop, but without the unsafe IO action, it still misbehaves.
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=27650
> Michael.
Source-diving reveals: it's a bug.
Text.Regex.Posix.ByteString.Lazy is just a thin wrapper around the strict 
variant: lazy ByteStrings are transformed into strict ones before the 
functions of Text.Regex.Posix.ByteString are called.
To avoid copying twice, if the lazy ByteString does not end with a '\0', a 
'\0' is snoc'ed to the end before transforming to a strict ByteString.
Thus the regexec of Text.Regex.Posix.ByteString takes its slices from a 
ByteString that is one byte longer than the original, and no measures are 
taken to chop the trailing '\0' off again.
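
To illustrate, here is a minimal sketch of that mechanism using only the
bytestring package. It is a model of the behaviour described above, not the
regex-posix source: toStrictNul, afterMatchBuggy and afterMatchFixed are
hypothetical names, and the match offset and length are passed in by hand
instead of coming from regexec.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical model of the lazy-to-strict conversion: make sure the
-- data ends in '\0' first, then flatten the chunks in a single copy.
toStrictNul :: L.ByteString -> B.ByteString
toStrictNul l
  | not (L.null l) && L.last l == '\0' = B.concat (L.toChunks l)
  | otherwise                          = B.concat (L.toChunks (L.snoc l '\0'))

-- Text after a match at (offset, length), sliced straight from the
-- padded buffer: the appended '\0' leaks into the result.
afterMatchBuggy :: Int -> Int -> L.ByteString -> B.ByteString
afterMatchBuggy off len l = B.drop (off + len) (toStrictNul l)

-- Same slice, but limited to the length of the original input, so the
-- appended '\0' is chopped off again.
afterMatchFixed :: Int -> Int -> L.ByteString -> B.ByteString
afterMatchFixed off len l =
  B.take (fromIntegral (L.length l) - (off + len))
         (B.drop (off + len) (toStrictNul l))

main :: IO ()
main = do
  let input = L.pack "foo bar"
  -- Pretend the regex matched "foo" at offset 0 with length 3.
  print (B.length (afterMatchBuggy 0 3 input))  -- 5: " bar" plus the '\0'
  print (B.length (afterMatchFixed 0 3 input))  -- 4: just " bar"
```

Limiting every slice to the original length is only one possible repair;
the point is merely that something has to undo the snoc.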

A related problem is that ByteStrings (and Strings) may legitimately 
contain '\0's, but regex-posix (and probably [almost] all other regex 
packages) treats them as CStrings. The regex functions therefore stop 
processing at the first '\0' (naturally, since they call C), even though 
on the Haskell side that may be only a small part of the string.
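
The truncation can be seen without any regex package. The sketch below
round-trips a ByteString through a CString with useAsCString and
packCString from bytestring, which shows how much of the data a
NUL-terminated C view gets to see. cView and hasEmbeddedNul are
hypothetical helper names, not part of regex-posix.

```haskell
import qualified Data.ByteString.Char8 as B

-- What a C function relying on NUL termination sees of a ByteString:
-- useAsCString hands over a NUL-terminated copy, and packCString reads
-- it back only up to the first '\0'.
cView :: B.ByteString -> IO B.ByteString
cView bs = B.useAsCString bs B.packCString

-- A check a caller could run before handing data to a C regex engine.
hasEmbeddedNul :: B.ByteString -> Bool
hasEmbeddedNul = B.elem '\0'

main :: IO ()
main = do
  let s = B.pack "abc\0def"
  seen <- cView s
  print (B.length s)        -- 7 bytes on the Haskell side
  print (B.length seen)     -- 3 bytes visible through the C view
  print (hasEmbeddedNul s)  -- True
```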

Daniel Fischer