
I had similar issues a while ago. As far as I can recall, they had to do with UTF-8 encoding. I wanted to "wrap" a multiline string (code listings) in the pandoc-generated HTML of a Hakyll page with a container "div". The text to wrap was determined using a PCRE regex. Here is the (probably inefficient) implementation:

    module Transformations where

    import Hakyll
    import qualified Text.Regex.PCRE as RE
    import qualified Data.ByteString.UTF8 as BSU
    import qualified Data.ByteString as BS

    -- Wraps numbered code listings within the page body with a div
    -- in order to be able to apply some more specific styling.
    wrapNumberedCodelistings (Page meta body) = Page meta newBody
      where
        newBody = regexReplace' regex wrap body
        regex = "
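A regex engine that is handed the UTF-8 bytes reports byte offsets, so one way to avoid an offset mismatch when splicing in the container div is to keep the whole body as a `ByteString` and cut it only at those byte offsets. The following is a minimal sketch of that splicing step under that assumption, using only `bytestring` (shipped with GHC). The names `utf8`, `findSpans`, `wrapSpans` and the `listing` class are hypothetical, and `findSpans` uses plain substring search (`BS.breakSubstring`) as a stand-in for the PCRE match; it is not the regex-pcre API and not the original `regexReplace'`.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: encode a String as UTF-8 bytes (avoids utf8-string).
utf8 :: String -> BS.ByteString
utf8 = BL.toStrict . BB.toLazyByteString . BB.stringUtf8

-- Byte-offset spans (offset, length) of every non-overlapping occurrence
-- of a literal needle. Stand-in for a PCRE match: plain substring search.
findSpans :: BS.ByteString -> BS.ByteString -> [(Int, Int)]
findSpans needle = go 0
  where
    n = BS.length needle
    go base hay
      | n == 0 = []
      | BS.null after = []                        -- no further match
      | otherwise = (off, n) : go (off + n) (BS.drop n after)
      where
        (before, after) = BS.breakSubstring needle hay
        off = base + BS.length before             -- offset counted in bytes

-- Wrap each (offset, length) span of the body between open and close.
-- Offsets are byte offsets; spans must be sorted and non-overlapping.
wrapSpans :: BS.ByteString -> BS.ByteString -> [(Int, Int)] -> BS.ByteString -> BS.ByteString
wrapSpans open close spans body = BS.concat (go 0 spans)
  where
    slice from len = BS.take len (BS.drop from body)
    go pos [] = [BS.drop pos body]                -- tail after the last match
    go pos ((off, len) : rest) =
      slice pos (off - pos)                       -- text before the match
        : open
        : slice off len                           -- the matched bytes, unchanged
        : close
        : go (off + len) rest
```

For the body `café <pre>x</pre>!`, the match starts at byte offset 6 although it is at character index 5, because `é` takes two bytes in UTF-8. As long as both the search and the slicing work on the same bytes, that difference never surfaces.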
> David Rapoza</td>\r\n | \r\n <i>Return to Ravnica</i>\r\n </td>\r\n | 10/31/2012</td>\r\n </tr><tr>\r\n | <
> Prelude Text.Regex.PCRE> m
> "a href=\"/magic/magazine/article.aspx?x=mtg/daily/activity/1088\">
I have a similar issue with non-ASCII strings. It seems that the internal representations used by Haskell and PCRE differ: one of them counts bytes and the other counts code points, so the match offsets diverge whenever a multi-byte encoding (like UTF-8) is used. This has been reported before; see these threads:

http://www.haskell.org/pipermail/haskell-cafe/2012-August/thread.html#102959
http://www.haskell.org/pipermail/haskell-cafe/2012-August/thread.html#103029

I am still waiting for a new release of regex-pcre that fixes this issue.

Romildo

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
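To make the bytes-versus-code-points mismatch concrete, here is a small sketch that needs only `bytestring` (no regex-pcre). It assumes the `String` reaches the C side as UTF-8: `byteOffset` counts positions in those UTF-8 bytes, which is how an 8-bit C library such as PCRE reports offsets, while `charOffset` counts `Char`s the way a Haskell `String` is indexed. `byteToCharOffset` is a hypothetical workaround, not part of regex-pcre: it converts a byte offset back to a code-point offset by skipping UTF-8 continuation bytes, and it assumes the input is valid UTF-8.

```haskell
import Data.Bits ((.&.))
import Data.List (findIndex, isPrefixOf, tails)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: encode a String as UTF-8 bytes.
utf8 :: String -> BS.ByteString
utf8 = BL.toStrict . BB.toLazyByteString . BB.stringUtf8

-- Offset of the first occurrence of needle, counted in UTF-8 bytes.
byteOffset :: String -> String -> Maybe Int
byteOffset needle hay
  | BS.null after = Nothing
  | otherwise = Just (BS.length before)
  where
    (before, after) = BS.breakSubstring (utf8 needle) (utf8 hay)

-- Offset of the first occurrence of needle, counted in Chars (code points).
charOffset :: String -> String -> Maybe Int
charOffset needle hay = findIndex (needle `isPrefixOf`) (tails hay)

-- Convert a byte offset into a code-point offset: count the bytes in the
-- prefix that start a code point (ASCII or lead bytes), i.e. skip the
-- continuation bytes of the form 10xxxxxx. Assumes valid UTF-8.
byteToCharOffset :: BS.ByteString -> Int -> Int
byteToCharOffset bytes off =
  BS.length (BS.filter startsCodePoint (BS.take off bytes))
  where
    startsCodePoint b = b .&. 0xC0 /= 0x80
```

For `"naïve café regex"`, the match of `"regex"` is at byte 13 but at character 11, because `ï` and `é` each take two bytes. Applying the byte offset to the `String` (`drop 13`) yields `"gex"`, the kind of shifted match described above, whereas `byteToCharOffset` maps 13 back to 11.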