
Exploring the documentation for Text.Regex.PCRE, I've found "CompOption": http://hackage.haskell.org/packages/archive/regex-pcre/0.94.4/doc/html/Text-...

The constants are listed below; the one you want is probably compDotAll, to make "." match newlines as well.

I'm not 100% sure if this is the module you want, though, and I can't seem to get regex-pcre installed, so I can't test. Apologies!

On Wednesday, 6 June 2012 at 4:52 PM, Rico Moorman wrote:
Thank you very much for this suggestion. I just tried the character class you mentioned and it works.
The Stack Overflow post you mentioned was a nice read, and I certainly agree that regular expressions are normally not the way to go for most HTML munging needs. Luckily, the HTML generated by pandoc is very specific, and the <table> tag I wanted to match (for line-numbered code listings) does not contain any further tables, so I thought it should be safe to approach it like this.
The resulting code is now:
-- Wraps numbered code listings within the page body with a div
-- in order to be able to apply some more specific styling.
wrapNumberedCodelistings (Page meta body) = Page meta newBody
  where
    newBody = regexReplace "
]+>[\\s\\S]*?</table>" wrap body
    wrap x = "
" ++ x ++ "</div>"

-- Replaces the whole match for the given regex using the given function
regexReplace :: String -> (String -> String) -> String -> String
regexReplace regex replace text = go text
  where
    go text = case text =~~ regex of
      Just (before, match, after) -> before ++ replace match ++ go after
      _ -> text
I don't know, though, whether it could be cleaned up further, or whether this is good style at all (I am still fairly new to Haskell).
Furthermore I would still be very interested in the right approach to manipulating the HTML structure as a whole and I too hope that another Haskeller could name a more suitable solution for manipulating HTML. Or even how to pass the 's' modifier to Text.Regex.PCRE.
Best regards,
rico
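On the open question of passing the 's' modifier: the sketch below shows one way the compDotAll suggestion from the top of this thread might be applied. It assumes the regex-pcre package is installed and that makeRegexOpts, matchM, defaultCompOpt and defaultExecOpt are re-exported by Text.Regex.PCRE (they come from regex-base). The name regexReplaceDotAll is made up for this example, and the guard against empty matches is an addition that the original regexReplace does not have.

```haskell
import Data.Bits ((.|.))
import Text.Regex.PCRE

-- Hypothetical variant of regexReplace that compiles the pattern with
-- PCRE's DOTALL option, so that "." also matches newlines.
regexReplaceDotAll :: String -> (String -> String) -> String -> String
regexReplaceDotAll pat replace = go
  where
    -- Compile once, adding compDotAll to the library's default options.
    regex :: Regex
    regex = makeRegexOpts (defaultCompOpt .|. compDotAll) defaultExecOpt pat

    go text = case matchM regex text :: Maybe (String, String, String) of
      -- Skip empty matches so the recursion always makes progress.
      Just (before, matched, after)
        | not (null matched) -> before ++ replace matched ++ go after
      _ -> text

main :: IO ()
main = putStrLn (regexReplaceDotAll "<b>.*?</b>" (\m -> "[" ++ m ++ "]") "x <b>one\ntwo</b> y")
```

Compiling the pattern once outside of go also avoids recompiling it for every remaining piece of the input, which the string-based (=~~) version would do.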
On Wed, Jun 6, 2012 at 7:11 AM, Arlen Cuss wrote:
I'd be more inclined to look at a solution involving manipulating the HTML structure, rather than trying a regexp-based approach, which will probably end up disappointing. (See this: http://stackoverflow.com/a/1732454/499609)
I hope another Haskeller can speak to a library that would be good for this kind of purpose.
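As one possible illustration of the structural approach, here is a rough sketch using the tagsoup package. Nobody in this thread recommended tagsoup, so treat it as an assumption rather than an endorsed answer. The class names "sourceCode" and "wrapper" and the function wrapTables are hypothetical placeholders, and the sketch relies on the tables not being nested.

```haskell
import Text.HTML.TagSoup (Tag (..), parseTags, renderTags)

-- Hypothetical class name; the real one depends on the pandoc output.
targetClass :: String
targetClass = "sourceCode"

-- Wraps every <table> carrying targetClass in a <div>.
-- Assumes such tables are never nested inside each other.
wrapTables :: String -> String
wrapTables = renderTags . go . parseTags
  where
    go (open@(TagOpen "table" attrs) : rest)
      | hasTargetClass attrs =
          let (inside, remainder) = break (== TagClose "table") rest
          in case remainder of
               (close : after) ->
                 TagOpen "div" [("class", "wrapper")]
                   : open : inside ++ [close, TagClose "div"] ++ go after
               [] -> open : go rest -- unclosed table: leave it untouched
    go (t : rest) = t : go rest
    go [] = []

    -- The class attribute may hold several space-separated names.
    hasTargetClass attrs =
      maybe False (elem targetClass . words) (lookup "class" attrs)

main :: IO ()
main = putStrLn (wrapTables "<p>x</p><table class=\"sourceCode\"><tr></tr></table>")
```

Note that a parse/render round trip may normalise the markup (entity escaping, attribute quoting), so the output is not guaranteed to be byte-for-byte identical to the input outside the wrapped tables.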
To suit what you're doing now, though: if you change .*? to [\s\S]*?, it should work on multiline strings. If you can work out how to pass the 's' modifier to Text.Regex.PCRE, that should also do it.
—Arlen
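A small sketch of the difference described above, assuming regex-pcre is installed. The sample string and the three names are invented for the example. The third pattern uses PCRE's inline (?s) flag, which is another way of turning on the 's' modifier from inside the pattern itself. With a multiline input, the plain "." pattern is expected not to match, while the other two should.

```haskell
import Text.Regex.PCRE ((=~))

-- Made-up multiline input standing in for the pandoc output.
sample :: String
sample = "<table>\n<tr></tr>\n</table>"

plainDot, charClass, inlineFlag :: Bool
-- "." does not match a newline by default in PCRE.
plainDot = sample =~ "<table>.*?</table>"
-- [\s\S] matches any character, newlines included.
charClass = sample =~ "<table>[\\s\\S]*?</table>"
-- (?s) enables DOTALL for the rest of the pattern.
inlineFlag = sample =~ "(?s)<table>.*?</table>"

main :: IO ()
main = print (plainDot, charClass, inlineFlag)
```

The backslashes are doubled because the pattern is written as a Haskell string literal.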
On Wednesday, 6 June 2012 at 3:05 PM, Rico Moorman wrote:
Hello,
I have a given piece of multiline HTML (which is generated using pandoc btw.) and I am trying to wrap certain elements (tags with a given class) with a <div>.
I already took a look at the Text.Regex.PCRE module which seemed a reasonable choice because I am already familiar with similar regex implementations in other languages.
I came up with the following function, which takes a regex and replaces all matches within the given string using the provided function (which I would use to wrap the element):
import Text.Regex.PCRE ((=~~))
-- Replaces the whole match for the given regex using the given function
regexReplace :: String -> (String -> String) -> String -> String
regexReplace regex replace text = go text
  where
    go text = case text =~~ regex of
      Just (before, match, after) -> before ++ replace match ++ go after
      _ -> text

The problem with this function is that it does not work on multiline strings. I would like to call it like this:
newBody = regexReplace "
" wrap body
wrap x = "
" ++ x ++ "</div>"

Is there any way to easily pass some kind of multiline modifier to the regex in question?
Or is this approach completely off and would something else be more appropriate/haskelly for the problem at hand?
Thank you very much in advance.
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners