
I'm new to Haskell and HaXml, and I'm playing around with the latter to clean some (well-formed) 'legacy' HTML. This works fine except for the following case. Some of the elements to be cleaned are:
<font size="4"><i>Hello World</i></font>
<i><font size="4">Hello World</font></i>
This should become:
<h1 class="subtitle">Hello World</h1>
From what I could gather from the documentation, it should be something like:
foldXml (txt ?> keep :> (attrval("size",AttValue[Left "4"]) `o` tag "font") /> tag "i" ?> replaceTag "h1" :> children)
Is the bracketing correct? I can't remember the precedence of the operators offhand, but perhaps it should be:
foldXml (txt ?> keep :>
         (((attrval("size",AttValue[Left "4"]) `o` tag "font")
           /> tag "i") ?> replaceTag "h1" :> children))
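
For what it's worth, the grouping question can be checked without HaXml at all. Below is a minimal sketch using toy stand-ins for the combinators (the `Expr` type and the `Name` placeholders are hypothetical, not HaXml's API). It assumes the fixities are the ones I remember from Text.XML.HaXml.Combinators, namely infixl 5 for `/>` and infixr 3 for `?>` and `:>`; please check those against your installed version.

```haskell
module Main where

-- ASSUMPTION: these fixities mirror the declarations in
-- Text.XML.HaXml.Combinators; verify against your HaXml version.
infixl 5 />
infixr 3 ?>, :>

-- Symbolic expression tree (hypothetical), used only to record
-- how an expression was grouped by the parser.
data Expr
  = Name String
  | Child Expr Expr      -- f /> g
  | Cond Expr Expr Expr  -- p ?> t :> e
  deriving (Eq, Show)

-- Toy counterpart of HaXml's ThenElse pair.
data ThenElse = Expr :> Expr

(/>) :: Expr -> Expr -> Expr
(/>) = Child

(?>) :: Expr -> ThenElse -> Expr
p ?> (t :> e) = Cond p t e

-- Same shape as the original one-liner, with no extra brackets.
unbracketed :: Expr
unbracketed =
  Name "txt" ?> Name "keep" :>
    Name "fontSize4" /> Name "i" ?> Name "replaceTag" :> Name "children"

-- The explicitly bracketed variant.
bracketed :: Expr
bracketed =
  Name "txt" ?> (Name "keep" :>
    ((Name "fontSize4" /> Name "i") ?> (Name "replaceTag" :> Name "children")))

main :: IO ()
main = print (unbracketed == bracketed)  -- prints True
```

Under those assumed fixities both spellings build the same tree, so the extra brackets are optional. Note that `o` has to stay in its own parentheses in the real code, since this sketch does not model it.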
Yes, the bracketing is correct, since the following code:
foldXml (txt ?> keep :>
         fontSize4 /> tag "em" ?> mkSubtitle :>
         children)

fontSize4  = (attrval("size",AttValue[Left "4"]) `o` tag "font")
mkSubtitle = mkElemAttr "h1" [("class", ("subtitle"!))] [children]
now transforms
<font size="4"><em>Hello World</em></font>
into
<h1 class="subtitle"><em>