
Hi Gwern,

Sorry for not noticing this sooner; my haskell-cafe@ reading is somewhat behind right now!
After an hour, I came up with a nice clean little script:
----
import Text.HTML.TagSoup.Render
import Text.HTML.TagSoup

main :: IO ()
main = interact convertPre

convertPre :: String -> String
convertPre = renderTags . map convertToHaskell . canonicalizeTags . parseTags

convertToHaskell :: Tag -> Tag
convertToHaskell x
  | isTagOpenName "pre" x  = TagOpen "haskell" (extractAttribs x)
  | isTagCloseName "pre" x = TagClose "haskell"
  | otherwise              = x
  where
    extractAttribs :: Tag -> [Attribute]
    extractAttribs (TagOpen _ y) = y
    extractAttribs _             = error "The impossible happened."
convertToHaskell (TagOpen "pre" atts) = TagOpen "haskell" atts
convertToHaskell (TagClose "pre") = TagClose "haskell"
convertToHaskell x = x

Direct pattern matching is much easier and simpler.
Anyway, so my script seems to work. I ran the wiki output through it and this is the diff: http://haskell.org/haskellwiki/?title=User%3AGwern%2Fkenn&diff=22827&oldid=22811.
Ok, good, it replaces all the tags... But wait, what's all this other stuff? It is replacing all my apostrophes with '! No doubt this has something to do with XML/HTML/SGML or whatever, but it's not ideal. Even if it doesn't break the formatting (as I think it does), it's still cluttering up the source.
The escaping of ' is caused by renderTags, so instead call:

renderTagsOptions (renderOptions{optEscape = (:[])})

That gives no escaping of any characters. More likely you'll want something that does only the <, > and & conversions. See the docs: http://hackage.haskell.org/packages/archive/tagsoup/0.6/doc/html/Text-HTML-T...
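As a rough sketch of the "only <, > and & conversions" idea: the block below defines an escaping function that leaves apostrophes alone. The names escapeChar, escapeMinimal and demo are made up for illustration and are not part of tagsoup. Judging by the (:[]) in the call above, optEscape in tagsoup 0.6 appears to take a Char -> String function, so escapeChar would presumably slot in the same way; that is an assumption, and other tagsoup releases may expect a different type, so check the docs for the version you have installed. The code itself uses only the Prelude.

```haskell
-- Sketch only: escapeChar, escapeMinimal and demo are illustrative names,
-- not functions provided by tagsoup.

-- Escape just the three characters that would otherwise be read as markup.
escapeChar :: Char -> String
escapeChar '<' = "&lt;"
escapeChar '>' = "&gt;"
escapeChar '&' = "&amp;"
escapeChar c   = [c]  -- everything else, apostrophes included, passes through

-- String-level wrapper, for an API that escapes whole strings at a time.
escapeMinimal :: String -> String
escapeMinimal = concatMap escapeChar

-- Small demonstration; load the file in GHCi and run `demo`.
demo :: IO ()
demo = putStrLn (escapeMinimal "it's <b> & more")
```

Under the assumption stated above, the call would become renderTagsOptions (renderOptions{optEscape = escapeChar}) in place of the (:[]) version.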
Am I just barking up the wrong tree and should be writing a simple-minded search-and-replace sed script which replaces <pre> with <haskell>, </pre> with </haskell>...?
Not necessarily. If you literally just want to replace "<pre>" with "<haskell>" then sed is probably the easy choice. However, it's quite likely you'll want to make more fixes, and tagsoup gives you the flexibility to extend in that direction.

Thanks

Neil
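For comparison, the literal search-and-replace route can also be sketched in plain Haskell without tagsoup. The helpers replaceAll, preToHaskell and demo below are hypothetical names written for this illustration; only stripPrefix comes from the standard Data.List module.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper: replace every occurrence of `old` with `new`.
-- An empty `old` is treated as "nothing to replace".
replaceAll :: String -> String -> String -> String
replaceAll old new = go
  where
    go [] = []
    go s@(c:cs) = case stripPrefix old s of
      Just rest | not (null old) -> new ++ go rest
      _                          -> c : go cs

-- The literal substitution discussed above, applied to both tags.
preToHaskell :: String -> String
preToHaskell = replaceAll "</pre>" "</haskell>" . replaceAll "<pre>" "<haskell>"

-- Small demonstration; load the file in GHCi and run `demo`.
demo :: IO ()
demo = putStrLn (preToHaskell "<pre>main = print 1</pre>")
```

Like the sed approach, this matches exact text only: a tag written as <PRE> or carrying attributes would slip through, which is the kind of follow-up fix where parsing with tagsoup pays off.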