
Hiya Neil. Recently I've been trying to come up with an automated system to turn The Monad Reader articles, like those in http://sneezy.cs.nott.ac.uk/darcs/TMR/Issue11, into wiki-formatted articles for putting on Haskell.org.

Thus far, I've had the most success with SVN Pandoc. Pandoc does a good job - you can see an example conversion at http://haskell.org/haskellwiki/?title=User:Gwern/kenn&oldid=22808. Modulo the errors, which are largely due to haskell.org problems and a few limitations in Pandoc (no comments, no real support for references), it's fine.

But Pandoc's author will not support <haskell></haskell> tags, inasmuch as they are an extension to MediaWiki and not universal; he prefers <pre> or <pre class="haskell"> tags. He suggested I use TagSoup to convert them into <haskell> tags. Well, alright. They're tags, TagSoup does tags - seems natural. After an hour, I came up with a nice clean little script:

----
import Text.HTML.TagSoup.Render
import Text.HTML.TagSoup

main :: IO ()
main = interact convertPre

-- Parse stdin into tags, rewrite the <pre> tags, and render back to text.
convertPre :: String -> String
convertPre = renderTags . map convertToHaskell . canonicalizeTags . parseTags

-- Turn <pre ...> and </pre> into <haskell ...> and </haskell>, keeping any
-- attributes; every other tag passes through unchanged.
convertToHaskell :: Tag -> Tag
convertToHaskell x
  | isTagOpenName "pre" x  = TagOpen "haskell" (extractAttribs x)
  | isTagCloseName "pre" x = TagClose "haskell"
  | otherwise              = x
  where
    extractAttribs :: Tag -> [Attribute]
    extractAttribs (TagOpen _ y) = y
    extractAttribs _             = error "The impossible happened."
----

As an aside, may I note that TagSoup doesn't seem to support transformations particularly well? Or if it does, I didn't notice any examples. I spent most of my time just figuring out how to convert the 'x' from a <pre>stuff to <haskell>stuff.

Also, it might be nice to define an 'interact'-alike, which is (String -> String), and defined, I suppose, as 'interact f = renderTags . f . canonicalizeTags . parseTags'.
Extraction functions would be good as well - you'd only need three groups, I think: one for the two items in TagOpen, one for TagPosition's two positions, and one which extracts the String from the rest.

Anyway, my script seems to work. I ran the wiki output through it and this is the diff: http://haskell.org/haskellwiki/?title=User%3AGwern%2Fkenn&diff=22827&oldid=22811.

Ok, good, it replaces all the tags... But wait, what's all this other stuff? It is replacing all my apostrophes with an HTML character entity! No doubt this has something to do with XML/HTML/SGML or whatever, but it's not ideal. Even if it doesn't break the formatting (as I think it does), it's still cluttering up the source.

So, how can I fix this? Am I just barking up the wrong tree, and should I be writing a simple-minded search-and-replace sed script which replaces <pre> with <haskell>, </pre> with </haskell>...?

-- gwern
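For comparison, here is a minimal sketch of the simple-minded search-and-replace alternative, written in Haskell with only the base library rather than as a sed script. It is not TagSoup's way of doing things, just an illustration: the `rewrites` table is a hypothetical name, and the three tag spellings in it are an assumption about what Pandoc emits. Because nothing is parsed or re-rendered, apostrophes and every other character are copied through untouched.

```haskell
import Data.List (stripPrefix)

-- Literal tag rewrites, tried in order at each position of the input.
-- Assumption: these exact spellings are the only <pre> forms in the input;
-- any other variant (extra attributes, different casing) is left alone.
rewrites :: [(String, String)]
rewrites =
  [ ("<pre class=\"haskell\">", "<haskell>")
  , ("<pre>",                   "<haskell>")
  , ("</pre>",                  "</haskell>")
  ]

-- Walk the input once. At each position, apply the first rewrite whose
-- left-hand side is a prefix of the remaining text; otherwise copy one
-- character through unchanged and continue.
convertPre :: String -> String
convertPre [] = []
convertPre s@(c:cs) =
  case [ (new, rest) | (old, new) <- rewrites, Just rest <- [stripPrefix old s] ] of
    ((new, rest):_) -> new ++ convertPre rest
    []              -> c : convertPre cs
```

This `convertPre` has the same String -> String type as the one in the script, so it can be run the same way, with 'main = interact convertPre'. The trade-off is that it knows nothing about HTML: it will not notice a <pre> written in any form other than the ones listed.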