Unescaping with HaXmL (or anything else!)

I want to unescape an encoded XML or HTML string, e.g. converting &quot; to the quote character, etc.

Since I'm using HaXml anyway, I tried using xmlUnEscapeContent with no luck, e.g. with HaXml 1.19.1:

    let (CString _ s _) = head $ xmlUnEscapeContent stdXmlEscaper $ [CString False "This is a &quot;quoted string&quot;" ()]
    in s

The result is unchanged, i.e. "This is a &quot;quoted string&quot;".

Am I doing something wrong, or are my expectations wrong, or is this a bug?

Or, is there any other library that includes a simple unescape function for XML or HTML? (The Network.URI module includes an unescape function, but that's specific to URIs, naturally.)

Anton

On Thu, 27 Mar 2008, Anton van Straaten wrote:
I want to unescape an encoded XML or HTML string, e.g. converting &quot; to the quote character, etc.
Since I'm using HaXml anyway, I tried using xmlUnEscapeContent with no luck, e.g. with HaXml 1.19.1:
let (CString _ s _) = head $ xmlUnEscapeContent stdXmlEscaper $ [CString False "This is a &quot;quoted string&quot;" ()] in s
The result is unchanged, i.e. "This is a &quot;quoted string&quot;".
Am I doing something wrong, or are my expectations wrong, or is this a bug?
Or, is there any other library that includes a simple unescape function for XML or HTML?
Tagsoup must contain such a function but it doesn't seem to export it.

On Fri, Mar 28, 2008 at 4:26 AM, Anton van Straaten wrote:
I want to unescape an encoded XML or HTML string, e.g. converting &quot; to the quote character, etc. Since I'm using HaXml anyway, I tried using xmlUnEscapeContent with no luck
Hi Anton,

I only noticed your post today, sorry for the delay.

I also need this. In fact, it seems to me that it would be generally useful. I hope that simple functions to escape/unescape a string will be added to the API.

In the meantime, you are right that it is a bit tricky to do this in HaXml. Besides the wrappers that you found to be needed, there are two other issues:

One issue is that you need to lex and then parse the text first. If you tell HaXml that your string is a CString, it will believe you and just use the text the way it is without any further processing.

The other issue is that HaXml's lexer currently can only deal with XML content that begins with an XML tag. (I've pointed this out to Malcolm Wallace, the author of HaXml.) So in order to use it, you need to wrap your content in a tag and then unwrap it after parsing.

The code below works for me (obviously it would be better to remove the "error" calls):

Regards,
Yitz

    import Text.XML.HaXml
    import Text.XML.HaXml.Parse (xmlParseWith, document)
    import Text.XML.HaXml.Lex (xmlLex)

    unEscapeXML :: String -> String
    unEscapeXML = concatMap ctext . xmlUnEscapeContent stdXmlEscaper .
                  unwrapTag . either error id . fst . xmlParseWith document .
                  xmlLex "oops, lexer failed" . wrapWithTag "t"
      where
        ctext (CString _ txt _) = txt
        ctext (CRef (RefEntity name) _) = '&' : name ++ ";" -- skipped by escaper
        ctext (CRef (RefChar num) _) = '&' : '#' : show num ++ ";" -- ditto
        ctext _ = error "oops, can't unescape non-cdata"
        wrapWithTag t s = concat ["<", t, ">", s, "</", t, ">"]
        unwrapTag (Document _ _ (Elem _ _ c) _) = c
        unwrapTag _ = error "oops, not wrapped"
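
Where pulling in a full parser is not wanted, roughly the same result can be sketched with base alone. This is only a minimal sketch, assuming that nothing beyond the five predefined XML entities and numeric character references needs decoding; named HTML entities such as &nbsp; are left untouched. The names unescapeXmlText, decodeRef and toChar are hypothetical and are not part of HaXml or any other library.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readHex)

-- Hypothetical helper, not part of HaXml: decodes the five predefined XML
-- entities and numeric character references. Anything it does not
-- recognise is copied through unchanged.
unescapeXmlText :: String -> String
unescapeXmlText [] = []
unescapeXmlText ('&' : rest) =
  case break (== ';') rest of
    (name, ';' : after)
      | Just c <- decodeRef name -> c : unescapeXmlText after
    -- No terminating ';' or an unknown reference: keep the '&' literally.
    _ -> '&' : unescapeXmlText rest
unescapeXmlText (c : cs) = c : unescapeXmlText cs

-- Map the text between '&' and ';' to a character, if it is recognised.
decodeRef :: String -> Maybe Char
decodeRef "quot" = Just '"'
decodeRef "amp"  = Just '&'
decodeRef "apos" = Just '\''
decodeRef "lt"   = Just '<'
decodeRef "gt"   = Just '>'
decodeRef ('#' : 'x' : hex)
  | not (null hex), all isHexDigit hex = toChar (fst (head (readHex hex)))
decodeRef ('#' : dec)
  | not (null dec), all isDigit dec = toChar (read dec)
decodeRef _ = Nothing

-- Reject code points outside the range that Char can represent.
toChar :: Integer -> Maybe Char
toChar n
  | n >= 0 && n <= 0x10FFFF = Just (chr (fromInteger n))
  | otherwise = Nothing
```

With these definitions, unescapeXmlText "This is a &quot;quoted string&quot;" evaluates to the same text with literal quote characters in place of the entities. Unlike the HaXml route, this does no well-formedness checking, so it suits text fragments rather than whole documents.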
participants (3)
- Anton van Straaten
- Henning Thielemann
- Yitzchak Gale