
Hi!

I'm trying to extract HCards (http://microformats.org/wiki/hcard) from HTML documents. HCard is a microformat. Microformats are an attempt to add semantic information to XML documents without adding any new tags; instead, the semantic information is carried in class attributes (see the 'testXml' string below).

I'm trying to find a good way to extract HCards into Haskell records. To do this I need to map XML elements with certain attribute values onto record fields. Some of the elements are optional in the XML, and I represent that with Maybe fields in my record. The order of the elements is not guaranteed, only the way they are nested. This makes it difficult to first extract the fields I want into a list of Strings and then map that list onto my record, since I would need to tag each string with the field it represents.

So my question is: how can I write the function 'extractElementsIntoRecords' below? Or is HXT perhaps the wrong tool for the job, and should I be walking the DOM tree instead?
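To make the tagging idea concrete, here is a minimal sketch that does not use HXT at all. It assumes the extraction step has already produced (class name, text) pairs in arbitrary order; the helper name 'buildHCard' and the sample values are hypothetical and only serve as illustration.

```haskell
-- Same record as in the question, with Eq added so results can be compared.
data HCard = HCard
  { familyName :: String
  , givenName  :: String
  , org        :: Maybe String
  , url        :: Maybe String
  } deriving (Show, Eq)

-- Hypothetical helper: build a record from (class name, value) pairs.
-- The pairs may arrive in any order. The result is Nothing when a
-- required field (family-name or given-name) is missing.
buildHCard :: [(String, String)] -> Maybe HCard
buildHCard fields = do
  family <- lookup "family-name" fields
  given  <- lookup "given-name" fields
  Just (HCard
    { familyName = family
    , givenName  = given
    , org        = lookup "org" fields   -- optional: stays a Maybe
    , url        = lookup "url" fields   -- optional: stays a Maybe
    })

main :: IO ()
main = do
  -- Order of the pairs does not matter; org is absent here.
  print (buildHCard [ ("url", "http://example.com/")
                    , ("given-name", "Jane")
                    , ("family-name", "Doe") ])
  -- Required fields are missing, so no record is built.
  print (buildHCard [("org", "Example Corp")])
```

The first call yields a Just record with org set to Nothing, the second yields Nothing. The cost of this approach is that the field names live in strings, so a misspelled class name is only noticed at run time.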
module HCard where
import Text.XML.HXT.Arrow
data HCard = HCard
  { familyName :: String
  , givenName  :: String
  , org        :: Maybe String
  , url        :: Maybe String
  } deriving Show
parseHCards xml = runX $ parseXml xml
parseXml xml = readString [(a_parse_html, v_1)] xml >>> deep (hasClassName "vcard") >>> extractElementsIntoRecords
extractElementsIntoRecords = undefined
hasClassName s = hasAttrValue "class" (elem s . words)
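One possible shape for 'extractElementsIntoRecords', given as a sketch under assumptions rather than a definitive answer: each field gets its own arrow that searches the vcard subtree by class name, optional fields fall back to Nothing, and the results are combined with (&&&). The sketch assumes a recent HXT release where the combinators are imported from Text.XML.HXT.Core (the code in this message imports Text.XML.HXT.Arrow), and it assumes the url value lives in an href attribute. The helpers 'textOf', 'urlOf' and 'optionalA' and the sample HTML are hypothetical.

```haskell
import Text.XML.HXT.Core

data HCard = HCard
  { familyName :: String
  , givenName  :: String
  , org        :: Maybe String
  , url        :: Maybe String
  } deriving (Show, Eq)

hasClassName :: ArrowXml a => String -> a XmlTree XmlTree
hasClassName s = hasAttrValue "class" (elem s . words)

-- Concatenated text below the first element carrying the class.
-- Yields no result when no such element exists.
textOf :: ArrowXml a => String -> a XmlTree String
textOf cls =
  (deep (hasClassName cls) >>. take 1) >>> listA (deep getText) >>> arr concat

-- Assumption: the URL is the href of the first element with class "url".
urlOf :: ArrowXml a => a XmlTree String
urlOf =
  (deep (hasClassName "url") >>. take 1) >>> hasAttr "href" >>> getAttrValue "href"

-- Turn a possibly failing arrow into one that always yields a Maybe.
optionalA :: ArrowXml a => a XmlTree String -> a XmlTree (Maybe String)
optionalA f = (f >>> arr Just) `orElse` constA Nothing

-- Element order does not matter, since each field is searched for separately.
-- No record is produced when a required field is missing.
extractElementsIntoRecords :: ArrowXml a => a XmlTree HCard
extractElementsIntoRecords =
  (textOf "family-name" &&& textOf "given-name"
     &&& optionalA (textOf "org") &&& optionalA urlOf)
  >>> arr (\(f, (g, (o, u))) -> HCard f g o u)

-- Hypothetical sample input, not the testXml from this message.
sampleHtml :: String
sampleHtml =
  "<div class=\"vcard\"><a class=\"url fn n\" href=\"http://example.com/\">"
  ++ "<span class=\"given-name\">Jane</span> "
  ++ "<span class=\"family-name\">Doe</span></a></div>"

main :: IO ()
main = mapM_ print
  (runLA (hread >>> deep (hasClassName "vcard") >>> extractElementsIntoRecords)
         sampleHtml)
```

The main function uses the pure 'hread' parser only to keep the sketch self-contained; the same arrow should also fit at the end of a 'readString ... >>> deep (hasClassName "vcard")' pipeline. Whether this is preferable to walking the tree by hand is a judgment call: the arrow version keeps the mapping from class name to record field in one place, at the price of one tree search per field.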
testXml = "
" ++ " http://tantek.com/\">" ++ "