This is how I did it using the HXT library:

Prelude Text.XML.HXT.Parser.XmlParsec Text.XML.HXT.Arrow.XmlIOStateArrow Text.XML.HXT.Arrow> runX (readString [] "<tag>123</tag>" >>> getXPathTrees "tag" >>> getChildren >>> getText)
["123"]


Everything after "Prelude" up to the first ">" is the list of modules you have to import to make this work.
-"readString" parses the input string into an internal representation of an XML tree,
-"getXPathTrees" selects every <tag> element matching the XPath expression "tag",
-"getChildren" narrows that down to the child nodes of each element, i.e. whatever sits between <tag> and </tag>,
-"getText" extracts the text from those child nodes,
-"runX" runs the whole pipeline and returns the results as a list in the IO monad.
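
If you would rather have this in a source file than in a GHCi session, here is a rough sketch of the same pipeline. It makes a few assumptions that are not in the session above: it imports Text.XML.HXT.Core (the umbrella module in newer HXT releases), it passes "withValidate no" because there is no DTD to validate against, and it swaps getXPathTrees for "deep (isElem >>> hasName "tag")" so that the separate hxt-xpath package is not needed. The name extractTagText is made up for this example.

```haskell
module ExtractTag (extractTagText) where

-- Assumes the hxt package (version 9 or later), which exports the
-- arrows used below from Text.XML.HXT.Core.
import Text.XML.HXT.Core

-- | Return the text found directly inside every <tag> element of the input.
-- extractTagText is a hypothetical helper name, not part of HXT.
extractTagText :: String -> IO [String]
extractTagText input =
  runX ( readString [withValidate no] input  -- parse the string into an XML tree
         >>> deep (isElem >>> hasName "tag") -- find the <tag> elements (stands in for getXPathTrees "tag")
         >>> getChildren                     -- step down to the nodes between <tag> and </tag>
         >>> getText                         -- keep only the text nodes, as Strings
       )
```

Calling extractTagText "<tag>123</tag>" should give the same list as the GHCi session above, still wrapped in IO.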

hth,
deech

On Tue, Sep 29, 2009 at 2:25 PM, Robert Ziemba <rziemba@gmail.com> wrote:
I have been working with the regular expression package (Text.Regex.Posix).  My hope was to find a simple way to remove a pair of XML tags from a short string.  

I have something like this "<tag>Data</tag>" and would like to extract 'Data'.  There is only one tag pair, no nesting, and I know exactly what the tag is.  

My first attempt was this:  

  "<tag>123</tag>" =~ "[^<tag>].+[^</tag>]"::String

result:  "123"

Upon further experimenting I realized that it only works with more than 2 digits in 'Data'.  It occurred to me that my thinking on how this regular expression works was not correct - but I don't understand why it works at all for 3 or more digits.

Can anyone help me understand this result and perhaps suggest another strategy?  Thank you.

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners