remove XML tags using Text.Regex.Posix

29 Sep 2009

I have been working with the regular expression package (Text.Regex.Posix).
My hope was to find a simple way to remove a pair of XML tags from a short
string.

I have a string like "<tag>Data</tag>" and would like to extract 'Data'.
There is only one tag pair, no nesting, and I know exactly what the tag is.

My first attempt was this:

  "<tag>123</tag>" =~ "[^<tag>].+[^</tag>]"::String

result:  "123"

Upon further experimenting I realized that it only works with more than 2
digits in 'Data'.  It occurred to me that my understanding of how this
regular expression works was not correct, but I don't understand why it
works at all for 3 or more digits.

Can anyone help me understand this result and perhaps suggest another
strategy?  Thank you.

Robert Ziemba
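
A possible explanation, offered as a sketch rather than a definitive answer:
in POSIX syntax "[^<tag>]" is a bracket expression, so it matches a single
character that is not one of '<', 't', 'a', 'g' or '>'; it does not mean
"skip the string <tag>". The whole pattern therefore asks for one character
outside the first set, then one or more of anything, then one character
outside the second set, which needs at least three characters. It also fails
when the data ends in the letters t, a or g. One alternative is to match the
tags literally and capture what lies between them. The function names
firstAttempt and extractData below are made up for illustration, and the tag
name is hard-coded as in the post.

```haskell
import Text.Regex.Posix ((=~))

-- The pattern from the post. Each [^...] matches ONE character that is
-- not in the listed set, so the match needs at least three characters.
firstAttempt :: String -> String
firstAttempt s = s =~ "[^<tag>].+[^</tag>]"

-- Alternative sketch: match the literal tags and capture the text between
-- them. The 4-tuple result is (before, whole match, after, captured groups).
extractData :: String -> Maybe String
extractData s =
  case s =~ "<tag>(.*)</tag>" :: (String, String, String, [String]) of
    (_, _, _, [inner]) -> Just inner   -- exactly one group was captured
    _                  -> Nothing      -- no match

main :: IO ()
main = do
  print (firstAttempt "<tag>123</tag>")   -- "123"
  print (firstAttempt "<tag>12</tag>")    -- "" (too short to match)
  print (firstAttempt "<tag>Data</tag>")  -- "" ('a' and 't' are excluded)
  print (extractData "<tag>12</tag>")     -- Just "12"
  print (extractData "<tag>Data</tag>")   -- Just "Data"
```

This relies on the stated assumptions of a single tag pair and no nesting;
for general XML a real parser would be the safer choice.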
