
Robert,
On Tue, Sep 29, 2009 at 3:25 PM, Robert Ziemba wrote:
I have been working with the regular expression package (Text.Regex.Posix). My hope was to find a simple way to remove a pair of XML tags from a short string.
I have something like this "<tag>Data</tag>" and would like to extract 'Data'. There is only one tag pair, no nesting, and I know exactly what the tag is.
My first attempt was this:
"<tag>123</tag>" =~ "[^<tag>].+[^</tag>]"::String
result: "123"
Upon further experimenting I realized that it only works with more than 2 digits in 'Data'. It occurred to me that my thinking on how this regular expression works was not correct - but I don't understand why it works at all for 3 or more digits.
Can anyone help me understand this result and perhaps suggest another strategy? Thank you.
The regex you are using here can be described as follows: "Match a character not in the set '<,t,a,g,>', followed by 1 or more of anything, followed by a character not in the set '<,/,t,a,g,>'." In other words, "[^<tag>]" is a bracket expression that excludes single characters; it does not mean "anything but the string <tag>". That is why it cannot match if your data has fewer than 3 characters. It is probably not the correct regex for this job either, e.g. it would also match "x123x".

What you need is regex capturing, but I don't know if that is available in that regex library (I'm not an expert Haskeller). If you really need a regex to locate the tag, you could use a function like this to extract it:

    import Text.Regex.Posix ((=~))

    getTagData :: String -> String -> Maybe String
    getTagData tag s =
        let -- the whole "<tag>...</tag>" substring, or "" if there is none
            match   = s =~ ("<" ++ tag ++ ">.*</" ++ tag ++ ">") :: String
            -- drop the opening "<tag>" (length tag + 2 characters)
            dropTag = drop (length tag + 2) match
            -- keep everything except the closing "</tag>" (length tag + 3)
            getData = take (length match - (2 * length tag + 5)) dropTag
        in if not (null match)
             then Just getData
             else Nothing

    *Main> getTagData "tag" "<tag>123</tag>"
    Just "123"

Patrick
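P.S. As for another strategy: since there is only one tag pair, no nesting, and the tag is known in advance, a regex may not be needed at all. Below is a minimal sketch that uses only Data.List.stripPrefix from base. The name stripTag is made up for this example, and it assumes the string is exactly one <tag>...</tag> pair with nothing before or after it.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical helper: return the text between "<tag>" and "</tag>".
-- Assumes the input is exactly one tag pair with no surrounding text;
-- anything else yields Nothing.
stripTag :: String -> String -> Maybe String
stripTag tag s = do
  -- Remove the opening tag from the front, or fail with Nothing.
  rest <- stripPrefix ("<" ++ tag ++ ">") s
  -- Remove the closing tag from the end by matching it, reversed,
  -- against the front of the reversed remainder.
  body <- stripPrefix (reverse ("</" ++ tag ++ ">")) (reverse rest)
  return (reverse body)
```

Unlike the bracket-expression attempt, this should not depend on how long the data is:

    *Main> stripTag "tag" "<tag>1</tag>"
    Just "1"
    *Main> stripTag "tag" "x123x"
    Nothing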
--
=====================
Patrick LeBoutillier
Rosemère, Québec, Canada