gettext =  (many1 $ noneOf "><") >>= (return . Body)

works for your case.



On Thu, Jul 19, 2012 at 6:37 PM, Christian Maeder <Christian.Maeder@dfki.de> wrote:
Am 19.07.2012 14:53, schrieb C K Kashyap:

Dear gentle Haskellers,

I was trying to whet my Haskell by trying out Parsec today to try and
parse out XML. Here's the code I cam up with -

I wanted some help with the "gettext" parser that I've written. I had to
do a dummy "char '  ') in there just to satisfy the "many" used in the
xml parser. I'd appreciate it very much if someone could give me some
feedback.

You don't want empty bodies! So use many1 in gettext.

  gettext = fmap Body $ many1 $ letter <|> digit

If you have spaces in your bodies, skip them or allow them with
noneOf "<".

HTH Christian



data XML =  Node String [XML]
           | Body String deriving Show

gettext = do
              x <- many (letter <|> digit )
              if (length x) > 0 then
                 return (Body x)
              else (char ' ' >> (return $ Body ""))

xml :: Parser XML
xml = do {
           name <- openTag
         ; innerXML <- many innerXML
         ; endTag name
         ; return (Node name innerXML)
          }

innerXML = do
          x <- (try xml <|> gettext)
          return x

openTag :: Parser String
openTag = do
         char '<'
         content <- many (noneOf ">")
         char '>'
         return content

endTag :: String -> Parser String
endTag str = do
         char '<'
         char '/'
         string str
         char '>'
         return str

h1 = parse xml "" "<a>A</a>"
h2 = parse xml "" "<a><b>A</b></a>"
h3 = parse xml "" "<a><b><c></c></b></a>"
h4 = parse xml "" "<a><b></b><c></c></a>"

Regards,
Kashyap


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



--
I drink I am thunk.