
The following should work. The basic idea is:
* Try to parse a <person>
* If it's not a <person>, recursively try again.
{-# LANGUAGE OverloadedStrings #-}
import Text.XML.Stream.Parse
import Data.Text (Text, unpack)
import Control.Monad (join)
import Data.Enumerator (Iteratee)
import Data.XML.Types (Event)
data Person = Person { age :: Int, name :: Text }
deriving Show
parsePerson :: Monad m => Iteratee Event m (Maybe [Person])
parsePerson = tagName "person" (requireAttr "age") $ \age -> do
name <- content
return [Person (read $ unpack age) name]
parseWrapper :: Monad m => Iteratee Event m (Maybe [Person])
parseWrapper =
parsePerson `orE`
(fmap . fmap) concat (tagPredicate (const True) ignoreAttrs (const
$ many parseWrapper))
main = parseFile_ def "people.xml" $ force "people required" parseWrapper
Michael
On Tue, Nov 8, 2011 at 7:28 AM, Sean Hess
Thanks all for your help so far. Using xml-enumerator, is there any way to parse the following xml, and ignore the people tag? In other words, can I parse it by only providing an Iteratee for Person, no matter where a <person> tag appears nested within a document?
<?xml version="1.0" encoding="utf-8"?> <people> <person age="25">Michael</person> <person age="2">Eliezer</person> </people>
On Nov 8, 2011, at 7:33 AM, Michael Snoyman wrote:
Thanks for the heads-up, it's just a few minor tweaks in the 0.3->0.4 transition. I'll update later, and add a link to the blog post, and release a new version to Hackage.
On Tue, Nov 8, 2011 at 6:03 AM, Sean Hess
wrote: Thanks so much to both of you that sent that link.
Sorry, my email totally wasn't clear. I meant that the example in the
package description doesn't
run: http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
I'll read through that article.
On Nov 8, 2011, at 7:01 AM, Michael Snoyman wrote:
Here's a blog post on the package:
http://www.yesodweb.com/blog/2011/10/xml-enumerator . It doesn't cover
the streaming interface, but it might give you a good overview of the
package in general. I'm not sure what you mean by "it doesn't run,"
but you'll need at least a basic understanding of enumerators to get
off the ground.
On Tue, Nov 8, 2011 at 5:38 AM, Sean Hess
wrote: I cannot seem to find a working example of xml-enumerator. It doesn't run:
the names seem to have changed for some things, and I'm too much of a
beginner to figure it out easily.
http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
On Nov 7, 2011, at 7:59 PM, Felipe Almeida Lessa wrote:
On Tue, Nov 8, 2011 at 12:45 AM, Sean Hess
wrote: I want to parse a large xml file (2GB), without putting the whole thing into
memory. It's pretty simple with a sax parser in most languages, you just
stream bytes to the sax parser, and wait for sax events.
I recommend you taking a look at xml-enumerator [1] and
libxml-enumerator [2]. They are the SAX parsers you know from the
imperative world but much easier to write =). In particular, you
don't need to rely on lazyness.
Cheers,
[1] http://hackage.haskell.org/package/xml-enumerator
[2] http://hackage.haskell.org/package/libxml-enumerator
--
Felipe.
_______________________________________________
Beginners mailing list
Beginners@haskell.org