Re: [Haskell-beginners] Laziness and XML parsing

9 Nov 2011

      The following should work. The basic idea is:

* Try to parse a <person>
* If it's not a <person>, recursively try again.

{-# LANGUAGE OverloadedStrings #-}
import Text.XML.Stream.Parse
import Data.Text (Text, unpack)
import Control.Monad (join)
import Data.Enumerator (Iteratee)
import Data.XML.Types (Event)

data Person = Person { age :: Int, name :: Text }
    deriving Show

parsePerson :: Monad m => Iteratee Event m (Maybe [Person])
parsePerson = tagName "person" (requireAttr "age") $ \age -> do
    name <- content
    return [Person (read $ unpack age) name]

parseWrapper :: Monad m => Iteratee Event m (Maybe [Person])
parseWrapper =
    parsePerson `orE`
    (fmap . fmap) concat (tagPredicate (const True) ignoreAttrs (const
$ many parseWrapper))

main = parseFile_ def "people.xml" $ force "people required" parseWrapper

Michael

On Tue, Nov 8, 2011 at 7:28 AM, Sean Hess  wrote:
...
Thanks all for your help so far. Using xml-enumerator, is there any way to
parse the following xml, and ignore the people tag? In other words, can I
parse it by only providing an Iteratee for Person, no matter where a
<person> tag appears nested within a document?
<?xml version="1.0" encoding="utf-8"?>
 <people>
     <person age="25">Michael</person>
     <person age="2">Eliezer</person>
 </people>
On Nov 8, 2011, at 7:33 AM, Michael Snoyman wrote:
Thanks for the heads-up, it's just a few minor tweaks in the 0.3->0.4
transition. I'll update later, and add a link to the blog post, and
release a new version to Hackage.
On Tue, Nov 8, 2011 at 6:03 AM, Sean Hess  wrote:
Thanks so much to both of you that sent that link.
Sorry, my email totally wasn't clear. I meant that the example in the
package description doesn't
run: http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
I'll read through that article.
On Nov 8, 2011, at 7:01 AM, Michael Snoyman wrote:
Here's a blog post on the package:
http://www.yesodweb.com/blog/2011/10/xml-enumerator . It doesn't cover
the streaming interface, but it might give you a good overview of the
package in general. I'm not sure what you mean by "it doesn't run,"
but you'll need at least a basic understanding of enumerators to get
off the ground.
On Tue, Nov 8, 2011 at 5:38 AM, Sean Hess  wrote:
I cannot seem to find a working example of xml-enumerator. It doesn't run:
the names seem to have changed for some things, and I'm too much of a
beginner to figure it out easily.
http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
On Nov 7, 2011, at 7:59 PM, Felipe Almeida Lessa wrote:
On Tue, Nov 8, 2011 at 12:45 AM, Sean Hess  wrote:
I want to parse a large xml file (2GB), without putting the whole thing into
memory. It's pretty simple with a sax parser in most languages, you just
stream bytes to the sax parser, and wait for sax events.
I recommend you taking a look at xml-enumerator [1] and
libxml-enumerator [2].  They are the SAX parsers you know from the
imperative world but much easier to write =).  In particular, you
don't need to rely on lazyness.
Cheers,
[1] http://hackage.haskell.org/package/xml-enumerator
[2] http://hackage.haskell.org/package/libxml-enumerator
--
Felipe.
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners