
I want to parse a large XML file (2GB) without loading the whole thing into memory. That's pretty simple with a SAX parser in most languages: you just stream bytes to the parser and wait for SAX events. Here's what I think the equivalent is in Haskell: https://gist.github.com/1346854
Is the XML file being read lazily? It seems lazy, but it also seems like all the SAX events would be loaded into memory. If they aren't, how is that possible? In order to be lazy, it seems like parse would have to be an impure function, so that it could go back to the disk to get more data.
~sean
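
A minimal sketch of why a pure parse function can still stream, assuming a toy tokenizer rather than a real XML parser. The `Event` type and the `events` function are hypothetical names for illustration; they are not taken from the gist or from any library. Because Haskell lists are lazy, each event is computed only when the consumer demands it, so the sketch can even tokenize an infinite input. Lazy file reads (for example `Data.ByteString.Lazy.readFile`) rest on the same principle: the disk access is hidden behind a lazily produced value, so the parser itself stays pure.

```haskell
-- Toy event type; hypothetical, not the type used by any XML library.
data Event = Open String | Close String | Chars String
    deriving (Show, Eq)

-- Pure, lazy tokenizer: each event is produced only when it is demanded.
-- Assumes simple input with no attributes, comments, or entities.
events :: String -> [Event]
events [] = []
events ('<' : '/' : rest) =
    let (tag, rest') = break (== '>') rest
    in Close tag : events (drop 1 rest')
events ('<' : rest) =
    let (tag, rest') = break (== '>') rest
    in Open tag : events (drop 1 rest')
events input =
    let (txt, rest) = break (== '<') input
    in Chars txt : events rest
```

In GHCi, `take 4 (events (cycle "<a>x</a>"))` returns `[Open "a",Chars "x",Close "a",Open "a"]` even though the input never ends. Memory stays bounded only as long as the consumer processes events incrementally and nothing keeps a reference to the start of the list.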

On Tue, Nov 8, 2011 at 12:45 AM, Sean Hess wrote:
I want to parse a large xml file (2GB), without putting the whole thing into memory. It's pretty simple with a sax parser in most languages, you just stream bytes to the sax parser, and wait for sax events.
I recommend taking a look at xml-enumerator [1] and libxml-enumerator [2]. They are the SAX parsers you know from the imperative world, but much easier to write =). In particular, you don't need to rely on laziness.
Cheers,
[1] http://hackage.haskell.org/package/xml-enumerator
[2] http://hackage.haskell.org/package/libxml-enumerator
--
Felipe.
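
A rough model of the push-style consumer behind the enumerator approach, written without the library so that it stands alone. `Consumer`, `counter`, and `feedAll` are hypothetical names for this sketch; they are not the enumerator package's actual types or functions, which differ in detail. The point it tries to show is that the consumer is handed one input at a time and keeps only its own small state, so bounded memory does not depend on lazy evaluation of the input.

```haskell
-- Simplified model of an iteratee; the enumerator package's real types differ.
data Consumer i r
    = Done r                           -- finished, with a result
    | Await (Maybe i -> Consumer i r)  -- wants more; Nothing means end of input

-- Count the inputs seen, forcing the running total at every step
-- so that no chain of unevaluated additions builds up.
counter :: Consumer i Int
counter = go 0
  where
    go n = n `seq` Await (step n)
    step n Nothing  = Done n
    step n (Just _) = go (n + 1)

-- Push a list of inputs into a consumer, then signal end of input.
-- Returns Nothing if the consumer still wants input after end of input.
feedAll :: [i] -> Consumer i r -> Maybe r
feedAll _        (Done r)  = Just r
feedAll (x : xs) (Await k) = feedAll xs (k (Just x))
feedAll []       (Await k) = case k Nothing of
    Done r  -> Just r
    Await _ -> Nothing
```

For example, `feedAll "abc" counter` evaluates to `Just 3`. In a real program the list would be replaced by a loop that reads chunks from a file handle and passes each one to the consumer.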

I cannot seem to find a working example of xml-enumerator. It doesn't run: the names seem to have changed for some things, and I'm too much of a beginner to figure it out easily.
http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
On Nov 7, 2011, at 7:59 PM, Felipe Almeida Lessa wrote:
I recommend taking a look at xml-enumerator [1] and libxml-enumerator [2]. They are the SAX parsers you know from the imperative world, but much easier to write =). In particular, you don't need to rely on laziness.
Cheers,
[1] http://hackage.haskell.org/package/xml-enumerator
[2] http://hackage.haskell.org/package/libxml-enumerator
--
Felipe.

Here's a blog post on the package:
http://www.yesodweb.com/blog/2011/10/xml-enumerator . It doesn't cover
the streaming interface, but it might give you a good overview of the
package in general. I'm not sure what you mean by "it doesn't run,"
but you'll need at least a basic understanding of enumerators to get
off the ground.
On Tue, Nov 8, 2011 at 5:38 AM, Sean Hess wrote:
I cannot seem to find a working example of xml-enumerator. It doesn't run: the names seem to have changed for some things, and I'm too much of a beginner to figure it out easily. http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

Thanks so much to both of you who sent that link. Sorry, my email totally wasn't clear. I meant that the example in the package description doesn't run:
http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
I'll read through that article.
On Nov 8, 2011, at 7:01 AM, Michael Snoyman wrote:
Here's a blog post on the package: http://www.yesodweb.com/blog/2011/10/xml-enumerator . It doesn't cover the streaming interface, but it might give you a good overview of the package in general. I'm not sure what you mean by "it doesn't run," but you'll need at least a basic understanding of enumerators to get off the ground.

Thanks for the heads-up, it's just a few minor tweaks in the 0.3->0.4
transition. I'll update later, and add a link to the blog post, and
release a new version to Hackage.
On Tue, Nov 8, 2011 at 6:03 AM, Sean Hess wrote:
Thanks so much to both of you that sent that link. Sorry, my email totally wasn't clear. I meant that the example in the package description doesn't run: http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/... I'll read through that article.

Thanks all for your help so far. Using xml-enumerator, is there any way to parse the following XML and ignore the <people> tag? In other words, can I parse it by providing only an Iteratee for Person, no matter how deeply a <person> tag is nested within the document?
<?xml version="1.0" encoding="utf-8"?>
<people>
    <person age="25">Michael</person>
    <person age="2">Eliezer</person>
</people>
On Nov 8, 2011, at 7:33 AM, Michael Snoyman wrote:
Thanks for the heads-up, it's just a few minor tweaks in the 0.3->0.4 transition. I'll update later, and add a link to the blog post, and release a new version to Hackage.

The following should work. The basic idea is:
* Try to parse a <person>.
* If it's not a <person>, recursively try again.

{-# LANGUAGE OverloadedStrings #-}
import Text.XML.Stream.Parse
import Data.Text (Text, unpack)
import Control.Monad (join)
import Data.Enumerator (Iteratee)
import Data.XML.Types (Event)

data Person = Person { age :: Int, name :: Text }
    deriving Show

parsePerson :: Monad m => Iteratee Event m (Maybe [Person])
parsePerson = tagName "person" (requireAttr "age") $ \age -> do
    name <- content
    return [Person (read $ unpack age) name]

parseWrapper :: Monad m => Iteratee Event m (Maybe [Person])
parseWrapper =
    parsePerson `orE`
    (fmap . fmap) concat
        (tagPredicate (const True) ignoreAttrs (const $ many parseWrapper))

main = parseFile_ def "people.xml" $ force "people required" parseWrapper

Michael
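
The two steps above can also be sketched without the library, as a way to see the control flow in isolation. Everything here is an assumption made for illustration: the `Event` type is a hypothetical stand-in for `Data.XML.Types.Event`, a plain list stands in for the event stream, and `person` and `people` are made-up names. It is not how xml-enumerator is implemented.

```haskell
import Text.Read (readMaybe)

-- Hypothetical event type for illustration only.
data Event
    = Begin String [(String, String)]  -- element name and attributes
    | End String
    | Content String
    deriving (Show, Eq)

data Person = Person { age :: Int, name :: String }
    deriving (Show, Eq)

-- Step 1: try to read one <person> from the front of the stream.
-- On success, return the person and the events that follow it.
person :: [Event] -> Maybe (Person, [Event])
person (Begin "person" attrs : rest) = do
    a <- lookup "age" attrs >>= readMaybe
    let (body, after) = break (== End "person") rest
    return (Person a (concat [t | Content t <- body]), drop 1 after)
person _ = Nothing

-- Step 2: if the stream does not start with a <person>,
-- skip one event and try again on the remainder.
people :: [Event] -> [Person]
people [] = []
people evs@(_ : rest) = case person evs of
    Just (p, rest') -> p : people rest'
    Nothing         -> people rest
```

Because `people` skips any event that does not open a `<person>`, the wrapping `<people>` element, or any other level of nesting, needs no handler of its own.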
On Tue, Nov 8, 2011 at 7:28 AM, Sean Hess wrote:
Thanks all for your help so far. Using xml-enumerator, is there any way to parse the following xml, and ignore the people tag? In other words, can I parse it by only providing an Iteratee for Person, no matter where a <person> tag appears nested within a document?
<?xml version="1.0" encoding="utf-8"?> <people> <person age="25">Michael</person> <person age="2">Eliezer</person> </people>

Check out this chapter of the yesod book for good examples:
http://www.yesodweb.com/blog/2011/10/xml-enumerator
On Tue, Nov 8, 2011 at 8:38 AM, Sean Hess wrote:
I cannot seem to find a working example of xml-enumerator. It doesn't run: the names seem to have changed for some things, and I'm too much of a beginner to figure it out easily. http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/...
participants (4): David McBride, Felipe Almeida Lessa, Michael Snoyman, Sean Hess