
On Mon, Jun 8, 2009 at 3:39 PM, Henning Thielemann wrote:
> I think you could use the parser as it is and do the name parsing later. Due to lazy evaluation both parsers would run in an interleaved way.

I've been trying to figure out how to get this to work with lazy evaluation, but haven't made much headway. Tips? The only way I can think of to get incremental parsing working is to maintain explicit state, but I also can't figure out how to achieve this with the parsers I've tested (HaXml, HXT, hexpat).

Here's a working example of what I'm trying to do, in Python. It reads XML from stdin, prints events as they are parsed, and will terminate when the document ends:

##########################
from xml.sax import handler, saxutils, expatreader

class ContentHandler (handler.ContentHandler):
    def __init__ (self):
        self.events = []
        self.level = 0

    def startElementNS (self, ns_name, lname, attrs):
        self.events.append (("BEGIN", ns_name, lname, dict (attrs)))
        self.level += 1

    def endElementNS (self, ns_name, lname):
        self.events.append (("END", ns_name, lname))
        self.level -= 1

    def characters (self, content):
        self.events.append (("TEXT", content))

def main ():
    parser = expatreader.ExpatParser ()
    content = ContentHandler ()
    parser.setFeature (handler.feature_namespaces, True)
    parser.setContentHandler (content)

    got_events = False
    while content.level > 0 or (not got_events):
        text = raw_input ("Enter XML:\n")
        parser.feed (text)
        print content.events
        content.events = []
        got_events = True

if __name__ == "__main__": main()
###############################

$ python incremental.py
Enter XML:
<test xmlns="urn:test"><test2><test3>
[('BEGIN', (u'urn:test', u'test'), u'test', {}), ('BEGIN', (u'urn:test', u'test2'), u'test2', {}), ('BEGIN', (u'urn:test', u'test3'), u'test3', {})]
Enter XML:
</test3></test2><test2 a="b"/>text content goes here
[('END', (u'urn:test', u'test3'), None), ('END', (u'urn:test', u'test2'), None), ('BEGIN', (u'urn:test', u'test2'), u'test2', {(None, u'a'): u'b'}), ('END', (u'urn:test', u'test2'), None), ('TEXT', u'text content goes here')]
Enter XML:
</test>
[('END', (u'urn:test', u'test'), None)]
#############################

As demonstrated, the parser retains state (namespaces, nesting) between text inputs. Are there any XML parsers for Haskell that support this incremental behavior?