
correcting parser
I would think this is a rather specialized requirement. I certainly don't want a "correcting" parser for my work. But I can see that some applications might...
Especially if you are aiming for a generally applicable library. What's really needed is a 'Strictness' switch for the parser, so you can select validating or non-validating. The parser I wrote uses heuristics that specify how to fix mistakes (like missing close tags) - this enables the XML parser to parse HTML with the correct set of rules.
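To make the 'Strictness' idea concrete, here is a minimal sketch of one possible heuristic for missing close tags. It is not the parser described above: the Tok and Strictness types and the balance function are hypothetical names invented for this illustration, and the only rule applied is "close whatever is still open when an outer close tag arrives".

```haskell
-- Hypothetical, simplified token type (not the XmlElement type from this thread).
data Tok = Open String | Close String | Chars String
  deriving (Show, Eq)

-- Hypothetical switch: Strict rejects badly nested input, Lenient repairs it.
data Strictness = Strict | Lenient
  deriving (Show, Eq)

-- Walk the tokens while keeping a stack of currently open tag names.
balance :: Strictness -> [Tok] -> Either String [Tok]
balance mode = go []
  where
    -- End of input: anything still on the stack was never closed.
    go [] [] = Right []
    go stack@(top : _) []
      | mode == Strict = Left ("unclosed tag: " ++ top)
      | otherwise      = Right (map Close stack)
    go stack (Open n : rest)  = (Open n :) <$> go (n : stack) rest
    go stack (Chars s : rest) = (Chars s :) <$> go stack rest
    go stack (Close n : rest) =
      case break (== n) stack of
        -- The close tag matches the innermost open tag: nothing to fix.
        ([], _ : stack') -> (Close n :) <$> go stack' rest
        -- No open tag with this name: reject, or silently drop the close tag.
        (_, [])
          | mode == Strict -> Left ("unexpected close tag: " ++ n)
          | otherwise      -> go stack rest
        -- Inner tags were left open: reject, or close them first.
        (skipped@(inner : _), _ : stack')
          | mode == Strict -> Left ("missing close tag: " ++ inner)
          | otherwise      ->
              ((map Close skipped ++ [Close n]) ++) <$> go stack' rest
```

With the input [Open "ul", Open "li", Chars "x", Close "ul"], Lenient inserts Close "li" before Close "ul", while Strict reports the missing close tag. Returning Either forces the whole input before any output appears, so a streaming version would need a different error-reporting scheme.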
This seems reasonable, and I'd expect a reasonable implementation (of a filter) to stream via lazy evaluation where that matches the final usage pattern. The outline I sketched (copied below) was intended to be built upon something like HaXml's filter idea, so that streaming processing would (in principle) be possible.
If you don't use a list based representation, the only other way to get 'lazy' behaviour
is to use the 'event' model - which is a lot more complex, but possible using one thread
to do the parsing, and a Channel to pass the events through.
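A rough sketch of that event model, assuming Control.Concurrent.Chan from base as the Channel. The Event type and the produceEvents, collectEvents and runPipeline names are made up for this example, and the "parser" just replays a fixed list of events rather than parsing anything.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Hypothetical event type; EndOfInput tells the consumer to stop reading.
data Event = Open String | Close String | Chars String | EndOfInput
  deriving (Show, Eq)

-- Stand-in for the parsing thread: a real parser would write each event
-- as soon as it has consumed enough input to recognise it.
produceEvents :: Chan Event -> [Event] -> IO ()
produceEvents chan evs = do
  mapM_ (writeChan chan) evs
  writeChan chan EndOfInput

-- Consumer: block on the channel and collect events until EndOfInput.
collectEvents :: Chan Event -> IO [Event]
collectEvents chan = loop []
  where
    loop acc = do
      ev <- readChan chan
      case ev of
        EndOfInput -> pure (reverse acc)
        _          -> loop (ev : acc)

-- Run the producer in its own thread and consume in the calling thread.
runPipeline :: [Event] -> IO [Event]
runPipeline evs = do
  chan <- newChan
  _ <- forkIO (produceEvents chan evs)
  collectEvents chan
```

Chan is unbounded, so a fast parser can run arbitrarily far ahead of a slow consumer; a bounded queue would be needed to limit memory use.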
I think you misunderstand what the parser and renderer do. The parser takes String input
and outputs a stream of elements based on the BNF specification for XML. The element type looks like:
data XmlElement = XMLDecl [XmlAttribute]
                | DocType XmlTagName XmlSystemLiteral XmlPubidLiteral
                | EmptyTag XmlTagName [XmlAttribute]
                | STag XmlTagName [XmlAttribute]
                | ETag XmlTagName
                | Text [XmlElement]
                | CharData String
                | CharRef Int
                | EntityRef XmlTagName
                | PERef XmlTagName
                | CDSect String
                | PI XmlTagName String
                | Comment String
                | Flush
                | Undefined
                | Unparsed String
                deriving (Show,Eq)
The renderer takes a stream of these elements and converts to a String.
All the filtering/reading/writing is done on streams of these elements.
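As a sketch of what filtering and rendering on such a stream could look like, the code below uses a cut-down copy of the element type with only five constructors. The two type synonyms, the XmlFilter alias, and the stripComments, textOnly and render functions are assumptions made for this example, not the actual library code, and the toy renderer does no escaping.

```haskell
-- Assumed definitions; the real types are not shown in this thread.
type XmlTagName   = String
type XmlAttribute = (String, String)

-- Cut-down copy of the element type, enough for the example.
data XmlElement = EmptyTag XmlTagName [XmlAttribute]
                | STag XmlTagName [XmlAttribute]
                | ETag XmlTagName
                | CharData String
                | Comment String
                deriving (Show, Eq)

-- A filter is just a function from one element stream to another.
type XmlFilter = [XmlElement] -> [XmlElement]

-- Drop every comment, pass everything else through unchanged.
stripComments :: XmlFilter
stripComments = filter (not . isComment)
  where
    isComment (Comment _) = True
    isComment _           = False

-- Keep only the character data.
textOnly :: XmlFilter
textOnly xs = [e | e@(CharData _) <- xs]

-- Toy renderer: turn the element stream back into a String.
render :: [XmlElement] -> String
render = concatMap renderOne
  where
    renderOne (STag n as)     = "<" ++ n ++ concatMap attr as ++ ">"
    renderOne (EmptyTag n as) = "<" ++ n ++ concatMap attr as ++ "/>"
    renderOne (ETag n)        = "</" ++ n ++ ">"
    renderOne (CharData s)    = s
    renderOne (Comment s)     = "<!--" ++ s ++ "-->"
    attr (k, v)               = " " ++ k ++ "=\"" ++ v ++ "\""
```

Because these filters are ordinary lazy list functions, they compose with (.) and only consume as much of the input stream as the renderer demands.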
For example a filter could select only