
correcting parser
I would think this is a rather specialized requirement. I certainly don't want a "correcting" parser for my work. But I can see that some applications might...
Especially if you are aiming for a genarally applicable library. What's really needed is a 'Strictness' switch for the parser, so you can select validating, or non-validating. The parser I wrote uses heuristsics that specify how to fix mistakes (like missing close tags) - this enables the XML parser to parse HTML with the correct set of rules.
This seems reasonable, and I'd expect a reasonable implementation (of a filter) to stream via lazy evaluation where that matches the final usage pattern. The outline I sketched (copied below) was intended to be built upon something like HaXML's filter idea, so that streaming processing would (in principle) be possible.
If you don't use a list based representation, the only other way to get 'lazy' behaviour
is to use the 'event' model - which is a lot more complex, but possible using one thread
to do the parsing, and a Channel to pass the events through.
I think you misunderstand what the parser and renderer do, the parser takes String input
and outputs a stream of elements based on the BNF specification for XML... It looks like:
data XmlElement = XMLDecl [XmlAttribute]
| DocType XmlTagName XmlSystemLiteral XmlPubidLiteral
| EmptyTag XmlTagName [XmlAttribute]
| STag XmlTagName [XmlAttribute]
| ETag XmlTagName
| Text [XmlElement]
| CharData String
| CharRef Int
| EntityRef XmlTagName
| PERef XmlTagName
| CDSect String
| PI XmlTagName String
| Comment String
| Flush
| Undefined
| Unparsed String deriving (Show,Eq)
The renderer takes a stream of these elements and converts to a String.
All the filtering/reading/writing is done on streams of these elements.
For example a filter could select only

At 15:37 14/05/04 +0100, MR K P SCHUPKE wrote:
If you don't use a list based representation, the only other way to get 'lazy' behaviour is to use the 'event' model - which is a lot more complex, but possible using one thread to do the parsing, and a Channel to pass the events through.
Surely a list isn't the *only* structure that will exhibit the required streaming behaviour; e.g. HaXML uses a tree structure built around this: [[ data Element = Elem Name [Attribute] [Content] data ElemTag = ElemTag Name [Attribute] -- ^ intermediate for parsing type Attribute = (Name, AttValue) data Content = CElem Element | CString Bool CharData -- ^ bool is whether whitespace is significant | CRef Reference | CMisc Misc ]] Provided the resulting tree of Element/Content values (ignoring the attribute "fluff" here) is traversed depth-first, left-to-right, I'd expect the details to be retrieved in the order that they are made accessible by a left-to-right parser ... at each active (i.e. partially evaluated) node in the tree, elements to the right of the most recently evaluated 'Content' value would be unevaluated. A parser that can generate a simple list of values in the style you describe could also progressively build a tree in this way. My view is that this allows an application designer to make the appropriate space/convenience trade-offs: non-sequential access is possible, at a cost of memory usage, by using different tree-traversal approach. Hmmm... I guess what I describe above depends rather on the "processed" part of the tree being garbage-collectable, which presumes that the application isn't "hanging on" to a reference to an "Element" value once it starts to process the next one. So I guess what I describe is possible, but needs to be handled carefully if the memory usage is to be contained. I think it should be possible to write a function that traverses the tree and returns a list of the kind you describe (using a technique like ShowS to build the list); as long as the calling program uses the resulting list sequentially, one at a time, then I'd expect the desired effect on memory usage. (Would this correspond to your "channel" noted above?) #g ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact
participants (2)
-
Graham Klyne
-
MR K P SCHUPKE