[Haskell-cafe]

14 May 2004

...
...
correcting parser
...
I would think this is a rather specialized requirement.  I certainly don't 
want a "correcting" parser for my work.  But I can see that some 
applications might...
Especially if you are aiming for a generally applicable library. What's really needed
is a 'Strictness' switch for the parser, so you can select validating or non-validating.
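
To make the idea concrete, here is a rough sketch of what such a switch could look like. None of these names come from the actual parser: `Strictness`, `Tok` and `checkNesting` are invented for illustration, and `Tok` is a cut-down stand-in for the real element type.

```haskell
-- Hypothetical sketch: these names are illustrative, not the real parser's API.
data Strictness = Validating | NonValidating
  deriving (Show, Eq)

-- A simplified token type standing in for the real element type.
data Tok = Open String | Close String | Chars String
  deriving (Show, Eq)

-- Check tag nesting. In Validating mode any mismatch is reported as an
-- error; in NonValidating mode the stream is passed through untouched.
checkNesting :: Strictness -> [Tok] -> Either String [Tok]
checkNesting NonValidating toks = Right toks
checkNesting Validating toks = go [] toks
  where
    go [] [] = Right toks
    go (t : _) [] = Left ("unclosed tag: " ++ t)
    go stack (Open n : rest) = go (n : stack) rest
    go (t : ts) (Close n : rest)
      | t == n = go ts rest
      | otherwise = Left ("expected </" ++ t ++ "> but found </" ++ n ++ ">")
    go [] (Close n : _) = Left ("unexpected </" ++ n ++ ">")
    go stack (Chars _ : rest) = go stack rest
```

The caller picks the mode once, and the same parsing code serves both strict XML and sloppy input.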

The parser I wrote uses heuristics that specify how to fix mistakes (like missing close
tags); this enables the XML parser to parse HTML, given the correct set of rules.
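
As an illustration of one such heuristic (a sketch only, not the rules the parser actually uses), missing close tags can be repaired with a stack of open tags. `Tok` and `repair` are hypothetical names, and real HTML needs more rules than this.

```haskell
-- Hypothetical token type, standing in for the real element type.
data Tok = Open String | Close String | Chars String
  deriving (Show, Eq)

-- Repair nesting using a stack of currently open tags:
--   * a close tag that matches an enclosing open tag first closes
--     everything opened inside it;
--   * a close tag with no matching open tag is dropped;
--   * tags still open at end of input are closed.
repair :: [Tok] -> [Tok]
repair = go []
  where
    go stack [] = map Close stack
    go stack (Open n : rest) = Open n : go (n : stack) rest
    go stack (Chars s : rest) = Chars s : go stack rest
    go stack (Close n : rest)
      | n `elem` stack =
          let (inner, remaining) = break (== n) stack
          in map Close inner ++ Close n : go (drop 1 remaining) rest
      | otherwise = go stack rest
```

Because `repair` emits each token before looking at the rest of the input, it still streams lazily over a list.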
...
This seems reasonable, and I'd expect a reasonable implementation (of a 
filter) to stream via lazy evaluation where that matches the final usage 
pattern.  The outline I sketched (copied below) was intended to be built 
upon something like HaXML's filter idea, so that streaming processing would 
(in principle) be possible.
If you don't use a list-based representation, the only other way to get 'lazy' behaviour
is to use the 'event' model, which is a lot more complex, but possible using one thread
to do the parsing and a Channel to pass the events through.
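
A minimal sketch of that thread-plus-channel arrangement, using `forkIO` and `Chan` from `Control.Concurrent`. The `Event` type, the toy `toEvent` tokeniser (which just splits on whitespace) and the function names are all assumptions made for the example; a real parser would sit in place of `toEvent`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Hypothetical event type; EndOfDocument tells the consumer to stop.
data Event = StartTag String | EndTag String | Characters String | EndOfDocument
  deriving (Show, Eq)

-- Toy stand-in for a real parser: classifies one whitespace-separated word.
toEvent :: String -> Event
toEvent ('<' : '/' : rest) | not (null rest), last rest == '>' = EndTag (init rest)
toEvent ('<' : rest) | not (null rest), last rest == '>' = StartTag (init rest)
toEvent w = Characters w

-- Producer: runs in its own thread and pushes events into the channel.
produce :: Chan Event -> String -> IO ()
produce chan input = do
  mapM_ (writeChan chan . toEvent) (words input)
  writeChan chan EndOfDocument

-- Consumer: pulls events until EndOfDocument arrives.
collect :: Chan Event -> IO [Event]
collect chan = do
  ev <- readChan chan
  case ev of
    EndOfDocument -> return []
    _ -> (ev :) <$> collect chan

parseViaChannel :: String -> IO [Event]
parseViaChannel input = do
  chan <- newChan
  _ <- forkIO (produce chan input)
  collect chan
```

Here `collect` gathers everything into a list only to keep the example short; an event-driven consumer would act on each event as it is read.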

I think you misunderstand what the parser and renderer do. The parser takes String input
and outputs a stream of elements based on the BNF specification for XML. The element
type looks like:

data XmlElement = XMLDecl [XmlAttribute] 
   | DocType XmlTagName XmlSystemLiteral XmlPubidLiteral
   | EmptyTag XmlTagName [XmlAttribute]
   | STag XmlTagName [XmlAttribute]
   | ETag XmlTagName
   | Text [XmlElement]
   | CharData String
   | CharRef Int
   | EntityRef XmlTagName
   | PERef XmlTagName
   | CDSect String 
   | PI XmlTagName String
   | Comment String
   | Flush
   | Undefined
   | Unparsed String deriving (Show,Eq)

The renderer takes a stream of these elements and converts to a String.
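
As a rough sketch of such a renderer over a cut-down version of the type: the definitions of `XmlTagName` and `XmlAttribute` are not shown here, so the ones below are assumptions, as is the choice to escape character data on output.

```haskell
type XmlTagName = String              -- assumption: real definition not shown
type XmlAttribute = (String, String)  -- assumption: (name, value) pairs

-- Only a few constructors of the full type, to keep the sketch short.
data XmlElement
  = EmptyTag XmlTagName [XmlAttribute]
  | STag XmlTagName [XmlAttribute]
  | ETag XmlTagName
  | CharData String
  | Comment String
  deriving (Show, Eq)

-- Escape the characters that are special in XML text and attribute values.
escape :: Char -> String
escape '<' = "&lt;"
escape '>' = "&gt;"
escape '&' = "&amp;"
escape '"' = "&quot;"
escape c = [c]

renderAttr :: XmlAttribute -> String
renderAttr (k, v) = " " ++ k ++ "=\"" ++ concatMap escape v ++ "\""

renderElement :: XmlElement -> String
renderElement (EmptyTag n as) = "<" ++ n ++ concatMap renderAttr as ++ "/>"
renderElement (STag n as) = "<" ++ n ++ concatMap renderAttr as ++ ">"
renderElement (ETag n) = "</" ++ n ++ ">"
renderElement (CharData s) = concatMap escape s
renderElement (Comment s) = "<!--" ++ s ++ "-->"

-- Rendering is element-by-element, so output is produced lazily.
render :: [XmlElement] -> String
render = concatMap renderElement
```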

All the filtering/reading/writing is done on streams of these elements.

For example, a filter could select only the person records from an XML
data source; the person-specific reader would then convert them to a specific
representation of a Person...

	myReader :: [(XmlTreeDepth,XmlElement)] -> [Person]

A writer does the opposite.
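
To show the shape of a filter and a reader together, here is a sketch under several assumptions: the records are `<person>` elements with `<name>` and `<age>` children, `XmlTreeDepth` is an `Int`, a matching start and end tag carry the same depth, and `Person` has the two fields below. All of these, and the names `selectPersons` and `readPersons`, are invented for the example.

```haskell
type XmlTagName = String   -- assumption
type XmlTreeDepth = Int    -- assumption

-- Cut-down element type (attributes omitted for brevity).
data XmlElement = STag XmlTagName | ETag XmlTagName | CharData String
  deriving (Show, Eq)

-- Hypothetical target representation.
data Person = Person { personName :: String, personAge :: Int }
  deriving (Show, Eq)

type Stream = [(XmlTreeDepth, XmlElement)]

-- Filter: keep only the elements from <person> to its matching </person>.
selectPersons :: Stream -> Stream
selectPersons [] = []
selectPersons (e@(d, STag "person") : rest) =
  let (body, after) = break (== (d, ETag "person")) rest
  in e : body ++ take 1 after ++ selectPersons (drop 1 after)
selectPersons (_ : rest) = selectPersons rest

-- Reader: turn each <person> record into a Person, skipping records
-- whose <age> does not parse as an Int.
readPersons :: Stream -> [Person]
readPersons [] = []
readPersons ((d, STag "person") : rest) =
  let (body, after) = break (== (d, ETag "person")) rest
      -- Text immediately following the start tag named `tag`.
      field tag = concat [ s | ((_, STag t), (_, CharData s)) <- zip body (drop 1 body)
                             , t == tag ]
      others = readPersons (drop 1 after)
  in case reads (field "age") of
       [(age, "")] -> Person (field "name") age : others
       _ -> others
readPersons (_ : rest) = readPersons rest
```

A writer would be the inverse of `readPersons`, mapping each `Person` back to a run of start tag, character data and end tag elements.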

	Regards,
	Keean.


MR K P SCHUPKE