
MR K P Shupke wrote:
Can you give me an example where processing HAS to be in a different order?
No, but I can give an example where a list based encoding is less sufficient. I'm coming from a completely different view here, namely that of using Haskell not to query or transform but to create XML. I'm currently in a project[1] trying to fix a working implementation of Haskell Server Pages[2], i.e. using Haskell as a server-side scripting language for dynamic webpages. Clearly the most used mode of operation in this setting is constructing an XML tree rather than querying or transforming an existing one. With this in mind, the list based encoding would be less efficient than a treelike datatype. Building the XML tree is likely to be done bottom-up, and adding a parent to a subtree would be O(n) with a list, but O(1) with a tree datatype. Building a whole tree from the bottom up is O(n^2) for a list, but O(n) for the tree datatype. MR K P Shupke wrote:
The nested lists does not allow any extra flexibility in traversal, at the cost of greater complexity. The one case it is simpler is if you want to look up a single node by its 'tree address' - This is more complex with the list-rep, as you have to count nodes at a given level, skipping irrelevent nodes...
The fact that the list based representation is indeed more complex is in my opinion a clear indication that it should not be the standard. Compare this to Haskell strings where the standard is the simple but ineffective [Char], but where there is another more efficient representation, PackedString, if you really need it. Simplicity and ease of understanding is not to be taken lightly. By using a tree datatype representation you also get pattern matching directly on the structure of the tree, a very useful feature to have. MR K P Shupke wrote:
The list rep is very good at data extraction, if you want to extract a sub-tree of a given type you just ignore all tags until you find one of the correct type, then output all tags until the depth returns to the level of the initial tag.
That might possibly be a programmatic advantage, though I don't see how it would be more difficult with a tree datatype. In any case it is no more efficient, both are O(n). As far as I can tell there is no single area where the list representation would outperform a tree datatype, other than the special case of lazy streaming. My suggestions are thus: The standard representation should be a tree datatype, both because it is more general and efficiently supports a wide range of operation modes (construction, querying, transformation, rendering), and also because that is what lies closest to the intuition of XML. However, the ability to stream large XML structures lazily is clearly useful and is difficult to achieve using a tree datatype. Why not add it as a specialisation? The standard representation would be a datatype XML living in Data.XML (or similar), and the specialized would be a type XMLStream living in Data.XML.Stream. There should of course also be functions to convert between the types. Comment away, /Niklas [1] http://www.dtek.chalmers.se/~d00nibro/hsp/ [2] http://www.research.microsoft.com/~emeijer/Papers/HSP.pdf _________________________________________________________________ Protect your PC - get McAfee.com VirusScan Online http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963