
On 09/13/2011 05:15 PM, Malcolm Wallace wrote:
> I am the first to admit that HaXml's documentation is not as good as it could be, and I am sorry that you have had a bad experience.
Sorry for the tirade =) That was a while ago, but I definitely felt some sympathy for the guy in the quote.
> One thing I am puzzled about, is just how extremely difficult it must be, to click on "Detailed documentation of the HaXml APIs" from the HaXml homepage, look for a moment until you see "Text.XML.HaXml.Parse" in the list of modules, click on it, and find, right at the top of the page, a function that parses a String into an XML document tree.
As someone who just wants to parse an XML file, here's what happens. First, I click on the API docs. I'm presented with a list:

  * Text
    o XML
      + Text.XML.HaXml
        # Text.XML.HaXml.ByteStringPP
        # Text.XML.HaXml.Combinators
        # DtdToHaskell
          * Text.XML.HaXml.DtdToHaskell.Convert
          * Text.XML.HaXml.DtdToHaskell.Instance
          * Text.XML.HaXml.DtdToHaskell.TypeDef
        # Text.XML.HaXml.Escape
        # Html
          * Text.XML.HaXml.Html.Generate
          * Text.XML.HaXml.Html.Parse
          * Text.XML.HaXml.Html.ParseLazy
          * Text.XML.HaXml.Html.Pretty
        # Text.XML.HaXml.Lex
        # Text.XML.HaXml.Namespaces
        # Text.XML.HaXml.OneOfN
        # Text.XML.HaXml.Parse
        # Text.XML.HaXml.ParseLazy
        # Text.XML.HaXml.Posn
        # Text.XML.HaXml.Pretty
        # Text.XML.HaXml.SAX
        # Schema
          * Text.XML.HaXml.Schema.Environment
          * Text.XML.HaXml.Schema.HaskellTypeModel
          * Text.XML.HaXml.Schema.NameConversion
          * Text.XML.HaXml.Schema.Parse
          * Text.XML.HaXml.Schema.PrettyHaskell
          * Text.XML.HaXml.Schema.PrimitiveTypes
          * Text.XML.HaXml.Schema.Schema
          * Text.XML.HaXml.Schema.TypeConversion
          * Text.XML.HaXml.Schema.XSDTypeModel
        # Text.XML.HaXml.ShowXmlLazy
        # Text.XML.HaXml.TypeMapping
        # Text.XML.HaXml.Types
        # Text.XML.HaXml.Util
        # Text.XML.HaXml.Validate
        # Text.XML.HaXml.Verbatim
        # Text.XML.HaXml.Wrappers
        # Text.XML.HaXml.XmlContent
          * Text.XML.HaXml.XmlContent.Haskell
          * Text.XML.HaXml.XmlContent.Parser
        # Xtract
          * Text.XML.HaXml.Xtract.Combinators
          * Text.XML.HaXml.Xtract.Lex
          * Text.XML.HaXml.Xtract.Parse

Jesus! /You/ know that I want to look in Text.XML.HaXml.Parse, but /I/ don't.

Let's say I choose the first link: Text.XML.HaXml. It's a list of modules, along with their documentation. All blank! I hit the back button.

The first thing I notice is that there seem to be specialized parser modules for different content types, e.g. Text.XML.HaXml.Html.Parse. Maybe I want Text.XML.HaXml.Schema.Parse? I mean, I want to parse something with a schema, right? Nope, it's for parsing XSDs.

How about Text.XML.HaXml.Util? This looks right...
  Only a small module containing some helper functions to extract xml content - I would have added this to Types but I've put it into an additional module - to avoid circular references (Verbatim - Types)

and it's got a function called docContent which is supposed to "Get the main element of the document..." Great. Its type is,

  docContent :: i -> Document i -> Content i

so now, to have any hope of using this function (or figure out that I'm in the wrong place entirely), I have to go figure out what those types are. Document has one constructor,

  Document Prolog (SymTab EntityDef) (Element i) [Misc]

which leads me to,

  Prolog (Maybe XMLDecl) [Misc] (Maybe DocTypeDecl) [Misc]
  XMLDecl VersionInfo (Maybe EncodingDecl) (Maybe SDDecl)
  type VersionInfo = String
  newtype EncodingDecl = EncodingDecl String
  type SDDecl = Bool
  data Misc = Comment Comment | PI ProcessingInstruction
  type Comment = String
  type ProcessingInstruction = (PITarget, String)
  type PITarget = String
  data DocTypeDecl = DTD QName (Maybe ExternalID) [MarkupDecl]
  data QName = N Name | QN Namespace Name
  type Name = String
  data Namespace = Namespace {nsPrefix :: String, nsURI :: String}
  data ExternalID = SYSTEM SystemLiteral | PUBLIC PubidLiteral SystemLiteral
  newtype SystemLiteral = SystemLiteral String
  newtype PubidLiteral = PubidLiteral String
  data MarkupDecl = Element ElementDecl | AttList AttListDecl | Entity EntityDecl | Notation NotationDecl | MarkupMisc Misc
  data ElementDecl = ElementDecl QName ContentSpec
  data ContentSpec = EMPTY | ANY | Mixed Mixed | ContentSpec CP
  ...
  ...

most of which are completely undocumented. I have no idea what any of this stuff means! As a result, I don't know what the 'docContent' function does, or whether or not I'm even looking in the right place.

At this point, I'm probably googling for blog entries and wondering why I'm wasting my time when all I really need is a "hello, world" example.

If by some miracle I do discover Text.XML.HaXml.Parse.xmlParse (do I want ParseLazy? What's the difference?) I can get myself a Document. Now what? Do I try to understand that giant type hierarchy above? There's nothing else in the Parse module that looks useful.

All of the good stuff, it turns out, is in the ambiguously-named Text.XML.HaXml.Combinators. Ok, the paper helps a little bit here, if you want to include a few years of college as a prerequisite for parsing XML. There are some things here that look promising, like 'elm' and 'tag'. However, they all have mysterious types:

  elm, txt :: CFilter i

What's a CFilter?

  type CFilter i = Content i -> [Content i]

The Content type actually contains words I recognize! Awesome! But wait, I don't have Content! I have a Document! How do I get Content out of my Document?

Argh... This is bringing back bad memories =)
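
For what it's worth, the chain of types described above (String to Document, Document to Content, Content through a CFilter) can be sketched roughly as follows. This is only a sketch under assumptions, not documentation: xmlParse, docContent, tag, txt and CFilter are the names mentioned above, while noPos, (/>) and verbatim are recalled from the HaXml 1.2x Haddocks and may differ in other versions. The helper names (parseDoc, rootContent, greetingText, greetings) and the "greeting" element are made up for illustration.

```haskell
import Text.XML.HaXml.Combinators (CFilter, tag, txt, (/>))
import Text.XML.HaXml.Parse (xmlParse)
import Text.XML.HaXml.Posn (Posn, noPos)
import Text.XML.HaXml.Types (Content, Document)
import Text.XML.HaXml.Util (docContent)
import Text.XML.HaXml.Verbatim (verbatim)

-- Step 1: String -> Document.
-- The first argument is only a label used in error messages.
-- (Hypothetical helper name.)
parseDoc :: String -> Document Posn
parseDoc = xmlParse "(inline)"

-- Step 2: Document -> Content.
-- docContent wraps the document's root element as a Content value,
-- which is what every CFilter expects as input. The position argument
-- is a dummy here (assumption: noPos from Text.XML.HaXml.Posn).
rootContent :: Document Posn -> Content Posn
rootContent = docContent noPos

-- Step 3: a CFilter is just a function Content i -> [Content i].
-- This one keeps the node if it is a <greeting> element, then descends
-- to its children and keeps only the text nodes.
greetingText :: CFilter i
greetingText = tag "greeting" /> txt

-- Step 4: run the filter and turn the surviving nodes back into Strings.
greetings :: String -> [String]
greetings = map verbatim . greetingText . rootContent . parseDoc
```

Loaded into GHCi with HaXml installed, `greetings "<greeting>hello</greeting>"` would be the "hello, world" call. The design point is that Document is only needed long enough to get at the root Content; everything after that is function composition on CFilter values.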
> In fact, my wish as a library author would be: please tell me what you, as a beginner to this library, would like to do with it when you first pick it up? Then perhaps I could write a tutorial that answers the questions people actually ask, and tells them how to get the stuff done that they want to do. I have tried writing documentation, but it seems that people do not know how to find, or use it. Navigating an API you do not know is hard. I'd like to signpost it better.
I was trying to parse user timelines from the Twitter API. I threw away most stuff, but wanted to go through the tree and extract the name, body, date, etc. from the individual entries.

What's really missing in my opinion is an overview of how everything fits together, along with examples. There are a couple of "big" types that you need to know to use the library. Document, Content, and CFilter come to mind. All of those should be well-documented:

  * What do they represent?
  * How do they fit together?
  * Where can I get them, i.e. what functions produce them?
  * What can I do with them?

The examples don't need to be too complicated. How to read/write a file, how to get an element's name, attributes, and text, etc. Anything is better than nothing.

Most of the examples in blog posts and other people's code are out of date; while the differences may be small, a new user has no way of knowing that. GHC is just going to throw a type error that may as well be Chinese.
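
To make the request concrete, here is a rough sketch of the kind of example meant: pulling the name, body, and date out of each entry in a timeline. Everything about the input is an assumption. The element names (statuses, status, user, screen_name, text, created_at) are placeholders and not a claim about the real Twitter format, and Entry, textOf, toEntry and entries are hypothetical helpers. The HaXml calls (xmlParse, docContent, noPos, tag, txt, (/>), verbatim) are recalled from the 1.2x API and may need adjusting for other versions.

```haskell
import Text.XML.HaXml.Combinators (CFilter, tag, txt, (/>))
import Text.XML.HaXml.Parse (xmlParse)
import Text.XML.HaXml.Posn (noPos)
import Text.XML.HaXml.Types (Content)
import Text.XML.HaXml.Util (docContent)
import Text.XML.HaXml.Verbatim (verbatim)

-- One timeline entry, reduced to the three fields of interest.
-- (Hypothetical record, not part of HaXml.)
data Entry = Entry
  { entryName :: String
  , entryBody :: String
  , entryDate :: String
  } deriving (Eq, Show)

-- Concatenate the text found directly inside the elements that the
-- given filter selects. Yields "" if nothing matches.
textOf :: CFilter i -> Content i -> String
textOf f = concatMap verbatim . (f /> txt)

-- Build an Entry from a single <status> node by running one small
-- filter per field. Each path starts at the <status> node itself.
toEntry :: Content i -> Entry
toEntry status = Entry
  { entryName = textOf (tag "status" /> tag "user" /> tag "screen_name") status
  , entryBody = textOf (tag "status" /> tag "text") status
  , entryDate = textOf (tag "status" /> tag "created_at") status
  }

-- Parse the whole document, select every <status> directly under the
-- <statuses> root, and convert each one.
entries :: String -> [Entry]
entries xml = map toEntry (statuses root)
  where
    root     = docContent noPos (xmlParse "(timeline)" xml)
    statuses = tag "statuses" /> tag "status"
```

A missing field simply comes out as an empty string here; a real program would probably want Maybe or an error instead, but that is beside the point of a first example.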