Announcement: Typeful [x]html combinators -- pre-release 0

... or, "whoops! I've written another html combinator library."

History: I was surprised to find that all the Haskell html generating stuff I've tried allowed one to construct invalid HTML. I thought to myself that this might make an undergraduate project, but then I thought "maybe it's too hard? I'd better write some of it to see how hard it is." After a while I decided that it was probably a bit too big for an undergraduate project, but by that time I'd written too much of it to throw away, and just had to keep going, so here it is.

I'm announcing it here because I hope the audience is fairly small but discerning, and I'm announcing it at all because I've run out of steam for the moment. This is the first code above the size of a nonce-programme that I've written since I got ill years ago, so don't go too hard on me! I still am ill, so don't expect swift responses either.

I'm mainly interested in opinions on the questions below, and in test cases that fail validation.

You can get it with

    darcs get --partial http://homepage.ntlworld.com/jon.fairbairn/Typeful/Text/HTMLs

and the documentation (such as it is) is at http://homepage.ntlworld.com/jon.fairbairn/Typeful/Text/HTMLs/Documentation/...

Here's the announcement (which might better be called the release note, except that I'm not sure this counts as a release).

------
Typeful HTMLs library

Library of types and combinators to produce valid xhtml 1.0 (and eventually html 4.01) -- if you use this library (and if it's not faulty), you won't need to run the validator over the output of your code.

Types prevent invalid nesting of elements (including the prohibitions of appendix B of the xhtml standard) and ensure an attribute is only passed to elements that have it. Most attributes are given types that reflect their content rather than just being strings (although some are just wrappers around strings for the moment).

Includes the full set of xhtml character entity names.

Includes a type with names for all IANA-registered character sets.

Provides the option of generating output in a choice of character sets (currently with the wide choice of utf_8, us_ascii, latin1 or iso_8859_5 (cyrillic)).

Provides the option of generating xhtml without an xml version declaration (to avoid putting IE into quirks mode).

It's generated by TemplateHaskell, but otherwise Haskell98.

Questions:

* I want to add a function to produce HTML 4.01. Unfortunately there are trivial differences (body takes a nonempty list in HTML but an ordinary list in xhtml, and similar). One approach (and the one I'd choose if left to my own devices) would be to restrict the document tree to the common subset, so the types would be constrained by whichever language was the more restrictive in any given case. An alternative would be to transform the xhtml tree by adding empty div elements. Which do people prefer?

* Having a monad inside the result types of the *_allowed_in classes complicates the types, complicates programmatic generation of elements and makes some error messages much worse. It wouldn't be necessary if we simply wrote Haskell lists instead of using the +++ operator, ie instead of

      body << (h1 << ... +++ p << ... +++ p << ...)

  we had to write

      body << [h1 << ..., p << ..., p << ...]

  Admittedly, it is rather more awkward for nonempty lists:

      ul << (li <<... +++ li<< ... +++ li<<...)

  would have to be written

      ul << (li<<... +:[li<<...,li<<...]).

  Would anyone really be bothered by that? (Either that, or can someone come up with a way of rearranging the types so that +++ works without the monad in the result types of elements?)

* What should I do about REQUIRED attributes? Some alternatives are listed in the TODO file.

* The preference carrying monad used for Render is backwards; I tried making it go forwards, but the Laziness test programme took about 1.7 times as long to run. I'd like to put some state in there, which can't be done backwards. Can I do this somehow without the performance hit?

* This pre-release has a very restrictive license (barely falling short of "You may not use this!"). My preference is, I think, to release it under LGPL. Is that going to cause people undue grief?

--
Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
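
As a rough illustration of the nonempty-list question above, here is a minimal sketch of how a `+:` operator could build a list that is guaranteed to have at least one element. Every name in it (NonEmpty, :|, Li, Ul, renderUl) is invented for this example and is not taken from the Typeful HTMLs library; only the spelling `+:` comes from the announcement. It does no escaping of text.

```haskell
-- Hypothetical sketch: none of these definitions come from the library itself.
infixr 5 +:

-- A list that always holds at least one element.
data NonEmpty a = a :| [a]

-- Build a nonempty list from a first element and an ordinary (possibly empty) tail.
(+:) :: a -> [a] -> NonEmpty a
x +: xs = x :| xs

newtype Li = Li String
newtype Ul = Ul (NonEmpty Li)   -- a ul must contain at least one li

li :: String -> Li
li = Li

-- An empty ul cannot be constructed: there is no empty NonEmpty value.
ul :: NonEmpty Li -> Ul
ul = Ul

renderLi :: Li -> String
renderLi (Li s) = "<li>" ++ s ++ "</li>"

renderUl :: Ul -> String
renderUl (Ul (x :| xs)) = "<ul>" ++ concatMap renderLi (x : xs) ++ "</ul>"

example :: String
example = renderUl (ul (li "one" +: [li "two", li "three"]))
```

Here `example` evaluates to `<ul><li>one</li><li>two</li><li>three</li></ul>`, while `ul []` is a type error because an ordinary list is not a `NonEmpty Li`. The cost is the one the announcement describes: the first item has to be written outside the bracketed list.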

Jon Fairbairn wrote:
... or, "whoops! I've written another html combinator library." History: I was surprised to find that all the Haskell html generating stuff I've tried allowed one to construct invalid HTML...
This is a wonderful idea, and it is clear that you have done some serious work here.
I'm announcing it here because I hope the audience is fairly small but discerning
I believe that the audience could be large. As for discerning, that remains to be seen. :)

What you are doing here is enforcing a DTD using the Haskell type system. Apart from your plans for HTML 4.01, perhaps a better generalization would be to generate typeful combinators that could validate at compile time for any given XML DTD. That would be hard. But now that you have fully worked out the first special case, a good chunk of the work has already been done.

Regards,
Yitz

"Yitzchak Gale"
Jon Fairbairn wrote:
... or, "whoops! I've written another html combinator library." History: I was surprised to find that all the Haskell html generating stuff I've tried allowed one to construct invalid HTML...
This is a wonderful idea, and it is clear that you have done some serious work here.
Thanks. It certainly felt that way... in case anyone reading my announcement thought otherwise, the library is pretty much complete, but there are a few design decisions that need more heads than just my own. [I've thought of another one: should I group together the prohibited elements corresponding to %pre.exclusion, %formctrl (in the html dtd) to reduce the number of type arguments?]
I'm announcing it here because I hope the audience is fairly small but discerning
I believe that the audience could be large. As for discerning, that remains to be seen. :)
:-) Well, I mean here on the libraries list. If I get some feedback from here I'll announce it on one of the other Haskell lists.
What you are doing here is enforcing a DTD using the Haskell type system. Apart from your plans for HTML 4.01, perhaps a better generalization would be to generate typeful combinators that could validate at compile time for any given XML DTD.
I think that would be the job of something like HaXml or HXT, and there are plenty of good people working on them.

To clarify, what these types enforce is something stronger than an XML DTD, more like the SGML DTD of html (and as I write this I am inclining myself further towards making the document tree types fit the common subset); it enforces the restrictions that are only expressed in prose in the xhtml1.0 standard. You could think of it as enforcing a schema (that the w3c didn't provide).

The reason for wanting to produce HTML4.01 is that it's widely understood by current browsers, and serving xhtml as html is (IMHO) rather questionable (the sets of attributes aren't the same for one thing). I only started with xhtml because I could use HaXml to get it off the ground.

As to other future work, I'm more inclined to try to push typefulness out into other libraries that this one could use, such as Network.URI.

--
Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

On Wed, 2007-12-19 at 15:06 +0000, Jon Fairbairn wrote:
"Yitzchak Gale"
writes: What you are doing here is enforcing a DTD using the Haskell type system. Apart from your plans for HTML 4.01, perhaps a better generalization would be to generate typeful combinators that could validate at compile time for any given XML DTD.
I think that would be the job of something like HaXml or HXT, and there are plenty of good people working on them. To clarify, what these types enforce is something stronger than an XML DTD, more like the SGML DTD of html (and as I write this I am inclining myself further towards making the document tree types fit the common subset); it enforces the restrictions that are only expressed in prose in the xhtml1.0 standard. You could think of it as enforcing a schema (that the w3c didn't provide).
It seems that they did actually[1].

Making a program for generating ADT's and combinators from XML schemas to provide type safe XML is something i'd very much like to do someday. One thing i'm uncertain of is whether supporting namespaces would be a problem or not. It would be a shame to have a bunch of generated XML-standards in Haskell that can't be used together (ie. embedding MathML or SVG in XHTML).

Mattias

1: http://www.w3.org/TR/xhtml1-schema/

Mattias Bengtsson
On Wed, 2007-12-19 at 15:06 +0000, Jon Fairbairn wrote:
I think that would be the job of something like HaXml or HXT, and there are plenty of good people working on them. To clarify, what these types enforce is something stronger than an XML DTD, more like the SGML DTD of html (and as I write this I am inclining myself further towards making the document tree types fit the common subset); it enforces the restrictions that are only expressed in prose in the xhtml1.0 standard. You could think of it as enforcing a schema (that the w3c didn't provide).
It seems that they did actually[1].
Thanks for finding that (I do hate the way that the W3C issues proper formal definitions of things but makes the informal one the normative one); I can use it to check attribute types and so on. Does it enforce the nesting restrictions (described in app. B of the xhtml1 dtd)? Looking through it with little knowledge of schemas, I can't see that it does.
Making a program for generating ADT's and combinators from XML schemas to provide type safe XML is something i'd very much like to do someday.
It would be a good thing to do, though not something I'm thinking of just now (a bit too much for me at the moment).
One thing i'm uncertain of is whether supporting namespaces would be a problem or not. It would be a shame to have a bunch of generated XML-standards in Haskell that can't be used together (ie. embedding MathML or SVG in XHTML).
Yes. I'd like the document tree in the typeful HTMLs library to be compatible with such things, but again, I'm not up to doing the big concept stuff. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

On Tue, 2008-01-08 at 10:31 +0000, Jon Fairbairn wrote:
Thanks for finding that (I do hate the way that the W3C issues proper formal definitions of things but makes the informal one the normative one); I can use it to check attribute types and so on. Does it enforce the nesting restrictions (described in app. B of the xhtml1 dtd)? Looking through it with little knowledge of schemas, I can't see that it does.
Yes this is really frustrating. I think the schema is supposed to enforce nesting restrictions. At least that's how i interpret paragraph 1.3[1] in the Note. It doesn't seem to be able to enforce everything though (eg. the legend example in 1.3).
Making a program for generating ADT's and combinators from XML schemas to provide type safe XML is something i'd very much like to do someday.
It would be a good thing to do, though not something I'm thinking of just now (a bit too much for me at the moment).
Same here i think. It's on my TODO though. :)

Mattias

1: http://www.w3.org/TR/xhtml1-schema/#why

Mattias Bengtsson
On Tue, 2008-01-08 at 10:31 +0000, Jon Fairbairn wrote:
Thanks for finding that (I do hate the way that the W3C issues proper formal definitions of things but makes the informal one the normative one); [...]
Yes this is really frustrating.
Indeed!
I think the schema is supposed to enforce nesting restrictions. At least that's how i interpret paragraph 1.3[1] in the Note. It doesn't seem to be able to enforce everything though (eg. the legend example in 1.3).
You may be right. The schema is not the easiest thing in the world to read (whoever said that it was strange how people who complained for years that lisp had too many brackets now endorse XML had it right), but I think the "content model for exclusions" parts cover this. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

Mattias Bengtsson
Making a program for generating ADT's and combinators from XML schemas to provide type safe XML is something i'd very much like to do someday. One thing i'm uncertain of is whether supporting namespaces would be a problem or not. It would be a shame to have a bunch of generated XML-standards in Haskell that can't be used together (ie. embedding MathML or SVG in XHTML).
How /can/ one check validity of namespace-using XML? Do you have a separate DTD (or Schema, if you think DTDs are too easy to read) for each namespace? If so, how do you specify where foreign namespaces are legal? If the top-level DTD specifies -- and thus needs to know, in advance, the details about -- the sub-namespaces, then what's the point of namespaces at all?

I'm confused.

-k
--
If I haven't seen further, it is by standing in the footprints of giants

On Wed, 2008-01-09 at 20:52 +0100, Ketil Malde wrote:
How /can/ one check validity of namespace-using XML? Do you have a separate DTD (or Schema, if you think DTDs are too easy to read) for each namespace?
I'm not 100% sure but i believe so yes. XML Schemas enforce more properties than a DTD and are hence of more value (to me at least).
If so, how do you specify where foreign namespaces are legal? If the top-level DTD specifies -- and thus needs to know, in advance, the details about -- the sub-namespaces, then what's the point of namespaces at all?
This is exactly what i'm also having trouble understanding. Need to read more!
I'm confused.
So am i. :) Mattias

Mattias Bengtsson
If so, how do you specify where foreign namespaces are legal? If the top-level DTD specifies -- and thus needs to know, in advance, the details about -- the sub-namespaces, then what's the point of namespaces at all?
This is exactly what i'm also having trouble understanding. Need to read more!
I'm confused.
So am i. :)
Well - I just found some links. A schema can contain import statements, basically #include'ing other schemas:

http://lists.xml.org/archives/xml-dev/200211/msg00880.html

For DTD's, I think you'd have to write the DTD to use the prefix (I forget the proper name for it) and the local name - but I've lost the link that said so.

So for both Schema and DTD validation, you'd need to know in advance all the tags you'd like to use, but Schema lets you more easily reuse 'chunks' of the document definition (or XML grammar).

Stray thought: I'm currently working on a storage for various data types. Due to the variety, I'm considering XML rather than SQL, and it occurs to me that SQL relations are product types, while XML defines an algebraic data type. E.g., we can straightforwardly translate:

    <!ELEMENT element-name (child-name)>      =>  data ElementName = ElementName ChildName
    <!ELEMENT element-name (child1,child2)>   =>  data ElementName = ElementName Child1 Child2
    <!ELEMENT element-name (child?)>          =>  data ElementName = ElementName (Maybe Child)
    <!ELEMENT element-name (child*)>          =>  data ElementName = ElementName [Child]
    <!ELEMENT element-name (child1|child2)>   =>  data ElementName = ElementName1 Child1 | ElementName2 Child2

Well - presumably, something like that is exactly what you do to produce verifiable XML, right?

In my mind, the 'algebraic' data types have always been a practical thing, but I suppose there is a formal framework where both Haskell data types and XML are unified as examples of 'algebraically complete' systems? Basic stuff I guess, it just never occurred to me to combine the terms 'universal algebra' and 'XML' before. :-)

-k
--
If I haven't seen further, it is by standing in the footprints of giants
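
The translation table above can be tried out on a small case. The following is only a sketch: the DTD in the comment and all the type and function names (Entry, Target, element, renderEntry and so on) are made up for illustration, and the renderer does no escaping of text.

```haskell
-- Hypothetical DTD, invented for this example:
--   <!ELEMENT entry (title, note?, tag*, (file | link))>
--   <!ELEMENT title (#PCDATA)>     (likewise note, tag, file and link)

newtype Title = Title String
newtype Note  = Note String
newtype Tag   = Tag String

-- The choice (file | link) becomes a sum type.
data Target = File String | Link String

-- A sequence becomes a product; '?' becomes Maybe; '*' becomes a list.
data Entry = Entry Title (Maybe Note) [Tag] Target

-- Wrap already-rendered content in a named element (no escaping is done).
element :: String -> String -> String
element name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"

renderTarget :: Target -> String
renderTarget (File path) = element "file" path
renderTarget (Link url)  = element "link" url

-- Children are emitted in the order the content model lists them.
renderEntry :: Entry -> String
renderEntry (Entry (Title t) note tags target) =
  element "entry" $
       element "title" t
    ++ maybe "" (\(Note n) -> element "note" n) note
    ++ concatMap (\(Tag g) -> element "tag" g) tags
    ++ renderTarget target

example :: String
example = renderEntry (Entry (Title "t") Nothing [Tag "a", Tag "b"] (Link "u"))
```

With these definitions `example` is `<entry><title>t</title><tag>a</tag><tag>b</tag><link>u</link></entry>`. Any value of type Entry renders to a document that fits the content model in the comment, because the only way to build an Entry is to supply exactly the children that the model asks for.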

Ketil Malde wrote:
If so, how do you specify where foreign namespaces are legal? If the top-level DTD specifies -- and thus needs to know, in advance, the details about -- the sub-namespaces, then what's the point of namespaces at all? Well - I just found some links. A schema can contain import statements, basically #include'ing other schemas: http://lists.xml.org/archives/xml-dev/200211/msg00880.html ...So for both Schema and DTD validation, you'd need to know in advance all the tags you'd like to use,
You can combine them using Schematron: http://www.schematron.com/overview.html It even allows you to mix validation languages, e.g., DTD, XML Schema, Relax NG, etc. -Yitz

On Tue, Dec 18, 2007 at 05:38:47PM +0000, Jon Fairbairn wrote:
... or, "whoops! I've written another html combinator library."
History: I was surprised to find that all the Haskell html generating stuff I've tried allowed one to construct invalid HTML.
Did you take a look at Peter Thiemann's WASH/HTML? It contains a monadic combinator library that checks proper nesting of HTML tags at compile time. The library also has an unchecked version, and the checked version doesn't seem to be used too often, even by the author himself. Last time I checked it still lacked the obvious rule to allow putting BODY in HTML, and I had to define that instance myself. But this is only a small wart, and the library proved to be quite useful for me.
I'm announcing it here because I hope the audience is fairly small but discerning, and I'm announcing it at all because I've run out of steam for the moment. This is the first code above the size of a nonce-programme that I've written since I got ill years ago, so don't go too hard on me!
I hope you won't consider my response hard ;-) Perhaps you could borrow some ideas from Peter's code, or vice versa. Best regards Tomasz

Tomasz Zielonka
On Tue, Dec 18, 2007 at 05:38:47PM +0000, Jon Fairbairn wrote:
... or, "whoops! I've written another html combinator library."
History: I was surprised to find that all the Haskell html generating stuff I've tried allowed one to construct invalid HTML.
Did you take a look at Peter Thiemann's WASH/HTML?
I did.
It contains a monadic combinator library that checks proper nesting of HTML tags at compile time. The library also has an unchecked version,
It's possible that I only tried the unchecked version: I just thought of an invalid example, read what documentation I could find and generated some invalid html. However, Thiemann's thesis says

    The current library implements neither inclusions nor exceptions.

So I hope I might be forgiven if I overlooked a difference between the distribution and the thesis! Does the checked version now enforce appendix B and prevent <a> appearing anywhere within <a> and so on?
I'm announcing it here because I hope the audience is fairly small but discerning, and I'm announcing it at all because I've run out of steam for the moment. This is the first code above the size of a nonce-programme that I've written since I got ill years ago, so don't go too hard on me!
I hope you won't consider my response hard ;-)
No; and if WASH /can/ now enforce all the restrictions but isn't generally used that way, I think that's a pity (and making it easier to find how to do this would be a big improvement).
Perhaps you could borrow some ideas from Peter's code, or vice versa.
It turns out that the mechanism I've used to enforce the restrictions is almost the same as the one he mentions in the thesis as being too awkward, except that I don't in fact need 98 type parameters, just fourteen... and I disagree with his statement that moving to xhtml means that implementing the restrictions is unnecessary -- W3C says that the informal description is the normative definition, not the DTD.

Another difference is that I haven't used any non-Haskell 98 constructs other than using Template Haskell to generate class declarations and instances (were one so inclined, one could get ghc to output the splices and [clean them up by hand to] produce an entirely H98 version).

--
Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
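
For readers wondering how a type parameter can carry a prohibition down through arbitrary nesting, here is a much simplified sketch with a single parameter standing in for the fourteen mentioned above. It is not the library's actual encoding: the names TopLevel, InsideAnchor, AnchorAllowed, Inline, span_ and render are all invented for this example, attributes are left out, and text is not escaped.

```haskell
-- Context markers (hypothetical names): where is this content being built?
data TopLevel     = TopLevel      -- not inside any <a>
data InsideAnchor = InsideAnchor  -- somewhere below an <a>

-- Inline content, tagged with the context it is destined for.
newtype Inline ctx = Inline String

-- Contexts in which <a> may appear. InsideAnchor deliberately has no instance.
class AnchorAllowed ctx
instance AnchorAllowed TopLevel

unInline :: Inline ctx -> String
unInline (Inline s) = s

text :: String -> Inline ctx
text = Inline

-- span hands its own context down to its children unchanged.
span_ :: [Inline ctx] -> Inline ctx
span_ kids = Inline ("<span>" ++ concatMap unInline kids ++ "</span>")

-- a needs a context that allows anchors, and puts everything below it
-- into the InsideAnchor context, however deeply nested.
a :: AnchorAllowed ctx => [Inline InsideAnchor] -> Inline ctx
a kids = Inline ("<a>" ++ concatMap unInline kids ++ "</a>")

render :: Inline TopLevel -> String
render = unInline

ok :: String
ok = render (span_ [a [span_ [text "foo"]]])

-- Rejected by the type checker, since there is no instance
-- AnchorAllowed InsideAnchor:
-- bad = render (a [span_ [a [text "foo"]]])
```

Here `ok` is `<span><a><span>foo</span></a></span>`. Uncommenting `bad` gives a "No instance" error for the inner `a`, even though a span sits between the two anchors, because span_ passes the InsideAnchor context through to its children. This sketch uses only single-parameter classes; each further prohibition would need its own parameter, which is presumably where the count of fourteen comes from.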

On Tue, Jan 08, 2008 at 02:43:08PM +0000, Jon Fairbairn wrote:
It contains a monadic combinator library that checks proper nesting of HTML tags at compile time. The library also has an unchecked version,
It's possible that I only tried the unchecked version: I just thought of an invalid example, read what documentation I could find and generated some invalid html.
That's an easy trap to fall into, especially because the author seems to use only unchecked combinators in examples ;-)
However, Thiemann's thesis says
The current library implements neither inclusions nor exceptions.
So I hope I might be forgiven if I overlooked a difference between the distribution and the thesis! Does the checked version now enforce appendix B and prevent <a> appearing anywhere within <a> and so on?
I am not familiar with the HTML standards enough to understand everything you say here, but I've just checked that you can't put <a> inside <a> using the checked combinators. But I can't say if it checks everything your library checks.
Another difference is that I haven't used any non-Haskell 98 constructs other than using Template Haskell to generate class declarations and instances (were one so inclined, one could get ghc to output the splices and [clean them up by hand to] produce an entirely H98 version).
I can't think of any non-haskell 98 extensions that were used in WASH/HTML, but I may be overlooking something. At least the interface of WASH.HTML.HTMLMonad98 looks quite standard. Best regards Tomasz

Tomasz Zielonka
On Tue, Jan 08, 2008 at 02:43:08PM +0000, Jon Fairbairn wrote:
It contains a monadic combinator library that checks proper nesting of HTML tags at compile time. The library also has an unchecked version,
It's possible that I only tried the unchecked version: I just thought of an invalid example, read what documentation I could find and generated some invalid html.
That's an easy trap to fall into, especially because the author seems to use only unchecked combinators in examples ;-)
That's a bit of a strange thing to do. Having gone to the trouble of setting things up so that it can check (some degree of) validity, why not use it? Isn't this what Haskell is about?

Exploring WaSH isn't made easy by the documentation; if I try

    build_document (html (head (title (text "foo")) ## (body (h1 (text "foo")))))

-- essentially the first example from the pdf of the paper "A Typed Representation for HTML and XML documents in Haskell" -- I get

    <interactive>:1:16:
        No instance for (AddTo HTML HTML)
          arising from use of `html' at <interactive>:1:16-75
        Possible fix: add an instance declaration for (AddTo HTML HTML)
        In the first argument of `build_document', namely
            `(html ((head (title (text "foo"))) ## (body (h1 (text "foo")))))'
        In the expression:
            build_document (html ((head (title (text "foo"))) ## (body (h1 (text "foo")))))
        In the definition of `it':
            it = build_document (html ((head (title (text "foo"))) ## (body (h1 (text "foo")))))

and I don't know what's changed since the paper, or what I'm doing wrong -- how do I get it to output something? (This is with WASH.HTML.HTMLPrelude, but similar questions arise for WASH.HTML.HTMLMonad98)

    unELT $ html (body (h1 $ img empty)) (make DOCUMENT)

outputs something, but it's not valid (no head), so that can't be right.
Thiemann's thesis says
Actually, I meant the abovementioned paper.
The current library implements neither inclusions nor exceptions.
So I hope I might be forgiven if I overlooked a difference between the distribution and the thesis! Does the checked version now enforce appendix B and prevent <a> appearing anywhere within <a> and so on?
I am not familiar with the HTML standards enough to understand everything you say here,
I'd sort-of hope that a proper HTML library would relieve you of that responsibility!
but I've just checked that you can't put <a> inside <a> using the checked combinators. But I can't say if it checks everything your library checks.
It doesn't: the issue is not <a> directly in <a>:

    *WASH.HTML.HTMLPrelude> :t a (a (text "foo"))

    <interactive>:1:3:
        No instance for (AddTo A A)
          arising from use of `a' at <interactive>:1:3-16
        Possible fix: add an instance declaration for (AddTo A A)
        In the first argument of `a', namely `(a (text "foo"))'

(it properly rejects that), but <a> anywhere within <a>:

    *WASH.HTML.HTMLPrelude> :t a (span (a (text "foo")))
    a (span (a (text "foo"))) :: (AddTo s A) => ELT s -> ELT s

which should also be rejected. Here's what happens with my version:

    Prelude Typeful.Text.HTMLs> :t a << a << string "foo"

    <interactive>:1:5:
        No instance for (Is_A A_not_allowed_in_A)
          arising from use of `a' at <interactive>:1:5
        Possible fix:
          add an instance declaration for (Is_A A_not_allowed_in_A)
        In the first argument of `(<<)', namely `a'
        In the second argument of `(<<)', namely `a << (string "foo")'

and

    Typeful.Text.HTMLs> :t a << span << a << string "foo"

    <interactive>:1:13:
        No instance for (Is_A A_not_allowed_in_A)
          arising from use of `a' at <interactive>:1:13
        Possible fix:
          add an instance declaration for (Is_A A_not_allowed_in_A)
        In the first argument of `(<<)', namely `a'
        In the second argument of `(<<)', namely `a << (string "foo")'
        In the second argument of `(<<)', namely
            `span << (a << (string "foo"))'

I'm not especially enamoured of the "<<" syntax; it's just what's used in the current html and xhtml libraries, so I did something similar.
Another difference is that I haven't used any non-Haskell 98 constructs other than using Template Haskell to generate class declarations and instances (were one so inclined, one could get ghc to output the splices and [clean them up by hand to] produce an entirely H98 version).
I can't think of any non-haskell 98 extensions that were used in WASH/HTML, but I may be overlooking something. At least the interface of WASH.HTML.HTMLMonad98 looks quite standard.
count the number of arguments of the class WithHTML (or AddTo in the above)... ;-) -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
participants (5): Jon Fairbairn, Ketil Malde, Mattias Bengtsson, Tomasz Zielonka, Yitzchak Gale