HaXml and the XHTML 1.0 Strict DTD

Is anyone using HaXml to validate XHTML Strict? The old 1.13.2 version has some bugs in how it handles attributes that stop me from using it. It handled the DTD parsing fine.

The most-recent darcs version relies on a newer ByteString than I have, so it is not easy for me to test it.

A recent (this year) darcs version failed to parse the DTD, yielding this error:

    validate: In a sequence:
    in content spec of ELEMENT decl: head
    When looking for a non-empty sequence with separators:
    In a sequence:
    Expected % but found |
      in file xhtml1 at line 252 col 50
    when looking for a content particle
    when looking for a content particle

This is the context:

    <!--================ Document Head =======================================-->

    <!ENTITY % head.misc "(script|style|meta|link|object)*">

    <!-- content model is %head.misc; combined with a single
         title and an optional base element in any order -->

    <!ELEMENT head (%head.misc;,
         ((title, %head.misc;, (base, %head.misc;)?) |
          (base, %head.misc;, (title, %head.misc;))))>

I appreciate any advice on this.

cheers
peter

Peter Gammie wrote:
> The most-recent darcs version relies on a newer ByteString than I have, so it is not easy for me to test it.

I believe there was a patch to fix this. Apparently only one version of the bytestring package (0.9.0.1) ever exported the 'join' function, and a different version with the same number (but not exporting 'join') was uploaded to Hackage! 'join' has since been replaced by 'intercalate', which is available in all 0.9.x versions.

> A recent (this year) darcs version failed to parse the DTD, yielding this error:

I didn't try the full XHTML DTD, but the fragment you included in your message was parsed just fine by the darcs version of HaXml/DtdToHaskell.

Regards,
Malcolm
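
For readers hitting the same build problem, here is a minimal sketch of the portable spelling. It assumes only that the installed bytestring is a 0.9.x release or later; `joinWith` is a hypothetical helper for illustration and is not part of HaXml.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Join a list of strings with a separator, going through ByteString.
-- Uses 'intercalate', the name that replaced the short-lived 'join' export.
-- ('joinWith' is a made-up helper name, not a library function.)
joinWith :: String -> [String] -> B.ByteString
joinWith sep = B.intercalate (B.pack sep) . map B.pack
```

For example, `joinWith "|" ["script", "style", "meta"]` builds the ByteString `script|style|meta`.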

On Wed, 2008-04-30 at 11:32 +0100, Malcolm Wallace wrote:
> Peter Gammie wrote:
>> The most-recent darcs version relies on a newer ByteString than I have, so it is not easy for me to test it.
>
> I believe there was a patch to fix this. Apparently only one version of the bytestring package (0.9.0.1) ever exported the 'join' function, and a different version with the same number (but not exporting 'join') was uploaded to Hackage! 'join' has since been replaced by 'intercalate', which is available in all 0.9.x versions.

Just goes to show that we need a tool to compare and check package APIs, so that packages that want to follow a versioning policy can check that they really are following it. Doing these things manually is prone to mistakes like this one (and another that I'm aware of in the same package).

Duncan

On 30/04/2008, at 5:32 PM, Malcolm Wallace wrote:
> Peter Gammie wrote:
>> The most-recent darcs version relies on a newer ByteString than I have, so it is not easy for me to test it.
>
> I believe there was a patch to fix this. Apparently only one version of the bytestring package (0.9.0.1) ever exported the 'join' function, and a different version with the same number (but not exporting 'join') was uploaded to Hackage! 'join' has since been replaced by 'intercalate', which is available in all 0.9.x versions.

Thanks. I don't doubt it works with a newer bytestring; I just can't readily use such a thing.

>> A recent (this year) darcs version failed to parse the DTD, yielding this error:
>
> I didn't try the full XHTML DTD, but the fragment you included in your message was parsed just fine by the darcs version of HaXml/DtdToHaskell.

Can you please try the full XHTML 1.0 Strict DTD? At the same time, can you verify that it handles this part of it properly (circa line 854):

    <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>

Using a slightly hacked HaXml v1.13.3, I get this from DtdToHaskell:

    data Table = Table Table_Attrs (Maybe Caption)
                       (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot)
                       (OneOf2 (List1 Tbody) (List1 Tr))
               deriving (Eq,Show)

My expectation is that we can have a <table> without a <col> or <colgroup> child. The W3 validator seems to agree with that interpretation. When I use the HaXml validator with this DTD I get this (e.g.):

    Element <table> should contain (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+)) but does not.
    Element <table> should contain (col*|colgroup*) but does not.

cheers
peter
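
The shape of the generated type can be explored in isolation. The sketch below is an assumption-laden model, not DtdToHaskell output: `OneOf2`, `List1`, and every element type are local stand-ins with made-up contents (the real generated types carry attributes and children), and `plainTable` and `columnChildren` are hypothetical names.

```haskell
-- Local stand-ins (assumptions) for the helper types used by generated code.
data OneOf2 a b = OneOf2 a | TwoOf2 b deriving (Eq, Show)
newtype List1 a = NonEmpty [a] deriving (Eq, Show)

-- Placeholder element types with no contents, for illustration only.
data Table_Attrs = Table_Attrs deriving (Eq, Show)
data Caption  = Caption  deriving (Eq, Show)
data Col      = Col      deriving (Eq, Show)
data Colgroup = Colgroup deriving (Eq, Show)
data Thead    = Thead    deriving (Eq, Show)
data Tfoot    = Tfoot    deriving (Eq, Show)
data Tbody    = Tbody    deriving (Eq, Show)
data Tr       = Tr       deriving (Eq, Show)

-- Same field layout as the declaration quoted in the message.
data Table = Table Table_Attrs (Maybe Caption)
                   (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot)
                   (OneOf2 (List1 Tbody) (List1 Tr))
  deriving (Eq, Show)

-- A table with a single row and neither <col> nor <colgroup> children:
-- the empty list in the first alternative represents "no column children".
plainTable :: Table
plainTable =
  Table Table_Attrs Nothing (OneOf2 []) Nothing Nothing (TwoOf2 (NonEmpty [Tr]))

-- Count the <col> or <colgroup> children, whichever alternative was chosen.
columnChildren :: Table -> Int
columnChildren (Table _ _ cols _ _ _) =
  case cols of
    OneOf2 cs -> length cs
    TwoOf2 gs -> length gs
```

Under these assumptions `columnChildren plainTable` is 0. Both `OneOf2 []` and `TwoOf2 []` would describe the same childless table, so a consumer of the type has to pick one of the two alternatives arbitrarily when there are no column children.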

Peter Gammie wrote:
> <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
>
> Using a slightly hacked HaXml v1.13.3, I get this from DtdToHaskell:
>
> data Table = Table Table_Attrs (Maybe Caption) (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot) (OneOf2 (List1 Tbody) (List1 Tr)) deriving (Eq,Show)

This looks entirely correct to me.

> My expectation is that we can have a <table> without a <col> or <colgroup> child.

Ah, yes I can see why that is permitted, but I guess HaXml's validator is not yet smart enough to be able to choose whether it has seen an empty list of <col> or an empty list of <colgroup>. :-)

Here is a suggested fix. Let me know if it works for you. In src/Text/XML/HaXml/Validate.hs, around line 220, use the following diff over the local defn of 'choice':

     choice elem ns cps =  -- return only those parses that don't give any errors
         [ rem | ([],rem) <- map (\cp-> checkCP elem (definite cp) ns) cps ]
    +    ++ [ ns | all possEmpty cps ]
         where definite (TagName n Query)  = TagName n None
               definite (Choice cps Query) = Choice cps None
               definite (Seq cps Query)    = Seq cps None
               definite (TagName n Star)   = TagName n Plus
               definite (Choice cps Star)  = Choice cps Plus
               definite (Seq cps Star)     = Seq cps Plus
               definite x                  = x
    +          possEmpty (TagName _ mod)   = mod `elem` [Query,Star]
    +          possEmpty (Choice cps None) = all possEmpty cps
    +          possEmpty (Choice _ mod)    = mod `elem` [Query,Star]
    +          possEmpty (Seq cps None)    = all possEmpty cps
    +          possEmpty (Seq _ mod)       = mod `elem` [Query,Star]

Are there other places, apart from the validator, where a similar problem arises?

Regards,
Malcolm
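
The question behind the fix, whether a content particle can match zero children, can be sketched on its own. The following is a minimal standalone model and not HaXml's code: `Modifier` and `CP` are local stand-ins that mirror the constructor names in the diff, and `nullable`, `optional`, `colOrColgroup`, and `bodyOrRows` are hypothetical names. Note one deliberate difference: the sketch uses `any` for a choice (one empty-matching alternative is enough), where the posted patch uses `all`; the two agree on the <table> case.

```haskell
-- Local stand-ins (assumptions) mirroring the constructors used in the diff.
data Modifier = None | Query | Star | Plus
  deriving (Eq, Show)

data CP
  = TagName String Modifier
  | Choice [CP] Modifier
  | Seq [CP] Modifier
  deriving (Eq, Show)

-- | True when the modifier itself allows zero occurrences ('?' or '*').
optional :: Modifier -> Bool
optional m = m `elem` [Query, Star]

-- | Can this content particle match an empty list of children?
nullable :: CP -> Bool
nullable (TagName _ m)  = optional m
nullable (Choice cps m) = optional m || any nullable cps  -- one alternative suffices
nullable (Seq cps m)    = optional m || all nullable cps  -- every part must allow empty

-- (col*|colgroup*) from the <table> content model.
colOrColgroup :: CP
colOrColgroup = Choice [TagName "col" Star, TagName "colgroup" Star] None

-- (tbody+|tr+) from the <table> content model.
bodyOrRows :: CP
bodyOrRows = Choice [TagName "tbody" Plus, TagName "tr" Plus] None
```

In GHCi, `nullable colOrColgroup` is `True` while `nullable bodyOrRows` is `False`, which matches the expectation that a <table> may omit <col> and <colgroup> but not its rows.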

On 21/05/2008, at 5:44 PM, Malcolm Wallace wrote:
> Peter Gammie wrote:
>> <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
>>
>> Using a slightly hacked HaXml v1.13.3, I get this from DtdToHaskell:
>>
>> data Table = Table Table_Attrs (Maybe Caption) (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot) (OneOf2 (List1 Tbody) (List1 Tr)) deriving (Eq,Show)
>
> This looks entirely correct to me.

I realised that as soon as I sent it. :-)

>> My expectation is that we can have a <table> without a <col> or <colgroup> child.
>
> Ah, yes I can see why that is permitted, but I guess HaXml's validator is not yet smart enough to be able to choose whether it has seen an empty list of <col> or an empty list of <colgroup>. :-)
>
> Here is a suggested fix. Let me know if it works for you. In src/Text/XML/HaXml/Validate.hs, around line 220, use the following diff over the local defn of 'choice':
>
>      choice elem ns cps =  -- return only those parses that don't give any errors
>          [ rem | ([],rem) <- map (\cp-> checkCP elem (definite cp) ns) cps ]
>     +    ++ [ ns | all possEmpty cps ]
>          where definite (TagName n Query)  = TagName n None
>                definite (Choice cps Query) = Choice cps None
>                definite (Seq cps Query)    = Seq cps None
>                definite (TagName n Star)   = TagName n Plus
>                definite (Choice cps Star)  = Choice cps Plus
>                definite (Seq cps Star)     = Seq cps Plus
>                definite x                  = x
>     +          possEmpty (TagName _ mod)   = mod `elem` [Query,Star]
>     +          possEmpty (Choice cps None) = all possEmpty cps
>     +          possEmpty (Choice _ mod)    = mod `elem` [Query,Star]
>     +          possEmpty (Seq cps None)    = all possEmpty cps
>     +          possEmpty (Seq _ mod)       = mod `elem` [Query,Star]

Fantastic, thanks, that seems to work fine.

One nit: your use of `elem` is meant to refer to Prelude.elem, but inside 'choice' that name is shadowed by the `elem` argument, so I imported the Prelude qualified as P and changed those shadowed references to `P.elem`. I will try to send you a patch against 1.13.3 with all these little bits and pieces when my project is finished.

Can you lay out some kind of plan for HaXml? (is 1.13.x now dead, is 1.19.x stable, ...?) This would help for new-ish projects like mine.

> Are there other places, apart from the validator, where a similar problem arises?

I do not know, I am merely using the DTD and HTML parsers, the CFilter combinators, the pretty printer and the validator. They all seem fine on a cursory check. (In general HaXml has been working quite well. Thanks for producing such a long-lived and well-thought-out library.)

cheers
peter
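
The shadowing described in this message can be reproduced in a few lines. This is a sketch under assumptions: the exact import lines used in the patched Validate.hs are not shown in the thread, and `Modifier` and `describe` are hypothetical stand-ins. Because an explicit `import qualified Prelude as P` on its own would switch off the implicit unqualified Prelude import, the sketch also imports the Prelude unqualified.

```haskell
import Prelude
import qualified Prelude as P

-- Local stand-in (assumption) for the modifier type used in the diff.
data Modifier = None | Query | Star | Plus
  deriving (Eq, Show)

-- The first argument is named 'elem', mirroring 'choice elem ns cps'.
-- Inside this function the bare name 'elem' is that String argument,
-- so list membership has to be written with the qualified 'P.elem'.
describe :: String -> Modifier -> String
describe elem m
  | m `P.elem` [Query, Star] = "<" ++ elem ++ "> may be omitted"
  | otherwise                = "<" ++ elem ++ "> is required"
```

Here `describe "col" Star` gives `"<col> may be omitted"`. Writing the guard with a bare `elem` would be rejected by the type checker, since the argument is a String and not a function.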

Peter Gammie wrote:
> Can you lay out some kind of plan for HaXml? (is 1.13.x now dead, is 1.19.x stable, ...?) This would help for new-ish projects like mine.

The 1.13.x stable branch sees minimal maintenance only, mostly to get it building again after each new release of ghc breaks something.

Versions 1.14 - 1.19 (i.e. the darcs repo) introduce several API changes. I think those have now pretty much stabilised, but unfortunately the work to realise the benefit of those changes throughout the codebase is still incomplete in some places. That is why I have not frozen and released this branch as 2.0 yet.

For forward compatibility I would definitely recommend that a new project using HaXml should start with the 1.19 branch, not 1.13.

Regards,
Malcolm

On 21/05/2008, at 7:40 PM, Malcolm Wallace wrote:
> Peter Gammie wrote:
>> Can you lay out some kind of plan for HaXml? (is 1.13.x now dead, is 1.19.x stable, ...?) This would help for new-ish projects like mine.
>
> The 1.13.x stable branch sees minimal maintenance only, mostly to get it building again after each new release of ghc breaks something.
>
> Versions 1.14 - 1.19 (i.e. the darcs repo) introduce several API changes. I think those have now pretty much stabilised, but unfortunately the work to realise the benefit of those changes throughout the codebase is still incomplete in some places. That is why I have not frozen and released this branch as 2.0 yet.
>
> For forward compatibility I would definitely recommend that a new project using HaXml should start with the 1.19 branch, not 1.13.

Thanks for your advice. Due to GHC 6.6.1 not being ByteString-upgradable, I have been slow in using the darcs version of HaXml. I am now using GHC 6.8.2 and so can try it out.

My earlier-reported bug for the DTD parser stands:

    $ ~/bin/DtdToHaskell xhtml1-strict.dtd
    DtdToHaskell: In a sequence:
    in content spec of ELEMENT decl: head
    When looking for a non-empty sequence with separators:
    In a sequence:
    Expected % but found |
      in file xhtml1-strict.dtd at line 252 col 50
    when looking for a content particle
    when looking for a content particle

That is the XHTML 1.0 Strict DTD from the W3. Do you have any ideas what might have caused this? If not, I will have a poke around. It did work fine in 1.13.3, as I remarked earlier.

cheers
peter

participants (3)
- Duncan Coutts
- Malcolm Wallace
- Peter Gammie