XML DTD for the package configuration file

This is my first attempt at an XML DTD. Does anyone have any suggestions for good XML references, either a book or a web site? Google seems to be suffering from too much XML information. Is there a Haddock / Javadoc type way to notate a DTD? Should I be using schemas? Are they overkill? What do you think of the style? Sometimes I have trouble choosing between using an attribute and using a nested element. For instance:

<envitem><var>x</var><val>y</val></envitem>

versus

<envitem var="x">y</envitem>

Here's a shot at a DTD for the packages configuration file, an example config file that validates with the DTD (I used xmllint --noout --valid), and the Haskell datatype that it's meant to reflect. I expect it will change a lot as I learn more about XML and about packaging, but I wanted to get it out there in case I'm going horribly wrong somehow.

I'm not really satisfied with the version stuff or the versionrange stuff (mostly the Haskell code).

Slight background for people who know less about XML than me: a DTD is a means of verifying that an XML file has the correct structure. It seems rather limited in that there are no types, and the regular expressions are weaker than one might expect (I can't say "the data in this field should correspond to the following regexp"). XML Schemas seem to fix some of these problems. But DTDs seem to be a nice way to document and share information about the format of an XML file.

peace,

isaac

------------------------------------------------------------
Packages.dtd
------------------------------------------------------------
-- CUT HERE --
<!ELEMENT packageconf (package+)>
<!ATTLIST packageconf version CDATA #REQUIRED>

<!-- package -->
<!ELEMENT package (version?,importdirs?,sourcedirs?,librarydirs?,
                   hslibraries?,extralibraries?,depends?,builddepends?,
                   ccopts?,ldopts?,environment?,graftpoint?,extraframeworks?)>
<!ATTLIST package version CDATA #IMPLIED>
<!ATTLIST package name CDATA #REQUIRED>
<!ATTLIST package auto (false | true) "false">

<!ELEMENT packageident (#PCDATA)>

<!-- Path related -->
<!ELEMENT filepath (#PCDATA)>
<!ELEMENT sourcedirs (filepath+)>
<!ELEMENT importdirs (filepath+)>
<!ELEMENT librarydirs (filepath+)>

<!-- Libraries -->
<!ELEMENT hslibraries (library+)>
<!ELEMENT cincludes (include+)>
<!ELEMENT extralibraries (library+)>
<!ELEMENT library (#PCDATA)>
<!ELEMENT include (#PCDATA)>

<!-- Options -->
<!ELEMENT ccopts (option+)>
<!ELEMENT ldopts (option+)>
<!ELEMENT option (#PCDATA)>
<!ATTLIST option type (short | long | verbatim) "verbatim">

<!-- Depends -->
<!ELEMENT builddepends (dependency+)>
<!ELEMENT depends (dependency+)>

<!ELEMENT dependency (versionrange?)>
<!ATTLIST dependency name CDATA #REQUIRED>
<!ATTLIST dependency version CDATA #IMPLIED>

<!-- Documentation -->
<!ELEMENT haddockhtmlroot (#PCDATA)>
<!ELEMENT haddockinterface (#PCDATA)>

<!-- Versioning -->
<!ELEMENT versionrange (version,version?)> <!-- second one is for range -->
<!ATTLIST versionrange type (any | orlater | exactly | orearlier | between) "orlater">

<!-- I'm not exactly satisfied with this. I should also be able to say <= -->
<!-- and >=, but these are already pretty verbose. -->

<!ELEMENT version (numberversion | simpleversion | dateversion)>
<!ELEMENT numberversion (major?,minor?,patch?)>
<!ELEMENT simpleversion (#PCDATA)>

<!ELEMENT major (#PCDATA)>
<!ELEMENT minor (#PCDATA)>
<!ELEMENT patch (#PCDATA)>

<!ELEMENT dateversion (year,month?,day?,patch?)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!-- Simpleversions should be a fixed format: 1.2.3-4 or so -->

<!-- Environment -->
<!ELEMENT environment (envitem+)>
<!ELEMENT envitem (var,val)+>
<!ELEMENT var (#PCDATA)>
<!ELEMENT val (#PCDATA)>
<!-- Or maybe this would be preferred? -->
<!-- <envitem var="boing">boo</envitem> -->

<!-- Misc -->
<!ELEMENT graftpoint (#PCDATA)>
<!ELEMENT extraframeworks (framework+)>
<!ELEMENT framework (#PCDATA)>
-- CUT HERE --

------------------------------------------------------------
Packages.xml
------------------------------------------------------------
-- CUT HERE --
<?xml version="1.0"?>
<!DOCTYPE packageconf SYSTEM "Packages.dtd">
<packageconf version="0.1">
  <package name="packageName" version="1.0" auto="true">
    <importdirs><filepath>/foo</filepath>
                <filepath>/foo/bar/bang</filepath></importdirs>
    <sourcedirs><filepath>/foo</filepath>
                <filepath>/foo/bar/bang</filepath></sourcedirs>
    <graftpoint>System.IO</graftpoint>
  </package>
  <!-- ============================== -->
  <package name="dependExample" version="3">
    <builddepends>
      <dependency name="otherPkg">
        <versionrange type="exactly">
          <version><simpleversion>3.4.5-6</simpleversion></version>
        </versionrange>
      </dependency>
      <dependency name="otherPkg2">
        <versionrange type="orlater">
          <version><dateversion><year>1998</year></dateversion></version>
        </versionrange>
      </dependency>
      <dependency name="otherPkg3" version="2.0.0-0" />
      <dependency name="otherPkg3">
        <versionrange type="between">
          <version><simpleversion>3.3</simpleversion></version>
          <version><simpleversion>3.5</simpleversion></version>
        </versionrange>
      </dependency>
    </builddepends>
  </package>
  <!-- ============================== -->
  <package name="optionExample">
    <version><numberversion><major>10</major>
                            <minor>11</minor>
                            <patch>12</patch></numberversion></version>
    <ccopts>
      <option type="short">b</option>
      <option type="long">foo</option>
      <option type="verbatim">--bar</option>
      <option>--bang</option>
    </ccopts>
    <environment>
      <envitem><var>foo</var><val>bar</val></envitem>
      <envitem><var>bang</var><val>baz</val></envitem>
    </environment>
  </package>
</packageconf>
-- CUT HERE --

------------------------------------------------------------
Package Config data structure
------------------------------------------------------------
-- CUT HERE --
data PkgIdentifier = PkgIdentifier {pkgName::String, pkgVersion::Version}
{- ^Often need name and version since multiple versions of a single
    package can exist on a system. -}

data PackageConfig
   = Package {
        pkgIdent :: PkgIdentifier,
        license :: License,
        auto :: Bool,
        -- provides :: [String],
        -- {- ^A bit pie-in-the-sky; might indicate that this package provides
        --    functionality that other packages also provide, such as a compiler
        --    or GUI framework, and upon which other packages might depend. -}
        -- isDefault :: Bool,
        -- think through isDefault more, maybe we actually want a list of defaults
        import_dirs :: [FilePath],
        source_dirs :: [FilePath],
        library_dirs :: [FilePath],
        include_dirs :: [FilePath],
        hs_libraries :: [String],
        extra_libraries :: [String],
        c_includes :: [String],
        build_deps :: [Dependency], -- build dependencies
        depends :: [Dependency], -- use dependencies
        -- extra_ghc_opts :: [String],
        extra_cc_opts :: [String],
        extra_ld_opts :: [String],
        framework_dirs :: [String],
        haddock_html_root :: String,
        haddock_interface :: String,
        default_grafting_point :: String,
        -- ^Related to new packages proposal
        vars :: [(String, String)],
        -- ^Variable, value pairs, whatever author wants here
        extra_frameworks:: [String]}

data Version = DateVersion {versionYear :: Integer,
                            versionMonth :: Month,
                            versionDay :: Integer}
             | NumberedVersion {versionMajor :: Integer,
                                versionMinor :: Integer,
                                versionPatchLevel :: Integer}

data License = GPL | LGPL | BSD | {- ... | -} OtherLicense FilePath

data Dependency = Dependency String VersionRange

data VersionRange = AnyVersion
                  | OrLaterVersion Version
                  | ExactlyThisVersion Version
                  | OrEarlierVersion Version

type PackageMap = FiniteMap PkgIdentifier PackageConfig
-- CUT HERE --
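An editorial aside on the version ranges: the DTD allows type="between" on versionrange, while the VersionRange datatype above has no matching constructor. The following is a minimal sketch of how a range check could look if one were added. The names BetweenVersions and withinRange are hypothetical and do not appear in the thread, and Version is cut down to numbered versions only so that the derived ordering can be used.

```haskell
-- Simplified stand-in for the thread's Version type (numbered versions only).
-- The derived Ord compares major, then minor, then patch level.
data Version = NumberedVersion Integer Integer Integer
  deriving (Eq, Ord, Show)

-- The thread's VersionRange plus one hypothetical constructor,
-- BetweenVersions, mirroring type="between" in the DTD.
data VersionRange
  = AnyVersion
  | OrLaterVersion Version
  | ExactlyThisVersion Version
  | OrEarlierVersion Version
  | BetweenVersions Version Version -- assumption: inclusive on both ends
  deriving (Eq, Show)

-- Hypothetical helper: does a concrete version satisfy a range?
withinRange :: Version -> VersionRange -> Bool
withinRange _ AnyVersion              = True
withinRange v (OrLaterVersion lo)     = v >= lo
withinRange v (ExactlyThisVersion x)  = v == x
withinRange v (OrEarlierVersion hi)   = v <= hi
withinRange v (BetweenVersions lo hi) = lo <= v && v <= hi
```

Whether "between" should be inclusive is a design choice the original message leaves open; the sketch simply picks one.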

Hi all,

Although creating package description files in XML sounds neat, it also sounds like over-design at this stage. Why don't we use Haskell *values* to describe the packages? If we describe packages just like ghc-pkg is doing, as a Haskell record, we get:

- very simple code for reading and writing those
- syntax that is understood by all Haskell programmers
- optional elements (by using records)
- list elements (by using lists)

It might not be as powerful as using XML, but it might be just right for the thing we are trying to do.

Just my 2cents,
-- Daan Leijen
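To make the "very simple code for reading and writing" point concrete, here is a minimal sketch that relies on derived Show and Read instances. The record is a cut-down, hypothetical version of PackageConfig, and writeConfig and readConfig are illustrative names; this is not how ghc-pkg itself is implemented.

```haskell
import Data.Char (isSpace)

-- Cut-down, hypothetical version of the PackageConfig record.
data PackageConfig = Package
  { pkgName       :: String
  , import_dirs   :: [FilePath]
  , extra_cc_opts :: [String]
  } deriving (Show, Read, Eq)

-- Writing a package description is just the derived 'show'.
writeConfig :: PackageConfig -> String
writeConfig = show

-- Reading uses the derived 'reads'; anything that is not exactly one
-- complete record (plus optional trailing whitespace) is rejected.
readConfig :: String -> Maybe PackageConfig
readConfig s = case reads s of
  [(cfg, rest)] | all isSpace rest -> Just cfg
  _                                -> Nothing

example :: PackageConfig
example = Package
  { pkgName       = "packageName"
  , import_dirs   = ["/foo", "/foo/bar/bang"]
  , extra_cc_opts = ["--bang"]
  }
```

Under these assumptions, readConfig (writeConfig example) gives back Just example, so the on-disk syntax is ordinary Haskell record syntax.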
-----Original Message-----
From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Isaac Jones
Sent: Wednesday, 15 October 2003 6:55
To: libraries
Subject: XML DTD for the package configuration file

"Daan Leijen"
> Hi all,
>
> Although creating package description files in XML sounds neat, it also sounds like over-design at this stage. Why don't we use Haskell *values* to describe the packages? If we describe packages just like ghc-pkg is doing, as a Haskell record, we get:
>
> - very simple code for reading and writing those

(SNIP)

My problem with this is that sure, at first we have a one-line parser, but if we change the data structure, then we'll have to write bits of the parser by hand, since read won't work. (I note that ghc-pkg looks like it might just use read, but actually has a Happy parser.) Also, the user will have to provide an empty value for fields they don't care about, right? That is a little annoying for a simple package.

Is there a tool to parse ghc's package file like a tree I can walk?

Also, it is possible that non-Haskell tools (like package managers) will want to muck around with the packages file. I'm hoping to avoid this with haskell-config (or whatever), but we might not get everything right. Remember that a major goal of this file is a kind of interoperability between Haskell and an operating system's package manager.

I don't know why XML is over-design. It's pretty easy to use and there is HaXml. Do you have any argument against it?

> Just my 2cents,

This is the kinda thing I'm looking for when I asked, "Am I going horribly wrong". :)

peace,

isaac
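The concern about empty values can be illustrated with a small sketch. A derived Read instance accepts a record only when every field is present, in declaration order, so a description that names just the package is rejected. Inside Haskell source, a defaults value plus record update avoids the problem, but that does not help a file parsed with read. The names Pkg, emptyPkg and readPkg are hypothetical.

```haskell
-- Hypothetical three-field package record.
data Pkg = Pkg
  { name       :: String
  , importDirs :: [FilePath]
  , ccOpts     :: [String]
  } deriving (Show, Read, Eq)

-- Hypothetical defaults value: every field empty.
emptyPkg :: Pkg
emptyPkg = Pkg { name = "", importDirs = [], ccOpts = [] }

-- In Haskell source, record update lets an author set only what matters.
simple :: Pkg
simple = emptyPkg { name = "simple" }

-- With the derived Read instance there is no such shortcut: the input
-- must spell out all fields, or the parse fails.
readPkg :: String -> Maybe Pkg
readPkg s = case reads s of
  [(p, "")] -> Just p
  _         -> Nothing
```

Here readPkg "Pkg {name = \"simple\"}" is Nothing, while readPkg (show simple) succeeds. Adding a field to Pkg would likewise make every previously written file unreadable, which is the "we'll have to write bits of the parser by hand" scenario.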

On Wed, Oct 15, 2003 at 01:53:11PM +0200, Daan Leijen wrote:
> Although creating package description files in XML sounds neat, it also sounds like over-design at this stage. Why don't we use Haskell *values* to describe the packages? If we describe packages just like ghc-pkg is doing, as a Haskell record, we get:
>
> - very simple code for reading and writing those
> - syntax that is understood by all haskell programmers
> - optional elements (by using records)
> - list elements (by using lists)
>
> Might be not as powerful as using XML, but it might be just right for the thing we are trying to do.

I strongly strongly agree with this. XML is almost always the wrong way to go. I have been involved with several projects where, for one reason or another, we jumped on the XML bandwagon, and we have always been disappointed by the results. And excising XML once it has made headway into a project is no small task.

John
--
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------

Isaac Jones writes:

> Is there a Haddock / Javadoc type way to notate a DTD?

It's not exactly what you asked for, but you might want to take a look at LiveDTD: http://www.sagehill.net/livedtd/

> Should I be using schemas?

My impression is that schemas are impossible to write (or to understand) without special editing tools, because they're so incredibly verbose.

I also tend to favor DTDs because they are understood by SGML parsers as well (with minor changes). Last but not least, there is a DtdToHaskell converter, but no SchemaToHaskell converter. :-)

Peter
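As a rough illustration of why a DTD maps onto Haskell types fairly directly, here is a hand-written sketch for a few declarations from Packages.dtd. It is not the output of DtdToHaskell, and all type and field names are assumptions chosen for readability: a choice (a | b) becomes a sum type, an optional child (x?) becomes Maybe, a repeated child (x+) becomes a non-empty list, and #PCDATA becomes String.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- <!ELEMENT version (numberversion | simpleversion | dateversion)>
data Version
  = VNumber NumberVersion
  | VSimple String
  | VDate DateVersion
  deriving (Eq, Show)

-- <!ELEMENT numberversion (major?,minor?,patch?)>
data NumberVersion = NumberVersion
  { major :: Maybe String
  , minor :: Maybe String
  , patch :: Maybe String
  } deriving (Eq, Show)

-- <!ELEMENT dateversion (year,month?,day?,patch?)>
data DateVersion = DateVersion
  { year      :: String
  , month     :: Maybe String
  , day       :: Maybe String
  , datePatch :: Maybe String
  } deriving (Eq, Show)

-- <!ELEMENT importdirs (filepath+)>  (one or more entries)
newtype ImportDirs = ImportDirs (NonEmpty FilePath)
  deriving (Eq, Show)

importDirList :: ImportDirs -> [FilePath]
importDirList (ImportDirs ds) = NE.toList ds

-- The <dateversion><year>1998</year></dateversion> example from Packages.xml.
example1998 :: Version
example1998 = VDate (DateVersion "1998" Nothing Nothing Nothing)
```

Every leaf ends up as String because a DTD carries no information about the type of character data.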

Peter Simons wrote:
> Isaac Jones writes:
> > Should I be using schemas?
>
> My impression is that schemas are impossible to write (or to understand) without special editing tools, because they're so incredibly verbose.

Seconded. (But please: use the term "W3C XML Schema" to describe these things, not "schema". There are _many_ schema languages for XML, including DTDs, Relax NG, Schematron, and several others. W3C XML Schema is just the one that's gotten the most press.)

> I also tend to favor DTDs because they are understood by SGML parsers as well (with minor changes). Last but not least, there is a DtdToHaskell converter, but no SchemaToHaskell converter. :-)

I'll add that a WXSToHaskell converter would be very difficult: there's a huge impedance mismatch between the W3C XML Schema type system and the Haskell type system.

--Joe English
jenglish@flightlab.com

hello,

this mail does not contain anything constructive, but i thought i'd post it just so that it doesn't seem that haskell users are indifferent. i sincerely hope we don't use xml (or any variation of it) in the haskell libraries. good reasons of why not to use it have already been posted.

bye
iavor

Isaac Jones wrote:
> This is my first attempt at an XML DTD. Does anyone have any suggestions for good XML references either a book or a web site? Google seems to be suffering from too much XML information. Is there a Haddock / Javadoc type way to notate a DTD? Should I be using schemas? Are they overkill? What do you think of the style? Sometimes I have trouble choosing between using an attribute, and using a nested element. For instance:

--
==================================================
| Iavor S. Diatchki, Ph.D. student               |
| Department of Computer Science and Engineering |
| School of OGI at OHSU                          |
| http://www.cse.ogi.edu/~diatchki               |
==================================================

G'day all.
Quoting Iavor Diatchki:

> this mail does not contain anything constructive, but i thought i'd post it just so that it doesn't seem that haskell users are indifferent. i sincerely hope we don't use xml (or any variation of it) in the haskell libraries. good reasons of why not to use it have already been posted.

Well, I'm personally in favour of XML as a general principle (mind you, I get paid for it, so I'm probably a little biassed); however, there is absolutely no reason to use it here.

You don't want to invent a new syntax + parser if you can avoid it. It's not hard, but it's something that users have to learn. XML is therefore a candidate, but since interoperability with non-Haskell code is very low on the list of priorities, this isn't a compelling argument in its favour.

Using a subset of Haskell syntax seems ideal. It's simple, people already know it, and it's easy to work with.

Cheers,
Andrew Bromage

Isaac will surely correct me in case I got this wrong, but I thought the XML format is meant to be _one_ possible representation of the Haskell data types, which constitute a "package". If you prefer to write Haskell syntax, just do. Dumping your data into an XML file takes _one_ function call with HaXml.

The important part is the _internal_ data format. Whether these records are written to disk as XML, ATerm, RPM specs, Gentoo ebuilds, or Microsoft Word documents does not matter.

Why is there so much excitement about this issue? :-)

Peter
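The idea that XML is just one rendering of the internal record can be sketched without any library at all. The code below is a hand-rolled illustration and deliberately does not use HaXml, whose API is not shown in this thread. The record is a cut-down, hypothetical Package, and the element names follow Packages.dtd.

```haskell
-- Cut-down, hypothetical internal representation of a package.
data Package = Package
  { pkgName    :: String
  , importDirs :: [FilePath]
  , graftPoint :: Maybe String
  }

-- Escape the characters that are special in XML text and attributes.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Wrap an already-escaped body in an element.
element :: String -> String -> String
element tag body = "<" ++ tag ++ ">" ++ body ++ "</" ++ tag ++ ">"

-- One possible external form. A different function over the same record
-- could just as well emit Haskell syntax or an RPM spec.
toXml :: Package -> String
toXml p =
  "<package name=\"" ++ escape (pkgName p) ++ "\">"
    ++ dirs ++ graft ++ "</package>"
  where
    -- The DTD requires at least one filepath, so an empty list is omitted.
    dirs
      | null (importDirs p) = ""
      | otherwise =
          element "importdirs"
            (concatMap (element "filepath" . escape) (importDirs p))
    graft = maybe "" (element "graftpoint" . escape) (graftPoint p)
```

The point of the sketch is only that the serializer is a thin layer over the data type; swapping the output format leaves the record untouched.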

Acceptance by Haskell developers is probably the most important factor to this choice, and I'm definitely willing to go with the consensus here, but I do want to make sure that XML gets a fair hearing.

I'm really not married to it, and I like the Haskell syntax better, but it seems to me that XML offers package management systems (written in other languages) an easy way to deal with the Haskell package configuration file if they need to. This is actually a big deal if we want this system to integrate nicely with a wide variety of platforms. Let's say that someone wants to write a tool to look at a package's configuration file and build a package (RPM, DEB, etc.) for their system. Are all such tools going to have to be written in Haskell? What if emacs wants to parse the packages.conf file to track down the source code for a library?

Furthermore, XML has a good way of specifying character encodings, and it's pretty easy to parse and understand.

As for the assertion that it is unusable by both humans and computers, I disagree. In fact, it only takes a short time to learn, and for something like the packages file, someone who has never seen XML before could probably understand it and alter it. Users of the package system will not have to learn XML, and few programmers will have to learn it.

The argument that we should not rely on other tools is definitely valid, especially for this project, but we're only talking about a single Haskell library (HaXml, which "locate" tells me is included with GHC 6.0). You say "Relying on other tools", I say "code reuse". If I write a parser with Happy, as has been done for ghc-pkg, we're still relying on other tools. (I realize that there are other parsing options.)

In short, I'm ready to agree that we drop XML, but I really don't feel that anyone has given me anything to sink my teeth into with this decision.

So to change the subject a little, has anyone written a generic parser for the Haskell syntax? It would be neat to have a parser that could be used across projects to build an AST.

peace,

isaac
participants (7)
- ajb@spamcop.net
- Daan Leijen
- Iavor Diatchki
- Isaac Jones
- Joe English
- John Meacham
- Peter Simons