
Greetings,

I'm wondering if we've ever discussed putting an XML parser into fptools. I'm asking because cabal-install would like to use xml-rpc, but it drags in a bunch of dependencies that aren't in fptools.

What do other languages do?

* Python seems to have XML and XML-RPC as standard libraries
* Same with Ruby
* Not Java (shocking): http://java.sun.com/j2se/1.3/docs/api/

I guess that HaXml probably can't get into fptools because of license incompatibilities, but hxml and HXmlToolbox both have MIT licenses, and so could probably be put into fptools:

http://www.flightlab.com/~joe/hxml/
http://www.fh-wedel.de/~si/HXmlToolbox/

What do others think?

peace,

isaac

On 5/15/06, Isaac Jones wrote:
Greetings,
I'm wondering if we've ever discussed putting an XML parser into fptools. I'm asking because cabal-install would like to use xml-rpc, but it drags in a bunch of dependencies that aren't in fptools.
What do other languages do?

* Python seems to have XML and XML-RPC as standard libraries
* Same with Ruby
* Not Java (shocking): http://java.sun.com/j2se/1.3/docs/api/
I guess that HaXml probably can't get into fptools because of license incompatibilities, but hxml and HXmlToolbox both have MIT licenses, and so could probably be put into fptools:
http://www.flightlab.com/~joe/hxml/
http://www.fh-wedel.de/~si/HXmlToolbox/
What do others think?
XML and xml-rpc are the wrong tools for cabal-install, IMO. Picking a tool that isn't suited to the task will not result in anything good, no matter how common and popular that tool happens to be. I only chose xml-rpc because it was easily available, and it resulted in several dirty hacks. Please, let's avoid enterprise, XML, and managers in suits; the /right/ tools are out there. End of way too emotional rant.

--
Friendly,
Lemmih

Isaac Jones wrote:
I'm wondering if we've ever discussed putting an XML parser into fptools. I'm asking because cabal-install would like to use xml-rpc, but it drags in a bunch of dependencies that aren't in fptools.
What do other languages do?

* Python seems to have XML and XML-RPC as standard libraries
* Same with Ruby
* Not Java (shocking): http://java.sun.com/j2se/1.3/docs/api/
I guess that HaXml probably can't get into fptools because of license incompatibilities, but hxml and HXmlToolbox both have MIT licenses, and so could probably be put into fptools:
http://www.flightlab.com/~joe/hxml/
http://www.fh-wedel.de/~si/HXmlToolbox/
At this stage I don't think we should be asking what is "in fptools". My aim for GHC 6.6 is to have a core set of packages shipped with GHC (probably base, haskell98, template-haskell, unix/Win32, Cabal); any other packages are shipped at the discretion of the distributor. In practice this means that our standard binary dists (.tar.bz2 and Win32 installers) will probably continue to contain a similar set of packages as in 6.4.x, modulo some reorganisation. However, on systems with decent package managers like Gentoo/Debian/*BSD there will be a basic GHC installation, and packages can be added and upgraded using the system package tools.

Note that this avoids licensing issues; if there are multiple XML packages with different licenses, the programmer gets to choose which one to use.

I'd like to see the community settle on a single XML API if possible, but I imagine that will probably happen over time in any case. Perhaps someone could offer to lead a group to work on standardising an API?

Cheers,
Simon

Simon Marlow wrote:
I guess that HaXml probably can't get into fptools because of license incompatibilities, but hxml and HXmlToolbox both have MIT licenses, and so could probably be put into fptools:
http://www.flightlab.com/~joe/hxml/
http://www.fh-wedel.de/~si/HXmlToolbox/
[...]
I'd like to see the community settle on a single XML API if possible, but I imagine that will probably happen over time in any case. Perhaps someone could offer to lead a group to work on standardising an API?
When I looked into XML libraries a while ago, I settled on HaXml because:

(a) while hxml was clean and easy to use, its functionality was very limited; e.g. no validation or entity handling, IIRC.

(b) HXmlToolbox was the most functional of the packages in terms of XML support, mainly by being the only one with namespace support "out of the box". But I found the interface too wedded to the IO monad for my purposes; I had cases where I wanted a pure function to process the XML "internal subset" from a string value and return a result free of the IO monad.

In the end, I added namespace support (and xml:lang, xml:base) to HaXml. I think that Malcolm has now incorporated much of that functionality into the mainstream version. From memory, the HaXml API seems very Haskelloid (?), so I'd suggest that as a starting point. I don't know about the licence issues, though.

BTW, if we converge on an XML API, I'd be interested to see if that can be built upon to include RDF. Also, query APIs might be good to think about (e.g. for exploiting XQuery, SPARQL; Andy Seaborne at HP Labs has done some work on a SPARQL implementation in Java whose design allows queries to be composed and manipulated programmatically; that kind of thing would, I think, be bread-and-butter for Haskell).

#g

--
Graham Klyne
For email: http://www.ninebynine.org/#Contact

Simon Marlow wrote:
I'd like to see the community settle on a single XML API if possible, but I imagine that will probably happen over time in any case. Perhaps someone could offer to lead a group to work on standardising an API?
The motivation here is to have a standard API for e.g. parsing an XML file. I'm sure it would be easy to agree on some signatures like

    readXml   :: String -> Maybe XML
    showXml   :: XML -> String
    fReadXml  :: FilePath -> IO XML
    fWriteXml :: FilePath -> XML -> IO ()
    hGetXml   :: Handle -> IO XML
    hPutXml   :: Handle -> XML -> IO ()

but the real question, or difference between implementations, is the representation of the datatype called XML above. HaXml already has a version of these functions using a class instead of a type, i.e. replace all instances of XML with (XmlContent a => a).

But once the XML file has been parsed, the rest of the program is going to want to do some processing on the tree. So ultimately, any one program is going to fix the instantiated type of the class to something in particular, whether that be HaXml's generic Document, or HXT's equivalent, or hxml's event stream, or some DtdToHaskell-generated types. As such, I don't see how the common API helps much. The basic parsing job might share the same name across libraries, but the much more important processing steps will not (and perhaps cannot?).

In the OO world, they came up with a common generic API (the DOM) because the physical representation of the tree is hidden - you can only access it by function calls, not pattern-matching. Good FP style tends to do the opposite, revealing the representation.

Regards,
Malcolm
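
To make the class-based variant concrete, here is a minimal sketch. The class name XmlContent matches HaXml's, but the method set and signatures shown (with failure made explicit via Maybe) are simplified illustrations, not HaXml's actual interface:

    import System.IO (Handle, hGetContents, hPutStr)

    -- The parse/print pair lives in a class, so each program picks its
    -- own concrete representation by choosing an instance.
    class XmlContent a where
      readXml :: String -> Maybe a
      showXml :: a -> String

    -- The file and handle variants then come for free:
    fReadXml :: XmlContent a => FilePath -> IO (Maybe a)
    fReadXml path = fmap readXml (readFile path)

    fWriteXml :: XmlContent a => FilePath -> a -> IO ()
    fWriteXml path = writeFile path . showXml

    hGetXml :: XmlContent a => Handle -> IO (Maybe a)
    hGetXml h = fmap readXml (hGetContents h)

    hPutXml :: XmlContent a => Handle -> a -> IO ()
    hPutXml h = hPutStr h . showXml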

Malcolm Wallace wrote:
The motivation here is to have a standard API for e.g. parsing an XML file. I'm sure it would be easy to agree on some signatures like
    readXml   :: String -> Maybe XML
    showXml   :: XML -> String
    fReadXml  :: FilePath -> IO XML
    fWriteXml :: FilePath -> XML -> IO ()
    hGetXml   :: Handle -> IO XML
    hPutXml   :: Handle -> XML -> IO ()
Would we also want some sort of stream API, similar to SAX?

    streamXML :: (Monad m) => m Char -> m SAXEvent

--
Ashley Yakeley, Seattle WA

Ashley Yakeley wrote:
Would we also want some sort of stream API, similar to SAX?
streamXML :: (Monad m) => m Char -> m SAXEvent
HaXml has a module Text.XML.HaXml.SAX with the signature

    saxParse :: String     -- ^ The filename
             -> String     -- ^ The content of the file
             -> ([SaxElement], Maybe String)
                -- ^ A tuple of the parsed elements and @Nothing@ if no
                --   error occurred, or @Just String@ if an error occurred.

But again, resolving how different libraries represent the datatypes SaxElement/SAXEvent is the key question.

Regards,
Malcolm

Malcolm Wallace wrote:
HaXml has a module Text.XML.HaXml.SAX with the signature
    saxParse :: String     -- ^ The filename
             -> String     -- ^ The content of the file
             -> ([SaxElement], Maybe String)
                -- ^ A tuple of the parsed elements and @Nothing@ if no
                --   error occurred, or @Just String@ if an error occurred.
That's not very good for streaming, though, unless you happen to have lazy IO available for your text stream source. It seems to me the whole point of SAX is that you get some XML information even when you don't have all the input yet.

I usually model monadic stream sources as "m Char", or "m (Maybe Char)" if one cares about "End Of Source". Equally, sinks can be represented as "Char -> m ()".

--
Ashley Yakeley, Seattle WA
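
As a sketch of this pull style, assuming a toy event type and ignoring attributes, entities, and CDATA entirely, a parser over an "m (Maybe Char)" source might step like this, yielding each event as soon as enough characters have arrived:

    data SAXEvent = StartElement String
                  | EndElement String
                  | CharData String
                  | EndOfInput
      deriving Show

    -- Pull a single event from a monadic character source.  The Maybe
    -- Char returned alongside the event is a one-character pushback:
    -- a text run only ends once the following '<' has been read.
    nextEvent :: Monad m
              => Maybe Char        -- pushback from the previous call
              -> m (Maybe Char)    -- the character source
              -> m (SAXEvent, Maybe Char)
    nextEvent pushback getc = do
        mc <- maybe getc (return . Just) pushback
        case mc of
          Nothing  -> return (EndOfInput, Nothing)
          Just '<' -> tag ""
          Just c   -> text [c]
      where
        tag acc = do
          mc <- getc
          case mc of
            Nothing  -> return (EndOfInput, Nothing)   -- truncated tag
            Just '>' -> return (mkTag (reverse acc), Nothing)
            Just c   -> tag (c:acc)
        text acc = do
          mc <- getc
          case mc of
            Nothing  -> return (CharData (reverse acc), Nothing)
            Just '<' -> return (CharData (reverse acc), Just '<')
            Just c   -> text (c:acc)
        mkTag ('/':n) = EndElement n
        mkTag n       = StartElement n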

Ashley Yakeley wrote:
HaXml has a module Text.XML.HaXml.SAX with the signature
    saxParse :: String     -- ^ The filename
             -> String     -- ^ The content of the file
             -> ([SaxElement], Maybe String)
                -- ^ A tuple of the parsed elements and @Nothing@ if no
                --   error occurred, or @Just String@ if an error occurred.
That's not very good for streaming, though, unless you happen to have lazy IO available for your text stream source. It seems to me the whole point of SAX is that you get some XML information even when you don't have all the input yet.
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.

Regards,
Malcolm

Malcolm Wallace wrote:
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.
Lazy I/O is problematic and probably a bad idea for libraries:

http://haskell.org/pipermail/haskell/2006-May/017998.html

--
Ashley Yakeley, Seattle WA

On Thu, 2006-05-25 at 16:41 -0700, Ashley Yakeley wrote:
Malcolm Wallace wrote:
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.
Lazy I/O is problematic and probably a bad idea for libraries: http://haskell.org/pipermail/haskell/2006-May/017998.html
Assuming we do have imprecise exceptions what is wrong with lazy IO?

In a monadic IO style you have a loop that reads stuff from a Handle and then processes it in chunks. Each time you read from the Handle it could raise an IO exception. So you get to either deal with the exception locally or let it propagate (i.e. jump out of your loop). The latter is probably more common, since in your inner loop you probably don't know how to handle it, e.g. whether to retry or do something else.

So if we're usually handling the error outside the IO loop anyway, what is different about doing that with lazy IO, catching in IO code an error thrown from pure code? We could probably even arrange to keep the input partially processed up to the current point, so that we could retry or deal with partial input rather than having to restart from the beginning of the stream (which may not even be possible, say, for a pipe or socket).

Besides, it's not even true that error handling at that level is easy. As soon as you start building abstractions, you blur the notion of where you are in the stream, and so precise error locations disappear. The only way to retain them is to invert your control flow so that you push data into your algorithm rather than having your algorithm pull data from a stream. It's not very pleasant writing code that way.

</rant>

Duncan
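
A minimal sketch of that pattern, written with today's Control.Exception names (try, evaluate; the 2006-era API differed in detail): the file is read lazily, processed purely, and any read failure surfaces at the single point where the result is forced.

    import Control.Exception (IOException, evaluate, try)
    import System.IO (IOMode (ReadMode), hGetContents, openFile)

    -- Read lazily, process purely, and handle any read failure at a
    -- single point outside the processing loop.  A real program would
    -- decide here whether to retry, use partial results, or give up.
    countLines :: FilePath -> IO ()
    countLines path = do
      h        <- openFile path ReadMode
      contents <- hGetContents h
      -- Forcing the pure result is what drives the actual reads, so an
      -- I/O error during reading is caught right here by 'try'.
      result <- try (evaluate (length (lines contents)))
      case result of
        Left e  -> putStrLn ("read failed: " ++ show (e :: IOException))
        Right n -> putStrLn (show n ++ " lines")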

On Fri, May 26, 2006 at 01:33:05AM +0100, Duncan Coutts wrote:
On Thu, 2006-05-25 at 16:41 -0700, Ashley Yakeley wrote:
Malcolm Wallace wrote:
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.
Lazy I/O is problematic and probably a bad idea for libraries: http://haskell.org/pipermail/haskell/2006-May/017998.html
Assuming we do have imprecise exceptions what is wrong with lazy IO?
Gosh, I really dislike imprecise exceptions. I would hate to use them for any expected code path and am wary of using them even for unexpected ones. You have to be very intimately familiar with exactly what gets evaluated when, or insert a bunch of deepSeqs and hope you don't evaluate something too early. I think they encourage very bad and sloppy programming practices, and we wouldn't want to encourage their use as a general tool. The only uses I approve of them for are things like ghci, where you don't want errors in user-typed code to abort the interpreter, and things like catching type errors in jhc, where the program still aborts, but the message is annotated with what was happening before and after and various other useful details for tracking down the cause of the error.

That said, lazy IO is useful: a lot of the time aborting your program on an IO failure is the right thing to do, and laziness can make things more elegant and faster; jhc's load time decreased significantly once I started reading the 'ho' files lazily. I like having the option of lazy IO; I don't think it should be the default, and most definitely not the only way people interact with the world, though.

John

--
John Meacham - ⑆repetae.net⑆john⑈

Duncan Coutts wrote:
On Thu, 2006-05-25 at 16:41 -0700, Ashley Yakeley wrote:
Malcolm Wallace wrote:
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.
Lazy I/O is problematic and probably a bad idea for libraries: http://haskell.org/pipermail/haskell/2006-May/017998.html
Assuming we do have imprecise exceptions what is wrong with lazy IO?
I'm not even certain that lazy I/O doesn't upset referential transparency. It seems hard to construct a concrete counter example, though. My intuition is something like this: if evaluating a thunk can cause IO to take place, then the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order. I'm concerned that in the presence of dependencies between lazy I/O computations, the order of evaluation might be visible.

There have been several discussions on this topic; not everyone shares my views :-) e.g.

http://www.haskell.org/pipermail/haskell-cafe/2003-October/005188.html

Actually, in a way I really hope I'm wrong. I would love to use lazy I/O, but I believe we need to have a good story for error handling and possible evaluation-order issues first.

Cheers,
Simon

On Fri, May 26, 2006 at 11:52:34AM +0100, Simon Marlow wrote:
I'm not even certain that lazy I/O doesn't upset referential transparency. It seems hard to construct a concrete counter example though. My intuition is something like this: if evaluating a thunk can cause IO to take place, then the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order. I'm concerned that in the presence of dependencies between lazy I/O computations, the order of evaluation might be visible.
Each time you say I/O here, you mean "input", I think.

Hello Simon,

Friday, May 26, 2006, 2:52:34 PM, you wrote:
Assuming we do have imprecise exceptions what is wrong with lazy IO?
I'm not even certain that lazy I/O doesn't upset referential transparency. It seems hard to construct a concrete counter example though. My intuition is something like this: if evaluating a thunk can cause IO to take place, then the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order. I'm concerned that in the presence of dependencies between lazy I/O computations, the order of evaluation might be visible.
Of course it can: if, for example, we evaluate two hGetContents on the same Handle, or read a file that is written by another computation.

I think that lazy I/O is like ST threads - it's imperative code that works with its own part of the World, and it is safe as long as we don't touch that part of the world in any other way. The only difference is that for ST we can be sure that this part of the world cannot be used outside, while for lazy I/O we must provide that guarantee ourselves.

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Simon,
Friday, May 26, 2006, 2:52:34 PM, you wrote:
Assuming we do have imprecise exceptions what is wrong with lazy IO?
I'm not even certain that lazy I/O doesn't upset referential transparency. It seems hard to construct a concrete counter example though. My intuition is something like this: if evaluating a thunk can cause IO to take place, then the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order. I'm concerned that in the presence of dependencies between lazy I/O computations, the order of evaluation might be visible.
Of course it can: if, for example, we evaluate two hGetContents on the same Handle, or read a file that is written by another computation.
Both of those are illegal in Haskell. It might be possible to construct an example using pipes, though.

Cheers,
Simon

Simon Marlow wrote:
I'm not even certain that lazy I/O doesn't upset referential transparency. ... the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order.
It might be possible to construct an example using pipes, though.
Haskell program output is piped to a process which feeds the Haskell program's input pipe. The external process computes the time it takes for successive items to appear on its input. Voila: the input to Haskell land is dependent on how long it takes to generate its output, and therefore (indirectly) on evaluation order.

It's a bit contrived, and I'm not sure it really breaks ref trans either. This kind of feedback oscillator might have interesting chaotic behaviour, though!

Regards,
Malcolm

On May 26, 2006, at 6:52 AM, Simon Marlow wrote:
Duncan Coutts wrote:
On Thu, 2006-05-25 at 16:41 -0700, Ashley Yakeley wrote:
Malcolm Wallace wrote:
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.
Lazy I/O is problematic and probably a bad idea for libraries: http://haskell.org/pipermail/haskell/2006-May/017998.html

Assuming we do have imprecise exceptions what is wrong with lazy IO?
I'm not even certain that lazy I/O doesn't upset referential transparency. It seems hard to construct a concrete counter example though. My intuition is something like this: if evaluating a thunk can cause IO to take place, then the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order. I'm concerned that in the presence of dependencies between lazy I/O computations, the order of evaluation might be visible.
I'm personally with you on this one. However, I think it is mostly just a problem with current state-of-the-art filesystems. We could solve this problem with a filesystem that models files as persistent data structures (on-disk ropes, maybe?). Then "writing" to a file just creates a new version of the persistent data structure and updates the "name table" to point to the newest version; in fact, updating the name would be optional! I can think of a couple of use cases for modifying on-disk files and keeping them private to my process. Doing "lazy I/O" on a file just means keeping a reference to a particular version of the data structure instead of getting the latest one from the name table.

Obviously, there are some issues with garbage collecting unreferenced file versions, but I don't think they are any more complicated than for a journaling FS (caveat: I'm not at all a filesystem guru; I could be wrong). Actually, the more I think about it, the more I like this idea...

Anyway, what I'm saying is that I think a filesystem/OS that supports persistent files could solve the problems with lazy I/O. After all, the proof obligation incurred by using lazy I/O is essentially "this file will not change between now and the last reference I make to its contents".
There have been several discussions on this topic; not everyone shares my views :-) e.g.

http://www.haskell.org/pipermail/haskell-cafe/2003-October/005188.html
Actually, in a way I really hope I'm wrong. I would love to use lazy I/O, but I believe we need to have a good story for error handling and possible evaluation-order issues first.
Cheers, Simon
Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG

Robert Dockins wrote:
On May 26, 2006, at 6:52 AM, Simon Marlow wrote:
Duncan Coutts wrote:
On Thu, 2006-05-25 at 16:41 -0700, Ashley Yakeley wrote:
Malcolm Wallace wrote:
Oh, I always assume lazy I/O. It is one of the most useful parts of Haskell, and I rely on it all the time for both interactivity and avoidance of space problems.
Lazy I/O is problematic and probably a bad idea for libraries: http://haskell.org/pipermail/haskell/2006-May/017998.html
Assuming we do have imprecise exceptions what is wrong with lazy IO?
I'm not even certain that lazy I/O doesn't upset referential transparency. It seems hard to construct a concrete counter example though. My intuition is something like this: if evaluating a thunk can cause IO to take place, then the act of evaluating that thunk might affect the value of another lazy I/O computation, and hence it should be possible to get different results by evaluating the thunks in a different order. I'm concerned that in the presence of dependencies between lazy I/O computations, the order of evaluation might be visible.
I'm personally with you on this one. However, I think it is mostly just a problem with current state-of-the-art filesystems. We could solve this problem with a filesystem that models files as persistent data structures (on-disk ropes, maybe?). Then "writing" to a file just creates a new version of the persistent data structure and updates the "name table" to point to the newest version; in fact, updating the name would be optional! I can think of a couple of use cases for modifying on-disk files and keeping them private to my process. Doing "lazy I/O" on a file just means keeping a reference to a particular version of the data structure instead of getting the latest one from the name table. Obviously, there are some issues with garbage collecting unreferenced file versions, but I don't think they are any more complicated than for a journaling FS (caveat: I'm not at all a filesystem guru; I could be wrong). Actually, the more I think about it, the more I like this idea...
Haskell prevents you from opening a file for writing while you already have it open for reading (even if the reading is being done lazily). This is an oft-forgotten part of the Haskell I/O library specification, perhaps because only GHC implements it (and even GHC doesn't implement it on all platforms).

But in principle I agree: we would like to think of hGetContents as pulling lazily from an immutable array of bytes, and if that were the case then there would be no problem.

Cheers,
Simon
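
That rule is easy to demonstrate; a sketch, assuming GHC on a platform that implements the locking (the exact error message is GHC-specific):

    import System.IO

    -- After hGetContents the handle is semi-closed but still counts as
    -- a reader, so a second open for writing should fail with
    -- something like "openFile: resource busy (file is locked)".
    main :: IO ()
    main = do
      writeFile "test.txt" "hello\n"
      h  <- openFile "test.txt" ReadMode
      _  <- hGetContents h                 -- handle becomes semi-closed
      h2 <- openFile "test.txt" WriteMode  -- expected to throw here
      hClose h2
      hClose h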

Can we talk about Haskell use cases? It seems like we want different APIs for parsing, transforming, and producing XML. Most of the time, I am less interested in XML as a data structure per se than I am in converting from XML to a Haskell data type or producing XML from a Haskell data type.

Has anyone here played with HWSProxyGen or Haifa?

-Alex-

______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com

On Tue, 23 May 2006, Ashley Yakeley wrote:
Malcolm Wallace wrote:
The motivation here is to have a standard API for e.g. parsing an XML file. I'm sure it would be easy to agree on some signatures like
    readXml   :: String -> Maybe XML
    showXml   :: XML -> String
    fReadXml  :: FilePath -> IO XML
    fWriteXml :: FilePath -> XML -> IO ()
    hGetXml   :: Handle -> IO XML
    hPutXml   :: Handle -> XML -> IO ()
Would we also want some sort of stream API, similar to SAX?
streamXML :: (Monad m) => m Char -> m SAXEvent
-- Ashley Yakeley, Seattle WA

S. Alexander Jacobson wrote:
Can we talk about Haskell use cases? It seems like we want different APIs for parsing, transforming, and producing XML.
My use case was this: to parse XML from some internal or external source (resolving internal and maybe external entities), yielding a structure recognizably like the XML infoset, which I then used to write an XML->RDF parser based closely on the RDF syntax specification (which is defined in terms of the infoset).

Which suggests, in response to Malcolm, that an internal representation might usefully be based on the XML infoset specification. A number of web standards define XML-based languages in terms of the infoset, so that irrelevant differences in the character-level syntax don't get in the way (e.g. <x></x> vs <x/>).

Also, as I write, I'm on my way back from WWW2006, where there has been much talk of "microformats" and RDFa, both of which layer machine-readable data on top of XML syntax as a way of merging HTML with "semantic" content. For these cases, I think that access to the XML data model is needed.

In summary, I think there are significant numbers of cases where what we are trying to get at is not the XML structure represented as Haskell data types, but something encoded using XML structures, and it is that encoded something which one might want to yield as a Haskell data type. So I'd be wary of saying that we don't want access to the XML structure per se.

#g

--
Graham Klyne
For email: http://www.ninebynine.org/#Contact
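
As a rough illustration of what an infoset-flavoured representation might look like: all of the names below are invented for this sketch, and it covers only a few of the infoset's eleven information item types.

    -- A document information item and its children, much simplified.
    data Document = Document
      { docElement :: Element }

    data Element = Element
      { elemName       :: QName
      , elemAttributes :: [Attribute]
      , elemChildren   :: [Content]
      }

    data Content
      = ContentElement Element
      | ContentText String       -- character items, merged into runs
      | ContentComment String

    data Attribute = Attribute
      { attrName  :: QName
      , attrValue :: String
      }

    -- Namespace name plus local name, as the infoset requires.
    data QName = QName
      { qnNamespace :: Maybe String
      , qnLocalName :: String
      }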

The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.

Haskell has all these great grammar tools. Is there any reason we can't use one of them and just treat XML as a lexer? Do we need to validate incoming XML? Or can we assume it is okay?

-Alex-

______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com

On Fri, 26 May 2006, Graham Klyne wrote:
S. Alexander Jacobson wrote:
Can we talk about Haskell use cases? It seems like we want different APIs for parsing, transforming, and producing XML.
My use case was this:
To parse XML from some internal or external source (resolving internal and maybe external entities), yielding a structure recognizably like the XML infoset, which I then used to write an XML->RDF parser based closely on the RDF syntax specification (which is defined in terms of the infoset).
Which suggests, in response to Malcolm, that an internal representation might usefully be based on the XML infoset specification. A number of web standards define XML-based languages in terms of the infoset, so that irrelevant differences in the character-level syntax don't get in the way (e.g. <x></x> vs <x/>).
Also, as I write, I'm on my way back from WWW2006, where there has been much talk of "microformats" and RDFa, both of which layer machine-readable data on top of XML syntax as a way of merging HTML with "semantic" content. For these cases, I think that access to the XML data model is needed. In summary, I think there are significant numbers of cases where what we are trying to get at is not the XML structure represented as Haskell data types, but something encoded using XML structures, and it is that encoded something which one might want to yield as a Haskell data type. So I'd be wary of saying that we don't want access to the XML structure per se.
#g
--
Graham Klyne
For email: http://www.ninebynine.org/#Contact

S. Alexander Jacobson wrote:
The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.
So do <textarea/> and <textarea />. What's the point of pointing out that some browsers are broken? (Actually most are somehow broken when it comes to application/xml, but who's counting?)

Udo.

--
"There are three ways to make money. You can inherit it. You can marry it. You can steal it."
                                        -- conventional wisdom in Italy

Again, my point is that it depends on the use cases we want to target. My bias is that we should be targeting conversion between XML and application-specific Haskell data types. Speculatively, I imagine a tool that generates Haskell datatypes and a parser from a RelaxNG specification, and another that generates a RelaxNG spec from a Haskell datatype. But that is just my hope. My immediate need is probably to adapt HWSProxyGen or Haifa to talk SOAP to PayPal's API.

Other people may have other needs.

-Alex-

______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com

On Tue, 30 May 2006, Udo Stenzel wrote:
S. Alexander Jacobson wrote:
The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.
So do <textarea/> and <textarea />. What's the point of pointing out that some browsers are broken? (Actually most are somehow broken when it comes to application/xml, but who's counting?)
Udo.

--
"There are three ways to make money. You can inherit it. You can marry it. You can steal it."
                                        -- conventional wisdom in Italy

Well, part of my point was that, AFAICT, your approach doesn't serve the use-cases I envisage and did development for.

It seems to me that a good basic XML parser would be a prerequisite to supporting the use-case you describe, and the Haskell type-conversion could be layered on top. As I understand it, that's how HaXml is constructed.

As for the <textarea/> case you raise, this could be an area where HTML and XML give rise to differing requirements. Personally, I'd prefer an *XML* parser to stick to XML specifications.

#g
--

S. Alexander Jacobson wrote:
Again, my point is that it depends on the use cases we want to target.
My bias is that we should be targeting conversion between XML and application-specific Haskell data types. Speculatively, I imagine a tool that generates Haskell datatypes and a parser from a RelaxNG specification, and another that generates a RelaxNG spec from a Haskell datatype. But that is just my hope. My immediate need is probably to adapt HWSProxyGen or Haifa to talk SOAP to PayPal's API.
Other people may have other needs.
-Alex-
______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com
On Tue, 30 May 2006, Udo Stenzel wrote:
S. Alexander Jacobson wrote:
The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.
So do <textarea/> and <textarea />. What's the point of pointing out that some browsers are broken? (Actually most are somehow broken when it comes to application/xml, but who's counting?)
Udo.

--
"There are three ways to make money. You can inherit it. You can marry it. You can steal it."
                                        -- conventional wisdom in Italy
--
Graham Klyne
For email: http://www.ninebynine.org/#Contact

Ok, but my original question is whether one XML tool makes sense.

For example, if we are consuming XML, it seems like we would want something layered on top of Parsec or PArrows (so we can also parse the contents of CDATA etc).

And if we are producing XML, then we just need some data type that represents the XML infoset and a function for presenting that infoset as XML.

And if we are transforming XML, then perhaps the HaXml approach makes the most sense. Note: I am using a wrapper around HaXml for producing XML in HAppS.

And if we are *transacting* XML, then a tool like Haifa or HWSProxyGen or perhaps DtdToHaskell seems to make the most sense.

All of these seem like different needs/tools. What were your use-cases?

-Alex-

______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com

On Wed, 31 May 2006, Graham Klyne wrote:
Well, part of my point was that, AFAICT, your approach doesn't serve the use-cases I envisage and did development for.
It seems to me that a good basic XML parser would be a prerequisite to supporting the use-case you describe, and the Haskell type-conversion could be layered on top. As I understand it, that's how HaXml is constructed.
As for the <textarea/> case you raise, this could be an area where HTML and XML give rise to differing requirements. Personally, I'd prefer an *XML* parser to stick to XML specifications.
#g --
S. Alexander Jacobson wrote:
Again, my point is that it depends on the use cases we want to target.
My bias is that we should be targeting conversion between XML and application-specific Haskell data types. Speculatively, I imagine a tool that generates Haskell datatypes and a parser from a RelaxNG specification, and another that generates a RelaxNG spec from a Haskell datatype. But that is just my hope. My immediate need is probably to adapt HWSProxyGen or Haifa to talk SOAP to PayPal's API.
Other people may have other needs.
-Alex-
______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com
On Tue, 30 May 2006, Udo Stenzel wrote:
S. Alexander Jacobson wrote:
The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.
So do <textarea/> and <textarea />. What's the point of pointing out that some browsers are broken? (Actually most are somehow broken when it comes to application/xml, but who's counting?)
Udo.

--
"There are three ways to make money. You can inherit it. You can marry it. You can steal it."
                                        -- conventional wisdom in Italy
--
Graham Klyne
For email: http://www.ninebynine.org/#Contact

Note, for transforming XML to HTML and MIME, I use XSLT rather than Haskell.

-Alex-

On Thu, 1 Jun 2006, S. Alexander Jacobson wrote:
Ok, but my original question is whether one XML tool makes sense.
For example, if we are consuming XML, it seems like we would want something layered on top of Parsec or PArrows (so we can also parse the contents of CDATA etc).
And, if we are producing XML, then we just need some data type that represents the XML infoset and a function for presenting that infoset as XML.
And if we are transforming XML, then perhaps the HaXml approach makes the most sense. Note: I am using a wrapper around HaXml for producing XML in HAppS.
And if we are *transacting* XML, then a tool like Haifa or HWSProxyGen or perhaps DtdToHaskell seems to make the most sense.
All of these seem like different needs/tools. What were your use-cases?
-Alex-
______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com
On Wed, 31 May 2006, Graham Klyne wrote:
Well, part of my point was that, AFAICT, your approach doesn't serve the use-cases I envisage and did development for.
It seems to me that a good basic XML parser would be a prerequisite to supporting the use-case you describe, and the Haskell type-conversion could be layered on top. As I understand it, that's how HaXml is constructed.
As for the <textarea/> case you raise, this could be an area where HTML and XML give rise to differing requirements. Personally, I'd prefer an *XML* parser to stick to XML specifications.
#g --
S. Alexander Jacobson wrote:
Again, my point is that it depends on the use cases we want to target.
My bias is that we should be targeting conversion between XML and application-specific Haskell data types. Speculatively, I imagine a tool that generates Haskell datatypes and a parser from a RelaxNG specification, and another that generates a RelaxNG spec from a Haskell datatype. But that is just my hope. My immediate need is probably to adapt HWSProxyGen or Haifa to talk SOAP to PayPal's API.
Other people may have other needs.
-Alex-
______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com
On Tue, 30 May 2006, Udo Stenzel wrote:
S. Alexander Jacobson wrote:
The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.
So do <textarea/> and <textarea />. What's the point of pointing out that some browsers are broken? (Actually most are somehow broken when it comes to application/xml, but who's counting?)
Udo.

--
"There are three ways to make money. You can inherit it. You can marry it. You can steal it."
                                        -- conventional wisdom in Italy
--
Graham Klyne
For email: http://www.ninebynine.org/#Contact
______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com

S. Alexander Jacobson wrote:
Ok, but my original question is whether one XML tool makes sense.
I missed that bit... a fair question, but one that begs "what constitutes a tool?". I would suggest that XML is sufficiently quirky and complex to parse that we (as a community) probably don't want to invest effort in supporting more than one XML *parser*. But other tools may usefully be layered on top of that parser.

As for the use cases you offer, I think that's one way, but not the only way, to slice the problem space:
For example, if we are consuming XML, it seems like we would want something layered on top of Parsec or PArrows (so we can also parse the contents of CDATA etc).
HaXml is layered on something like that, viz. the HMW parser combinators. I suppose it could be layered on Parsec, and if starting afresh that might be a good option, but it doesn't seem to me to be a critical issue. As for parsing the contents of CDATA sections, I'd suggest that (except for very specific applications with demanding performance requirements) it is something to be tackled *after* the XML has been parsed.
And, if we are producing XML, then we just need some data type that represents the XML infoset and a function for presenting that infoset as XML.
I see *producing* XML as being a different, albeit related, problem to that of *parsing* XML.
And if we are transforming XML, then perhaps the HaXml approach makes the most sense. Note: I am using a wrapper around HaXml for producing XML in HAppS.
So, here, you use a common (one of 3?) underlying XML *parser*. I don't see how one can, in general, transform XML without first parsing it.
And if we are *transacting* XML, then a tool like Haifa or HWSProxyGen or perhaps DtdToHaskell seems to make the most sense.
Hmmm... you lost me there. In this context, I'm not sure what you mean by "transacting". Does this avoid the need to parse it in the first place?
All of these seem like different needs/tools. What were your use-cases?
I assume that's rhetorical? (As I said earlier, mine was parsing RDF/XML to yield something that was easily processed in accordance with the RDF abstract syntax specification. A generic XML parser yielding something close to the XML infoset was exactly what I wanted for this.)

...

So, in summary, I do see value in having a common XML parser, yielding a data structure that is easy to process as an abstraction of the XML data model (like the XML infoset), upon which other tools can be built. It seems to me that the other use-cases for consuming XML, those that don't call for a generic XML parser, are more likely to be specific applications that don't need the generality of full XML parsing. I'm ambivalent about the appropriateness of following such an approach, but I note that Tim Bray (the XML pioneer) has argued quite forcefully against the deployment of XML subsets for specific applications (I don't have a specific reference to hand, but this came up some time ago in IETF discussions of protocols based on XML; maybe Jabber or Beep or XmlConf).

#g
--
On Wed, 31 May 2006, Graham Klyne wrote:
Well, part of my point was that, AFAICT, your approach doesn't serve the use-cases I envisage and did development for.
It seems to me that a good basic XML parser would be a prerequisite to supporting the use-case you describe, and the Haskell type-conversion could be layered on top. As I understand it, that's how HaXml is constructed.
As for the <textarea/> case you raise, this could be an area where HTML and XML give rise to differing requirements. Personally, I'd prefer an *XML* parser to stick to XML specifications.
#g --
S. Alexander Jacobson wrote:
Again, my point is that it depends on the use cases we want to target.
My bias is that we should be targetting conversion between XML and application specific Haskell data types. Speculatively, I imagine a tool that generates Haskell datatypes and a parser from a RelaxNG specification and another that generates a RelaxNG spec from a haskell datatype. But that is just my hope. My immediate need is probably to adapt HWSProxyGen or HAifa to talk SOAP to paypal's api.
Other people may have other needs.
-Alex-
______________________________________________________________
S. Alexander Jacobson                     tel:917-770-6565
                                          http://alexjacobson.com
On Tue, 30 May 2006, Udo Stenzel wrote:
S. Alexander Jacobson wrote:
The problem with the infoset is that <textarea></textarea> and <textarea/> mean different things for some web browsers.
So do <textarea/> and <textarea />. What's the point of pointing out that some browsers are broken? (Actually most are somehow broken when it comes to application/xml, but who's counting?)
Udo.

--
"There are three ways to make money. You can inherit it. You can marry it. You can steal it."
                                        -- conventional wisdom in Italy
--
Graham Klyne
For email: http://www.ninebynine.org/#Contact
--
Graham Klyne
For email: http://www.ninebynine.org/#Contact

"S. Alexander Jacobson"
My bias is that we should be targetting conversion between XML and application specific Haskell data types. Speculatively, I imagine a tool that generates Haskell datatypes and a parser from a RelaxNG specification and another that generates a RelaxNG spec from a haskell datatype.
If you read "DTD" instead of "RelaxNG", then HaXml already gives you this. If you read "XML Schema", then Haifa (mostly) gives you this. Is RelaxNG sufficiently widely used that it would be worth spending some effort on implementing that translation too?

Regards,
Malcolm
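
For reference, the DTD route looks roughly like this (invocation recalled from HaXml's documentation; exact arguments may differ between versions):

    $ DtdToHaskell album.dtd Album.hs

which generates a module of Haskell types mirroring the DTD's element structure, together with instances for reading and writing those types as XML.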
participants (13)
- Ashley Yakeley
- Bulat Ziganshin
- Duncan Coutts
- Graham Klyne
- Isaac Jones
- John Meacham
- Lemmih
- Malcolm Wallace
- Robert Dockins
- Ross Paterson
- S. Alexander Jacobson
- Simon Marlow
- Udo Stenzel