
At 10:44 03/02/04 +0000, Simon Marlow wrote:
(1) Network.URI
I've written a new parser, and extended the module interface slightly, thus:
[snip]
If you'd like to become the maintainer of this module and incorporate your changes, you're entirely welcome (I'm not using this code actively at the moment).
I'd be happy to do that. What are the ground rules for potentially non-backward-compatible changes? What are the procedures for lodging new releases (CVS?).
You mentioned that there were problems with the existing implementation - perhaps you could explain further? As far as I'm aware, the regular expression and the test cases were taken directly from RFC 2396, and the implementation was correct at the time - did something change? The current testcases are in testsuite/tests/ghc-regress/lib/net/uri001.hs.
I don't think it was a problem with the regular expression per se:

(a) the regex in RFC2396 doesn't tell you (reliably) whether a URI is valid. What it does do, assuming a valid URI is presented, is pick apart the various components.

(b) I would have stuck with the regex-based implementation here, except that the regex module used is not available on Windows. For me, it was easier to construct a URI parser using Parsec, which doesn't depend on system-dependent modules.

(c) there are some small changes in syntax that might affect the regex implementation: reserving '[' and ']' for use in IPv6 literals comes to mind. I haven't checked the details.

My parser follows the syntax in the RFC2396bis proposal very closely. As such, it will reject some URIs that the regex implementation would accept. My own test suite includes all the RFC2396 test cases. (The RFC2396bis proposal already has extensive review and broad consensus in the URI working group; my Haskell work is providing some implementation feedback.)

The problems with behaviour of the current implementation that I did note are covered below...
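To illustrate point (a), here is a minimal sketch of a component splitter that mirrors the shape of the RFC 2396 Appendix B regex using plain `span`/`break` from the Prelude (not the regex module and not Parsec). The name `splitURI` and the choice to keep the punctuation attached to each component are assumptions made for this illustration only; this is not the code of either implementation discussed here.

```haskell
-- Sketch only: splits a string into (scheme, authority, path, query, fragment)
-- along the lines of the RFC 2396 Appendix B regex. It never rejects input,
-- so it cannot be used to decide whether a URI is valid.
-- 'splitURI' is a hypothetical name, not part of Network.URI.
splitURI :: String -> (String, String, String, String, String)
splitURI s0 = (scheme, authority, path, query, fragment)
  where
    -- scheme: one or more characters other than ":/?#", followed by ':'
    (scheme, s1) = case span (`notElem` ":/?#") s0 of
        (sch@(_:_), ':' : rest) -> (sch ++ ":", rest)
        _                       -> ("", s0)
    -- authority: "//" followed by characters other than "/?#"
    (authority, s2) = case s1 of
        '/' : '/' : rest -> let (a, r) = span (`notElem` "/?#") rest
                            in ("//" ++ a, r)
        _                -> ("", s1)
    -- path: everything up to the first '?' or '#'
    (path, s3) = span (`notElem` "?#") s2
    -- query: '?' followed by everything up to the first '#'
    (query, s4) = case s3 of
        '?' : rest -> let (q, r) = break (== '#') rest in ('?' : q, r)
        _          -> ("", s3)
    -- fragment: whatever remains (empty, or beginning with '#')
    fragment = s4
```

Because every string is accepted, something like `splitURI "http://exa mple.org/<bad>"` still yields a scheme, authority and path, even though the input contains characters that are not allowed in a URI. Deciding validity needs a parser that follows the grammar.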
I have some concerns about the way URI strings are reassembled from the component parts using the current URI module interface (e.g. problem with empty fragment handling noted in a previous message). I think the URI implementation should be changed so that all the punctuation characters ("//", "?", "#", etc.) are stored as part of the component values in a URI structure, but I don't know what impact that might have on existing code.
If that's an unforced change I'd vote to keep the current behaviour, to avoid breaking code.
It's not entirely "unforced"... it has to do with the way a URI is stored internally, and the consequences for reconstructing a URI string from the URI components; e.g.
   file:///path/name
is reconstructed as:
   file:/path/name

   http://example.org/path/resource#
is reconstructed as:
   http://example.org/path/resource

I have a question [1] outstanding with the URI WG about the validity of the first, and do believe that the second is incorrect (there has been some discussion that the presence of a fragment is significant in some web applications). There is a general presumption in Web circles that a URI should be used in exactly the form given; cf. [2].

[1] http://www.w3.org/mid/5.1.0.14.2.20040202132114.00bd6ec8@127.0.0.1
    (Can't get proper URI yet ... lists.w3.org is down as I write)
[2] http://www.w3.org/2001/tag/webarch/#lc-uri-chars

I suppose it would be possible to make a new implementation of the URI structure that presents the same interface, but remembers the presence of empty fields, but I'm concerned that would be locking in undesirable complexity and propagating a debatable design. (Question: why would one wish the URI components to be stored without their distinguishing punctuation?)

...

Here's a proposal:

(a) change URI thus:
[[
data URI = URI
    { uriScheme    :: String    -- ^ @http:@
    , uriAuthority :: String    -- ^ @//www.haskell.org@
    , uriPath      :: String    -- ^ @/ghc@
    , uriQuery     :: String    -- ^ @?query@
    , uriFragment  :: String    -- ^ @#frag@
    }
]]

(b) implement access functions that behave like the original field selectors.

Then the visible change in behaviour would be that 'show' of any URI would reconstruct exactly the string supplied to construct it. If it turns out that the alternative access functions are not needed, they could be dropped in a later revision (hmmm... do Haskell implementations offer a deprecated flag?).

#g

------------
Graham Klyne
For email: http://www.ninebynine.org/#Contact
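As a rough sketch of how proposal (a) and (b) might fit together: the record below follows the field names given in the proposal, while the accessor names (`schemeNoPunct` and friends) and the helper `dropPrefix` are hypothetical, introduced only for this illustration. It assumes the original selectors returned components without their punctuation.

```haskell
import Data.List  (stripPrefix)
import Data.Maybe (fromMaybe)

-- Components keep their punctuation, as in proposal (a).
data URI = URI
    { uriScheme    :: String    -- e.g. "http:"
    , uriAuthority :: String    -- e.g. "//www.haskell.org"
    , uriPath      :: String    -- e.g. "/ghc"
    , uriQuery     :: String    -- e.g. "?query"
    , uriFragment  :: String    -- e.g. "#frag"
    } deriving Eq

-- 'show' is plain concatenation, so an empty authority ("//") or an
-- empty fragment ("#") survives the round trip.
instance Show URI where
    show u = concat
        [uriScheme u, uriAuthority u, uriPath u, uriQuery u, uriFragment u]

-- Hypothetical helper: remove a prefix if present, else return the input.
dropPrefix :: String -> String -> String
dropPrefix p s = fromMaybe s (stripPrefix p s)

-- Hypothetical compatibility accessors in the spirit of proposal (b):
-- they return the component with its punctuation stripped.
schemeNoPunct :: URI -> String
schemeNoPunct u = case reverse (uriScheme u) of
    ':' : r -> reverse r
    _       -> uriScheme u

authorityNoPunct, queryNoPunct, fragmentNoPunct :: URI -> String
authorityNoPunct = dropPrefix "//" . uriAuthority
queryNoPunct     = dropPrefix "?"  . uriQuery
fragmentNoPunct  = dropPrefix "#"  . uriFragment
```

With this layout, `URI "file:" "//" "/path/name" "" ""` shows as `file:///path/name`, and `URI "http:" "//example.org" "/path/resource" "" "#"` keeps its trailing `#`, which are the two cases noted above.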