the Network.URI parser

Hello,
I'm wondering what the state of this parser is.
It parses the contents of the src attribute in the following:
<p><img src="javascript:alert('XSS');" alt=""/></p>
which causes IE 5.5 (and probably 6) to show a dialog box. (I lifted this example from the list at http://ha.ckers.org/xss.html)
I was hoping the parser in Network.URI would choke on it - the parentheses are reserved, at least.
cheers peter
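[Editorial sketch] To make the behaviour in question concrete, here is a minimal example of how the generic parser splits that attribute value. It assumes the network-uri package's Network.URI module (parseURI, uriScheme, uriPath); the names describe and example are made up for illustration.

```haskell
import Network.URI (URI (..), parseURI)

-- Hypothetical helper: report the scheme and path that Network.URI
-- extracts, or Nothing when the string is not an absolute URI.
describe :: String -> Maybe (String, String)
describe s = do
  u <- parseURI s
  pure (uriScheme u, uriPath u)

-- The src attribute value from the message above.
example :: Maybe (String, String)
example = describe "javascript:alert('XSS');"
```

The parse succeeds: the scheme comes back as "javascript:" (uriScheme keeps the trailing colon) and everything after it is treated as the path.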

Hi Peter,
<p><img src="javascript:alert('XSS');" alt=""/></p>
That's a bad example, since it's a bit dodgy, and possibly a security flaw. I prefer the example:
<a href="javascript:alert('XSS');">foo</a>
This works in all browsers. For a URI, if you have javascript: as the prefix, the rest can be any JavaScript expression - including brackets etc. If you have javascript as the protocol, it's not really a URI pointing at a document anymore.
Thanks Neil

Neil, On 27/05/2008, at 3:19 PM, Neil Mitchell wrote:
<p><img src="javascript:alert('XSS');" alt=""/></p>
That's a bad example, since it's a bit dodgy, and possibly a security flaw. I prefer the example:
<a href="javascript:alert('XSS');">foo</a>
This works in all browsers. For a URI, if you have javascript: as the prefix, the rest can be any JavaScript expression - including brackets etc. If you have javascript as the protocol, it's not really a URI pointing at a document anymore.
It most certainly is a security flaw. If you read that page I pointed to before (it's safe, I think, but best not use IE, ok? :-) you will find a whole pile of dodgy URIs. Most get culled (in my case) by the HaXml parser and/or XHTML 1.0 Strict validation, and now I hope to eliminate the rest by carefully handling the URIs.
On that topic, does anyone have any good advice for handling these things? I can imagine whitelisting schemes (ftp/http/???) and doing the slashdot-thing:
<a href="link">anchor text [authority]</a>
for links coming from untrusted sources. If anyone knows of the state-of-the-art in this area, I'd appreciate a pointer.
http://htmlpurifier.org/live/smoketests/printDefinition.php doesn't seem to think the style attribute is unsafe. Have they not been following the MySpace fiascos? (Sorry if this is a bit off-topic.)
cheers peter
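[Editorial sketch] The whitelist-plus-authority idea could look roughly like the following. This is only an illustration under stated assumptions: it uses Network.URI from the network-uri package; the whitelist contents and the names allowedSchemes, checkLink, escapeHtml and renderLink are hypothetical; and it shows the host name only, not the full authority (user info and port are left out of the bracketed part).

```haskell
import Data.Char (toLower)
import Network.URI (URI (..), URIAuth (..), parseURI, uriToString)

-- Assumed whitelist, in the form uriScheme returns (trailing colon).
allowedSchemes :: [String]
allowedSchemes = ["http:", "https:", "ftp:"]

-- Accept only absolute URIs with a whitelisted scheme and a
-- non-empty host. Returns the parsed URI and the host to display.
checkLink :: String -> Maybe (URI, String)
checkLink s = do
  u <- parseURI s
  auth <- uriAuthority u
  let host = uriRegName auth
  if map toLower (uriScheme u) `elem` allowedSchemes && not (null host)
    then Just (u, host)
    else Nothing

-- Escape the characters that matter in HTML text and attribute values.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc '\'' = "&#39;"
    esc c = [c]

-- Render <a href="link">anchor text [host]</a>, or only the escaped
-- anchor text when the link is rejected.
renderLink :: String -> String -> String
renderLink href text =
  case checkLink href of
    Nothing -> escapeHtml text
    Just (u, host) ->
      "<a href=\"" ++ escapeHtml (uriToString id u "") ++ "\">"
        ++ escapeHtml text ++ " [" ++ escapeHtml host ++ "]</a>"
```

With this sketch, renderLink "javascript:alert('XSS');" "foo" drops the link and keeps the text, because the URI has no whitelisted scheme and no authority. Whether rejected links should be dropped silently or flagged is a policy choice the sketch does not settle.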

Peter,
I haven't looked at this code in a while, but... as far as I'm aware it's stable and reliable. The parser was written to follow, as closely as I could manage, the specification in RFC2396 (http://www.ietf.org/rfc/rfc2396.txt) - experience in writing this parser was used as feedback (among many others) in the development of RFC2396.
The parser does not attempt to be in any respect scheme-aware. The parentheses here are, as far as I'm aware, quite legitimate in a generic URI, and I think no warning or refusal is appropriate for a generic URI parser. (URIs can be and are used in many places other than web pages.)
However, there are additional constraints that may be appropriate for specific URI schemes - maybe like reserving parentheses as you suggest - and were I to implement these I would do so in a layer built upon the generic URI parser: given the generic parse, a lookup on the scheme name could select an additional validation function.
#g --
Peter Gammie wrote:
Hello,
I'm wondering what the state of this parser is.
It parses the contents of the src attribute in the following:
<p><img src="javascript:alert('XSS');" alt=""/></p>
which causes IE 5.5 (and probably 6) to show a dialog box. (I lifted this example from the list at http://ha.ckers.org/xss.html)
I was hoping the parser in Network.URI would choke on it - the parentheses are reserved, at least.
cheers peter
-- Graham Klyne Contact info: http://www.ninebynine.org/#Contact

Peter,
[I should be more careful ... I meant RFC 3986, as in:]
I haven't looked at this code in a while, but... as far as I'm aware it's stable and reliable. The parser was written to follow, as closely as I could manage, the specification in RFC3986 (http://www.ietf.org/rfc/rfc3986.txt) - experience in writing this parser was used as feedback (among many others) in the development of RFC3986.
The parser does not attempt to be in any respect scheme-aware. The parentheses here are, as far as I'm aware, quite legitimate in a generic URI, and I think no warning or refusal is appropriate for a generic URI parser. (URIs can be and are used in many places other than web pages.)
However, there are additional constraints that may be appropriate for specific URI schemes - maybe like reserving parentheses as you suggest - and were I to implement these I would do so in a layer built upon the generic URI parser: given the generic parse, a lookup on the scheme name could select an additional validation function.
#g --
Peter Gammie wrote:
Hello,
I'm wondering what the state of this parser is.
It parses the contents of the src attribute in the following:
<p><img src="javascript:alert('XSS');" alt=""/></p>
which causes IE 5.5 (and probably 6) to show a dialog box. (I lifted this example from the list at http://ha.ckers.org/xss.html)
I was hoping the parser in Network.URI would choke on it - the parentheses are reserved, at least.
cheers peter
-- Graham Klyne Contact info: http://www.ninebynine.org/#Contact
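[Editorial sketch] The layering Graham describes (generic parse first, then a scheme-name lookup that selects an extra validation function) might be sketched as below. It assumes Network.URI from the network-uri package and Data.Map from containers. The validators table, its placeholder rules and the name parseChecked are hypothetical and are not part of the library.

```haskell
import Data.Char (toLower)
import qualified Data.Map as Map
import Data.Maybe (isJust)
import Network.URI (URI (..), parseURI)

-- A scheme-specific check applied after the generic parse succeeds.
type Validator = URI -> Bool

-- Hypothetical per-scheme rules, keyed by lower-cased scheme name
-- without the trailing colon. The rules are placeholders only.
validators :: Map.Map String Validator
validators =
  Map.fromList
    [ ("http", hasAuthority)
    , ("https", hasAuthority)
    , ("ftp", hasAuthority)
    , ("mailto", \u -> '@' `elem` uriPath u)
    ]
  where
    hasAuthority = isJust . uriAuthority

-- Generic parse first, then look up the scheme. Schemes with no
-- entry in the table (javascript, for one) are rejected outright.
parseChecked :: String -> Maybe URI
parseChecked s = do
  u <- parseURI s
  let scheme = map toLower (takeWhile (/= ':') (uriScheme u))
  ok <- Map.lookup scheme validators
  if ok u then Just u else Nothing
```

Rejecting unknown schemes is one possible default, suited to untrusted input; a more general-purpose layer might instead pass unknown schemes through unchanged, in keeping with the point that URIs are used in many places other than web pages.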

Graham:
Thanks for your sterling efforts here. I concur with your professional opinion. I will investigate what I can layer on top of your library.
cheers peter
On 27/05/2008, at 3:52 PM, Graham Klyne wrote:
Peter,
[I should be more careful ... I meant RFC 3986, as in:]
I haven't looked at this code in a while, but... as far as I'm aware it's stable and reliable. The parser was written to follow, as closely as I could manage, the specification in RFC3986 (http://www.ietf.org/rfc/rfc3986.txt) - experience in writing this parser was used as feedback (among many others) in the development of RFC3986.
The parser does not attempt to be in any respect scheme-aware. The parentheses here are, as far as I'm aware, quite legitimate in a generic URI, and I think no warning or refusal is appropriate for a generic URI parser. (URIs can be and are used in many places other than web pages.)
However, there are additional constraints that may be appropriate for specific URI schemes - maybe like reserving parentheses as you suggest - and were I to implement these I would do so in a layer built upon the generic URI parser: given the generic parse, a lookup on the scheme name could select an additional validation function.
#g --
Peter Gammie wrote:
Hello, I'm wondering what the state of this parser is. It parses the contents of the src attribute in the following: <p><img src="javascript:alert('XSS');" alt=""/></p> which causes IE 5.5 (and probably 6) to show a dialog box. (I lifted this example from the list at http://ha.ckers.org/xss.html) I was hoping the parser in Network.URI would choke on it - the parentheses are reserved, at least. cheers peter
-- Graham Klyne Contact info: http://www.ninebynine.org/#Contact
participants (3): Graham Klyne, Neil Mitchell, Peter Gammie