
Moving to haskell-cafe, as this almost certainly isn't a library issue.

Hi
It most certainly is a security flaw.
In the src of an img, yes, probably. In the href of a link, it's a completely valid thing to do, and one that I've done loads of times. The URI is fine; it's just the particular location that is dodgy.
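To make that distinction concrete, here is a minimal sketch using only base. The names Location, scheme and allowed, and the scheme lists themselves, are made up for illustration and are not from any library. It also says nothing about how a particular browser actually parses an attribute value, so treat it as a picture of "same URI, different location", not as a filter to rely on.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, toLower)

-- Hypothetical: the attribute position in which a URI appears.
data Location = LinkHref | ImgSrc
  deriving (Eq, Show)

-- Lower-cased scheme of an absolute URI, following the RFC 3986 shape
-- ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":".
-- Returns Nothing when no such scheme prefix is present.
scheme :: String -> Maybe String
scheme uri =
  case break (== ':') uri of
    (s@(c:cs), ':':_)
      | isLetter c && all schemeChar cs -> Just (map toLower s)
    _ -> Nothing
  where
    isLetter ch   = isAsciiLower ch || isAsciiUpper ch
    schemeChar ch = isLetter ch || isDigit ch || ch `elem` "+-."

-- Hypothetical policy: the verdict depends on the location,
-- not on the URI alone.
allowed :: Location -> String -> Bool
allowed loc uri =
  case scheme uri of
    Nothing -> True                    -- no scheme: relative reference
    Just s  -> s `elem` schemesFor loc
  where
    schemesFor LinkHref = ["http", "https", "mailto", "javascript"]
    schemesFor ImgSrc   = ["http", "https"]

main :: IO ()
main = do
  print (allowed LinkHref "javascript:void(0)")  -- True
  print (allowed ImgSrc   "javascript:void(0)")  -- False
```

The same string passes in one location and fails in the other, which is all I mean by the URI itself being fine.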
whole pile of dodgy URIs. Most get culled (in my case) by the HaXml parser and/or XHTML 1.0 Strict validation, and now I hope to eliminate the rest by carefully handling the URIs.
I don't think that's possible. A URI can validly have javascript, and can validly be a lot of things which are unsafe.
On that topic, does anyone have any good advice for handling these things?
My advice is that you are targeting security at the wrong level. You shouldn't be cleaning the HTML to get a secure page; you should make the layer that interprets the HTML secure regardless of the input.
If anyone knows of the state-of-the-art in this area, I'd appreciate a pointer.
http://htmlpurifier.org/live/smoketests/printDefinition.php
doesn't seem to think the style attribute is unsafe. Have they not been following the MySpace fiascos?
Safety is a property of the HTML viewer, not of the HTML or CSS.

Thanks

Neil