Re: [web-devel] XSS vs charset

2 Apr 2014

On Wed, Apr 2, 2014 at 9:08 AM, Kazu Yamamoto wrote:
...
Hi Michael,
Thank you for your reply.
...
I suppose theoretically you could be talking about a situation where Mighty
is hosting a CGI application that receives user data and produces a static
HTML file as a result.
Yes. Also I'm thinking about Yesod.
Yesod focuses more on dynamic content, and in those cases we *do* already
set charset=utf-8[1]. Where this would affect Yesod is in yesod-static, and
the same logic I've applied to Mighty applies there: users should not be
able to affect the content of static files under normal circumstances, so
the security concern is pretty remote.

[1]
https://github.com/yesodweb/yesod/blob/master/yesod-core/Yesod/Core/Content....
...
...
But it could be worked around by the CGI application using
<meta charset=...> instead.
Yes. Is this rarely used in Yesod?
Yes. Dynamic responses don't normally go via static file serving at all. In
WAI terms, we always end up with a ResponseBuilder, not a ResponseFile, for
dynamic content.
...
...
So that comes to the question: is it safe for Mighty, mime-types, etc, to
require that all HTML files are stored as UTF-8? I'd say, as long as
there's a way for a user to override that if necessary, it sounds good to
me. mime-types does provide such a capability, so I'd be in favor of
tweaking its textual types to include explicit charset information.
Probably I was too sensitive. Based on your discussion, it is
safer/better for Mighty not to hard-code charset.
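
The "default to UTF-8 for textual types, but let the user override" idea
from the exchange above can be sketched in a few lines. This is only an
illustration using base: the names defaultTypes, withUtf8 and contentTypeFor
are hypothetical, and the association list is a stand-in for a real MIME
map, not the mime-types API.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf, isPrefixOf)

-- Hypothetical aliases for this sketch only.
type Extension = String
type MimeType  = String

-- A tiny stand-in for a MIME map (assumption: a real one is far larger).
defaultTypes :: [(Extension, MimeType)]
defaultTypes =
  [ ("html", "text/html")
  , ("css",  "text/css")
  , ("png",  "image/png")
  ]

-- Append an explicit charset to textual types that do not carry one yet.
withUtf8 :: MimeType -> MimeType
withUtf8 mt
  | "text/" `isPrefixOf` lower && not ("charset=" `isInfixOf` lower)
      = mt ++ "; charset=utf-8"
  | otherwise = mt
  where
    lower = map toLower mt

-- User overrides win and are used verbatim; defaults get the UTF-8 tag.
-- Unknown extensions fall back to a generic binary type.
contentTypeFor :: [(Extension, MimeType)] -> Extension -> MimeType
contentTypeFor overrides ext =
  case lookup ext overrides of
    Just mt -> mt
    Nothing -> maybe "application/octet-stream" withUtf8
                     (lookup ext defaultTypes)
```

With no overrides, contentTypeFor [] "html" gives "text/html; charset=utf-8",
while a site that stores its pages in another encoding could pass something
like [("html", "text/html; charset=shift_jis")] to opt out.
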
To be clear, besides the security concerns, there is *definitely* a
usability advantage in specifying charsets explicitly, in that the browser
doesn't need to fall back on defaults or guessing[2]. This just comes down
to a numbers game: is it more likely that a browser will mis-guess the
character encoding of UTF-8 data, or that someone running Mighty will
provide non-UTF-8 data?

One other point in favor of specifying the encoding is that a file served
with the wrong encoding will fail *reliably*. Without a charset, some
browsers may guess the wrong character encoding while others won't, which
makes the problem difficult to debug. If you *always* serve with
charset=utf-8 and that turns out to be wrong, you'll find out quickly and
reliably.
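
One way to see why a wrong charset=utf-8 label tends to show up quickly:
bytes written in a legacy single-byte encoding are usually not well-formed
UTF-8, so a decoder hits invalid sequences right away. The sketch below is a
simplified structural check using only base. The name looksLikeUtf8 is
hypothetical, and the check deliberately ignores overlong forms and
surrogates, so it is not a full validator.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Structural check: every lead byte must be followed by the right number
-- of continuation bytes (10xxxxxx). Simplification: overlong encodings
-- and surrogate code points are not rejected here.
looksLikeUtf8 :: [Word8] -> Bool
looksLikeUtf8 [] = True
looksLikeUtf8 (b : rest)
  | b < 0x80           = looksLikeUtf8 rest   -- ASCII
  | b .&. 0xE0 == 0xC0 = continue 1           -- 2-byte sequence
  | b .&. 0xF0 == 0xE0 = continue 2           -- 3-byte sequence
  | b .&. 0xF8 == 0xF0 = continue 3           -- 4-byte sequence
  | otherwise          = False                -- stray continuation or bad lead
  where
    continue n =
      let (conts, rest') = splitAt n rest
      in length conts == n && all isCont conts && looksLikeUtf8 rest'
    isCont c = c .&. 0xC0 == 0x80

-- "café" encoded as UTF-8 and as Latin-1 (ISO-8859-1).
cafeUtf8, cafeLatin1 :: [Word8]
cafeUtf8   = [0x63, 0x61, 0x66, 0xC3, 0xA9]
cafeLatin1 = [0x63, 0x61, 0x66, 0xE9]
```

Here looksLikeUtf8 cafeUtf8 is True and looksLikeUtf8 cafeLatin1 is False:
the Latin-1 byte 0xE9 looks like the start of a three-byte sequence with
nothing after it. Some legacy byte strings can still pass by coincidence, so
this illustrates the "numbers game" rather than offering a guarantee.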

[2] http://en.wikipedia.org/wiki/Charset_detection

Michael Snoyman