Re: [web-devel] XSS vs charset

On Wed, Apr 2, 2014 at 9:08 AM, Kazu Yamamoto wrote:
Hi Michael,
Thank you for your reply.
I suppose theoretically you could be talking about a situation where Mighty is hosting a CGI application that receives user data and produces a static HTML file as a result.
Yes. Also I'm thinking about Yesod.
Yesod has more of a focus on dynamic content, and in those cases, we *do* already set charset=utf8[1]. Where this would affect Yesod is in yesod-static, in which case the same logic I've applied to Mighty would apply: users should not be able to affect the content of static files under normal circumstances, so the security concern is pretty remote. [1] https://github.com/yesodweb/yesod/blob/master/yesod-core/Yesod/Core/Content....
But it could be worked around by the CGI application using <meta charset=...> instead.
Yes. Is this rarely used in Yesod?
Yes. Dynamic responses don't normally go via static file serving at all. In WAI terms, we always end up with a ResponseBuilder, not a ResponseFile, for dynamic content.
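A minimal sketch of this distinction, written in Python rather than Haskell. The names BuilderResponse, FileResponse, dynamic_html and static_html are hypothetical. They only mirror the roles of WAI's ResponseBuilder and ResponseFile and are not WAI's API. When the application renders the body itself, it knows which encoding it used and can label it. A file streamed from disk carries whatever bytes its author saved.

```python
from dataclasses import dataclass


@dataclass
class BuilderResponse:
    """Body produced in memory by the application (role of WAI's ResponseBuilder)."""
    status: int
    headers: list
    body: bytes


@dataclass
class FileResponse:
    """Body sent straight from disk (role of WAI's ResponseFile)."""
    status: int
    headers: list
    path: str


def dynamic_html(markup: str) -> BuilderResponse:
    # The application performs the encoding itself, so it knows the bytes
    # are UTF-8 and can say so in the header.
    body = markup.encode("utf-8")
    return BuilderResponse(200, [("Content-Type", "text/html; charset=utf-8")], body)


def static_html(path: str) -> FileResponse:
    # The server never decodes the file, so without extra configuration it
    # can only report the media type, not the encoding.
    return FileResponse(200, [("Content-Type", "text/html")], path)
```

The asymmetry is the point: only the first function is in a position to make a true statement about the encoding.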
So that brings us to the question: is it safe for Mighty, mime-types, etc., to require that all HTML files are stored as UTF-8? I'd say, as long as there's a way for a user to override that if necessary, it sounds good to me. mime-types does provide such a capability, so I'd be in favor of tweaking its textual types to include explicit charset information.
Perhaps I was being oversensitive. Based on your reasoning, it is safer/better for Mighty not to hard-code the charset.
To be clear, besides the security concerns, there is *definitely* a usability advantage in specifying charsets explicitly, in that the browser doesn't need to use defaults or guessing[2]. This just comes down to a numbers game: is it more likely that a browser will mis-guess the character encoding of UTF8 data, or that someone running Mighty will provide non-UTF8 data? One other point in favor of specifying the encoding is that serving a file with the wrong encoding will fail *reliably*. Without a charset, some browsers may guess the wrong character encoding while others won't, which makes the problem difficult to debug. If you *always* serve with charset=utf8 and that turns out to be wrong, you'll find out quickly and reliably. [2] http://en.wikipedia.org/wiki/Charset_detection
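To illustrate the "fails reliably" point, here is a small Python sketch. The function names are hypothetical. It shows how a client that honors charset=utf-8 treats a file that was actually saved as Latin-1. The invalid byte is replaced by U+FFFD identically every time. In the other direction, a client that guesses Latin-1 for genuine UTF-8 data produces mojibake instead.

```python
def decode_as_utf8(body: bytes) -> str:
    """Decode a body the way a client honoring ``charset=utf-8`` would.

    Invalid byte sequences are replaced with U+FFFD instead of raising,
    so mislabeled data is damaged in the same visible way every time.
    """
    return body.decode("utf-8", errors="replace")


def is_valid_utf8(body: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        body.decode("utf-8")  # strict decoding is the default
    except UnicodeDecodeError:
        return False
    return True


# The same word saved by two differently configured editors.
latin1_file = "café".encode("latin-1")  # b'caf\xe9': not valid UTF-8
utf8_file = "café".encode("utf-8")      # b'caf\xc3\xa9'

# Labeled as UTF-8, the Latin-1 file always shows one replacement character.
broken = decode_as_utf8(latin1_file)      # 'caf\ufffd'

# A client that instead guesses Latin-1 for genuine UTF-8 shows mojibake.
misguessed = utf8_file.decode("latin-1")  # 'cafÃ©'
```

Because the damage under an explicit label is deterministic, a wrong charset=utf8 shows up on the first test in any browser, rather than only in the browsers whose heuristics happen to guess wrong.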

Yesod has more of a focus on dynamic content, and in those cases, we *do* already set charset=utf8[1]. Where this would affect Yesod is in yesod-static, in which case the same logic I've applied to Mighty would apply: users should not be able to affect the content of static files under normal circumstances, so the security concern is pretty remote.
When I checked Yesod today, it returned text/html without a charset. But it turned out to be my mistake: what I saw was a 500 response (from Warp, not from Yesod). Sigh.
OK. Yesod returns charset. Good.
To be clear, besides the security concerns, there is *definitely* a usability advantage in specifying charsets explicitly, in that the browser doesn't need to use defaults or guessing[2]. This just comes down to a numbers game: is it more likely that a browser will mis-guess the character encoding of UTF8 data, or that someone running Mighty will provide non-UTF8 data?
I'm assuming that static files contain charset information in their <meta> header. Creators of static files can add it themselves without asking their server operator. --Kazu
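The assumption that static files declare their own charset can be checked mechanically. The sketch below is a deliberately simplified, regex-based stand-in for the HTML encoding prescan, not the full algorithm from the HTML specification. It looks only at the first 1024 bytes, where the specification requires a meta charset declaration to appear. It recognizes both the `<meta charset=...>` form and the older `http-equiv="Content-Type"` form. The function name declared_meta_charset is hypothetical.

```python
import re

# Simplified pattern: finds "charset=..." inside a <meta ...> tag, which covers
# <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_meta_charset(html: bytes, limit: int = 1024):
    """Return the lower-cased charset named in a <meta> tag, or None.

    Only the first ``limit`` bytes are examined; a declaration later in
    the file would be too late to count.
    """
    match = _META_CHARSET.search(html[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

A file for which this returns None is one where the browser falls back to defaults or guessing, unless the server supplies a charset in the HTTP header.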

On Wed, Apr 2, 2014 at 9:41 AM, Kazu Yamamoto wrote:
Yesod has more of a focus on dynamic content, and in those cases, we *do* already set charset=utf8[1]. Where this would affect Yesod is in yesod-static, in which case the same logic I've applied to Mighty would apply: users should not be able to affect the content of static files under normal circumstances, so the security concern is pretty remote.
When I checked Yesod today, it returned text/html without a charset. But it turned out to be my mistake: what I saw was a 500 response (from Warp, not from Yesod). Sigh.
OK. Yesod returns charset. Good.
Good catch on the Warp responses, there's no reason *not* to include charset on those. I've pushed a commit for that[1]. [1] https://github.com/yesodweb/wai/commit/1daa65863367251d965be36e5fb388d54681c...
To be clear, besides the security concerns, there is *definitely* a usability advantage in specifying charsets explicitly, in that the browser doesn't need to use defaults or guessing[2]. This just comes down to a numbers game: is it more likely that a browser will mis-guess the character encoding of UTF8 data, or that someone running Mighty will provide non-UTF8 data?
I'm assuming that static files contain charset information in their <meta> header. Creators of static files can add it themselves without asking their server operator.
That's a reasonable assumption. When possible, I still prefer using HTTP headers, as that would also address CSS and Javascript files. But given the complexity of needing to add an entire configuration system to Mighty to allow users to override default character set information, it's probably not worth making the change. Unless I hear otherwise, I'll leave mime-types as-is for now. Michael
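The trade-off described here can be sketched as two parts. The first is a lookup table that appends a default charset to textual types, covering HTML, CSS and JavaScript alike, which a <meta> tag cannot do. The second is a per-extension override map, which is the part that would require a configuration system in Mighty. The table, set, and function names are hypothetical and do not reflect the actual contents or API of the mime-types package.

```python
import os

# Hypothetical defaults; not the real table or API of the mime-types package.
DEFAULT_MIME_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "application/javascript",
    ".png": "image/png",
}

# Media types whose bodies are text and therefore benefit from a charset label.
TEXTUAL_TYPES = {"text/html", "text/css", "application/javascript"}


def content_type(path, default_charset="utf-8", overrides=None):
    """Return a Content-Type header value for a static file.

    ``overrides`` maps a lower-case extension to a complete header value,
    so an operator could serve, say, legacy Shift_JIS pages without
    changing the server-wide default.
    """
    ext = os.path.splitext(path)[1].lower()
    if overrides and ext in overrides:
        return overrides[ext]
    mime = DEFAULT_MIME_TYPES.get(ext, "application/octet-stream")
    if mime in TEXTUAL_TYPES:
        return f"{mime}; charset={default_charset}"
    return mime
```

For example, `content_type("old/index.html", overrides={".html": "text/html; charset=Shift_JIS"})` returns the override unchanged. Every other textual file keeps the UTF-8 default, and binary types such as image/png get no charset at all.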

Good catch on the Warp responses, there's no reason *not* to include charset on those. I've pushed a commit for that[1].
[1] https://github.com/yesodweb/wai/commit/1daa65863367251d965be36e5fb388d54681c...
Thanks.
Unless I hear otherwise, I'll leave mime-types as-is for now.
I agree. --Kazu
participants (2)
- Kazu Yamamoto
- Michael Snoyman