
On Fri, Mar 19, 2010 at 2:41 PM, Jeremy Shaw
On Fri, Mar 19, 2010 at 5:22 PM, Michael Snoyman
wrote:
I am not going to have time to look at this again until Saturday or Sunday. There are a few minor details that have been swept under the rug that need to be addressed. For example, when exactly should URL encoding / decoding take place? It's not good if that happens twice or not at all.
Just to confuse the topic even more: if we do real URL encoding/decoding, I believe we would have to assume a certain character set. I had to deal with a site that was encoded in non-UTF8 just a bit ago, and dealing with query parameters is not fun.
That said, perhaps we should consider making the type of PathInfo "PathInfo ByteString" so we make it clear that we're doing no character encoding.
Yeah. I dunno. I just know it needs to be solved :)
Another issue in the same vein is dealing with leading and trailing slashes, though I think this is fairly simple in practice: the web app knows what to do about the trailing slashes, and each plugin should always pass a leading slash.
I am not quite sure what you mean by 'each plugin should always pass a leading slash'. Pass to whom?
If we have:
data MySite = MyHome | MyBlog MyBlog
data MyBlog = BlogHome | BlogPost String
Then I would expect something like this:
formatMySite MyHome = "MyHome"
formatMySite (MyBlog blog) = "MyBlog/" ++ formatMyBlog blog
formatMyBlog BlogHome = "BlogHome"
formatMyBlog (BlogPost title) = "BlogPost/" ++ title
mkAbs = ("http://localhost:3000/" ++)
(ignoring any escaping that needs to happen in title, and ignoring any AbsPath / PathInfo stuff).
But we could, of course, do it the other way:
formatMySite MyHome = "/MyHome"
formatMySite (MyBlog blog) = "/MyBlog" ++ formatMyBlog blog
formatMyBlog BlogHome = "/BlogHome"
formatMyBlog (BlogPost title) = "/BlogPost/" ++ title
mkAbs = ("http://localhost:3000" ++)
There definitely needs to be some policy.
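To make the need for a policy concrete, here is a minimal sketch that puts both conventions side by side and then mixes them. The A/B suffixes and the `formatMixed` function are hypothetical names introduced only for this illustration; the types and formatters otherwise follow the examples above.

```haskell
data MySite = MyHome | MyBlog MyBlog
data MyBlog = BlogHome | BlogPost String

-- Convention A: no component emits a leading slash.
formatMySiteA :: MySite -> String
formatMySiteA MyHome = "MyHome"
formatMySiteA (MyBlog blog) = "MyBlog/" ++ formatMyBlogA blog

formatMyBlogA :: MyBlog -> String
formatMyBlogA BlogHome = "BlogHome"
formatMyBlogA (BlogPost title) = "BlogPost/" ++ title

mkAbsA :: String -> String
mkAbsA = ("http://localhost:3000/" ++)

-- Convention B: every component emits its own leading slash.
formatMySiteB :: MySite -> String
formatMySiteB MyHome = "/MyHome"
formatMySiteB (MyBlog blog) = "/MyBlog" ++ formatMyBlogB blog

formatMyBlogB :: MyBlog -> String
formatMyBlogB BlogHome = "/BlogHome"
formatMyBlogB (BlogPost title) = "/BlogPost/" ++ title

mkAbsB :: String -> String
mkAbsB = ("http://localhost:3000" ++)

-- Hypothetical mismatch: a convention-A site embedding a convention-B plugin.
-- The separator is emitted twice, so the URL contains "//".
formatMixed :: MySite -> String
formatMixed MyHome = "MyHome"
formatMixed (MyBlog blog) = "MyBlog/" ++ formatMyBlogB blog
```

Either convention yields the same absolute URL as long as it is applied consistently; only the mixed case goes wrong, which is why the choice has to be made once for all components.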
- jeremy
Then here's a proposal for both issues at once:

* PathInfo is a ByteString
* handleWai strips the leading slash from the path-info
* every component parses and generates URLs without a leading slash. Trailing slash is the application's choice.

Regarding URL encoding, let me point out that the following are two different URLs (just try clicking on them):

http://www.snoyman.com/blog/entry/persistent-plugs/
http://www.snoyman.com/blog/entry%2Fpersistent-plugs/

In other words, if we ever URL-decode the string before it reaches the application, we will have conflated unique URLs. I see two options here:

* We specify that PathInfo contains URL-encoded values. Any fromUrl/toUrl functions must be aware of this fact.
* We change the type of PathInfo to [ByteString], where we split the PathInfo by slashes, and specify that the pieces are *not* URL-encoded. In order to preserve perfectly the original value, we should not combine adjacent delimiters.

In other words:

/foo/bar/baz/ -> ["foo", "bar", "baz", ""] -- note the trailing empty string
/foo/bar/baz -> ["foo", "bar", "baz"] -- we don't need a leading empty string; *every* pathinfo begins with a slash
/foo%2Fbar/baz/ -> ["foo/bar", "baz", ""]
/foo//bar/baz -> ["foo", "", "bar", "baz"]

I'm not strongly attached to any of this. Also, my original motivation for breaking up the pieces (easier pattern matching) will be mitigated by the usage of ByteStrings.

Michael
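
For reference, the second option (split on slashes first, then decode each piece) could look roughly like the sketch below. The names `splitPathInfo` and `decodePiece` are hypothetical, not part of any existing library, and the sketch assumes the raw path-info arrives still percent-encoded. Splitting before decoding is what keeps an encoded %2F from being mistaken for a path separator.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical helper: percent-decode a single path piece.
-- Malformed escapes (e.g. "%zz") are passed through unchanged.
decodePiece :: B.ByteString -> B.ByteString
decodePiece = B.pack . go . B.unpack
  where
    go ('%' : h : l : rest)
      | isHexDigit h && isHexDigit l =
          chr (16 * digitToInt h + digitToInt l) : go rest
    go (c : rest) = c : go rest
    go [] = []

-- Hypothetical helper: drop the single leading slash, split on every
-- remaining slash without merging adjacent ones, then decode each piece.
splitPathInfo :: B.ByteString -> [B.ByteString]
splitPathInfo raw = map decodePiece (B.split '/' (dropLeadingSlash raw))
  where
    dropLeadingSlash bs = case B.uncons bs of
      Just ('/', rest) -> rest
      _ -> bs
```

One edge case the proposal does not spell out: with this sketch a bare "/" becomes the empty list, because `B.split` returns no pieces for an empty input.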