
Hey everyone,
I just wrote down some of my ideas about type-safe URL handling on github, it's at http://gist.github.com/333769
I think it's similar to what Jeremy is doing with his urlt package [1].
-chris
[1]: http://src.seereason.com/~jeremy/SimpleSite1.html

Hello,
It looks nearly identical, but without the URLT monad transformer.
Instead of ToURL I have the class:
class AsURL a where
  toURLS :: a -> ShowS
  fromURLC :: Consumer String (Failing a)
Which is basically the same, except toURLS returns a ShowS instead of
[String], and fromURLC consumes a list of Strings. These functions are
wrapped up to provide:
toURL :: (AsURL a) => a -> String
fromURL :: (AsURL a) => String -> Failing a
I do not have generics-based URL printing/parsing, but there is no reason it
could not be added. I do have Template Haskell based code, though:
http://src.seereason.com/urlt/URLT/TH.hs
The thing you don't have is the URLT monad transformer:
http://src.seereason.com/urlt/URLT/Base.hs
Here is why you want it. Imagine you write an image gallery library:
data ImageURL = Upload | ViewImage Int
when you call toURL, you are going to get URLs like /Upload, /ViewImage/1,
etc.
Now let's say I try to use your library in my application. So at first I
try:
data MyApp = Upload | FooBar
But when a URL comes in, how do I know if I should decode it as MyApp or
ImageURL? Do I try both and see which one succeeds? Except we both have a
constructor Upload, so both will succeed. There is no way to tell which
Upload the path "/Upload" refers to.
So now I try:
data MyApp = Upload | FooBar | Images ImageURL
now I know that all incoming urls are decoded as MyApp. But there is still a
problem. In my code I could write:
toURL (Images (ViewImage 1))
but in your library code, you don't know anything about the Images
constructor. So you just call,
toURL (ViewImage 1)
which generates /ViewImage/1 instead of the required /Images/ViewImage/1.
What I need is some way to tell your library code what prefix to add at the
beginning. That is exactly what the URLT monad does. It just holds a
function that adds a prefix to the URL.
so in your library you have:
image :: ImageURL -> URLT ImageURL m ()
image Upload =
  do ...
     u <- showURL (ViewImage n)
     ...
image (ViewImage num) = ...
Instead of calling toURL, it calls showURL, which adds the context to the
URL and then calls toURL on it.
And in my code I have:
myApp :: MyApp -> URLT MyApp m ()
myApp Upload = ...
myApp FooBar = ...
myApp (Images subURL) = nestURL Images $ image subURL
the 'nestURL Images' adds the Images context to the URLT environment. It can
be used to nest multiple levels if needed:
nestURL A $ nestURL B $ nestURL Images $ showURL (ViewImage 1)
would get turned into something like:
"/A/B/Images/ViewImage/1"
What do you think?
- jeremy

I have the feeling it adds a lot of complexity. I agree with you that, if you want modularity, your components should only provide relative URLs and need to be parameterized over how to build an absolute URL. I didn't think of that problem, and using a custom monad transformer is definitely a solution.
However, I'm always hesitant to build up stacks of monad transformers; it adds a lot of complexity. I would rather use something like a typeclass, but I'm not sure yet how to do that.
-chris

These approaches will definitely work, but I'm worried that creating a whole
set of datatypes to represent URLs is overkill. In Yesod (as I'm sure many of
you have seen) I use quasi-quoting for defining the resources, like so:
[$mkResources|
/user:
GET: userList
/user/find/#userid
GET: userFind
/user/name/$username
GET: userName
|]
And so on and so forth. I don't think defining UserRoute adds much, besides
making the job of the library writer a little bit easier by pushing the work
off to the user. I think the six lines above succinctly:
* define the valid routes
* define the data types for arguments
* define the appropriate mapping to handler functions for each request
method
Chris mentioned earlier to me the idea of using quasi-quoting on the link
generation side, perhaps like:
[$link|/user/find/6|]
I think the only piece of the puzzle missing to combine these two together
is to have mkResources output something along the lines of:
data RoutePiece = StaticPiece String | IntPiece | StringPiece
_validRoutes :: [[RoutePiece]]
_validRoutes =
[ [StaticPiece "user"]
, [StaticPiece "user", StaticPiece "find", IntPiece]
, [StaticPiece "user", StaticPiece "name", StringPiece]
]
Now if you write
[$link|/user/find/michael|]
the link quasi-quoter can look up in _validRoutes that there is no matching
route and complain at compile time.
Advantages: less typing by the user.
Disadvantages: we'll have to restrict the data types allowed, but in
practice I think people will usually want only strings and ints anyway.
Also, this approach is more complex.
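To sketch how that compile-time check might work: the helper names below are hypothetical, and the RoutePiece type is restated from above so the snippet stands alone.

data RoutePiece = StaticPiece String | IntPiece | StringPiece

matchesPiece :: String -> RoutePiece -> Bool
matchesPiece s (StaticPiece s') = s == s'
matchesPiece s IntPiece         = not (null s) && all (`elem` ['0'..'9']) s
matchesPiece _ StringPiece      = True

-- the check a link quasi-quoter could run at compile time
isValidLink :: [[RoutePiece]] -> [String] -> Bool
isValidLink validRoutes pieces =
  any (\route -> length route == length pieces &&
                 and (zipWith matchesPiece pieces route))
      validRoutes

-- isValidLink _validRoutes ["user", "find", "6"]       == True
-- isValidLink _validRoutes ["user", "find", "michael"] == False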
Michael

Creating a number of datatypes (each for every "component") is interesting because of two things:
* You can only produce valid URLs (you get that for free)
* You can provide an individual component as a library (e.g. a hackage package)
I guess that all these approaches can be made equivalent, it's mostly a matter of style and preference. What I like about my approach is that it's very light on Template Haskell: the only TH comes from the regular library and is well-tested.
-chris

Took me a bit to appreciate what you just said there, but I see your point
now. It's true, it does have some very nice features. I'm still concerned
about creating something which involves too much boilerplate.
Am I understanding correctly that the URLs will be derived from the names of
the datatypes? Also, how would you address URL dispatch in this approach?

On 16 mrt 2010, at 16:46, Michael Snoyman wrote:
Took me a bit to appreciate what you just said there, but I see your point now. It's true, it does have some very nice features. I'm still concerned about creating something which involves too much boilerplate.
Yes. Generic programming (which is what the regular library provides) tries to hide the boilerplate (Template Haskell) code in a library and provides you with combinators so that you can program on the structure of a datatype. This is also at the core of my regular-web library [1], which generates forms/html/json/xml in the same way. You'll notice that on lines 34-36, I use the exact same TH calls. Once you've done that, you get HTML and Formlets generation for free!
Am I understanding correctly that the URLs will be derived from the names of the datatypes?
Exactly!
Also, how would you address URL dispatch in this approach?
A URL is represented by a datastructure, e.g. ApplicationRoute. You would write a function "dispatch :: ApplicationRoute -> Application". Dispatch is not part of the library (and it shouldn't be, imo). In the module "MyApp.UserController" you might write a function "dispatchUser :: UserRoute -> Application", which is called by the original dispatch function.
-chris
[1]: http://github.com/chriseidhof/regular-web/blob/master/Example.lhs
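A small sketch of that dispatch layout, with a stand-in Application type and made-up route constructors (the real types would come from WAI/Happstack and the application itself):

type Application = String   -- stand-in: a real Application would produce a Response

data UserRoute        = UserList | UserFind Int
data ApplicationRoute = Home | Users UserRoute

-- would live in MyApp.UserController
dispatchUser :: UserRoute -> Application
dispatchUser UserList     = "user list page"
dispatchUser (UserFind n) = "details for user " ++ show n

-- top-level dispatch, written by the application, not the library
dispatch :: ApplicationRoute -> Application
dispatch Home          = "home page"
dispatch (Users route) = dispatchUser route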

On Tue, Mar 16, 2010 at 11:54 AM, Chris Eidhof wrote:
On 16 mrt 2010, at 16:46, Michael Snoyman wrote:
Took me a bit to appreciate what you just said there, but I see your point now. It's true, it does have some very nice features. I'm still concerned about creating something which involves too much boilerplate.
Yes. Generic programming (which is what the regular library provides) tries to hide the boilerplate (Template Haskell) code in a library and provides you with combinators so that you can program on the structure of a datatype. This is also at the core of my regular-web library [1], which generates forms/html/json/xml in the same way. You notice that on lines 34-36, I use the exact same TH calls. Once you did that, you get HTML and Formlets generation for free!
Using URLT.TH you would just need the one-liner:
$(deriveAsURL ''UserRoute)
However, if that is asking too much, I have also added URLT.Regular:
http://src.seereason.com/urlt/URLT/Regular.hs
So instead you do:
$(deriveAll ''UserRoute "PFUserRoute")
type instance PF UserRoute = PFUserRoute
instance AsURL UserRoute where
  toURLS = gtoURLS . from
  fromURLC = fmap (fmap to) gfromURLC
Am I understanding correctly that the URLs will be derived from the names of the datatypes?
Exactly!
URLT currently allows you to generate the urls from the names via template haskell, generics, or by writing instances by hand. I would like to add support for QuasiQuotes similar to what is done in Yesod.
Also, how would you address URL dispatch in this approach?
A URL is represented by a datastructure, e.g. ApplicationRoute. You would write a function "dispatch :: ApplicationRoute -> Application". Dispatch is not part of the library (and it shouldn't be, imo). In the module "MyApp.UserController" you might write a function "dispatchUser :: UserRoute -> Application", which is called by the original dispatch function.
That is how URLT works -- except better. Your function type will be like:
dispatchApp :: (ShowURL m, URL m ~ ApplicationRoute) => ApplicationRoute -> m a
MyApp.UserController might have a function like:
dispatchUser :: (ShowURL m, URL m ~ UserRoute) => UserRoute -> m a
the top level dispatchApp would call it like:
dispatchApp (User userURL) = nestURL User $ dispatchUser userURL
The constraints ensure that your app is also only generating URLs of type ApplicationRoute. If your app was generating urls of type UserRoute, but expecting incoming urls of type ApplicationRoute, that clearly would not work.
Imagine if you accidentally wrote:
dispatchApp Login = do let url = toURL List in <a href=url>list</a>
Here, in the dispatchApp function I accidentally called 'toURL List' instead of 'toURL (User List)'. In your code that is *not* caught as a type error. With the ShowURL monad it is caught as a compile time error. If the goal is type-safe URLs I think it is essential to catch this error, don't you?
- jeremy
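A rough sketch of the shape of such a class (the real ShowURL in URLT may well differ; this is only meant to show how the URL m ~ ApplicationRoute constraint rules out rendering a bare UserRoute):

{-# LANGUAGE TypeFamilies #-}

class Monad m => ShowURL m where
  type URL m
  showURL :: URL m -> m String

data UserRoute        = List | Details Int
data ApplicationRoute = Login | User UserRoute

dispatchApp :: (ShowURL m, URL m ~ ApplicationRoute) => ApplicationRoute -> m String
dispatchApp Login = do
  -- 'showURL List' would be rejected by the compiler here; only the
  -- properly wrapped 'showURL (User List)' type checks
  url <- showURL (User List)
  return ("<a href=" ++ url ++ ">list</a>")
dispatchApp (User _) = return "user section"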

Firstly, I haven't read through all of URLT, so take anything I say with a grain of salt. I'll happily be corrected where I misspeak.
I'm not sure if I really see the big advantages of URLT over the code that I posted. As an advantage to my code, it's *much* smaller and easier to digest. I understand that URLT is doing a lot of other stuff with TH and the like, but I'm trying to look at the core of URL dispatch here. I would imagine having a system like this:
* An underlying typeclass/datatype/whatever for defining a set of URLs. For lack of a better term, let's call it a WebSubapp.
* The ability to embed a WebSubapp within another WebSubapp.
* The ability to convert a resource in a WebSubapp into a relative path, and vice-versa.
* Dispatch a request to a specific resource, passing also (either via explicit argument or through a Reader monad) a function to convert a resource into an absolute path.
* Convert a WebSubapp into an Application. I'll assume for the moment that would be a Network.Wai.Application.
Once we had that skeleton, we could dress it up however we want. For Yesod, I would change the mkResources quasi-quoter to produce an instance of WebSubapp. Others may wish to use the regular package, some might use TH, and others still may code it all directly. However, if we keep the same skeleton, then all of these will operate with each other seamlessly.
The one piece of the puzzle that still irks me just a bit is initialization, such as creating database connections, loading settings, etc. I have some ideas on that as well, but I'll wait to discuss them after we get some of the more basic components addressed.
Michael

On Tue, Mar 16, 2010 at 7:15 PM, Michael Snoyman wrote:
Firstly, I haven't read through all of URLT, so take anything I say with a grain of salt. I'll happily be corrected where I misspeak.
I'm not sure if I really see the big advantages for URLT over the code that I posted. As an advantage to my code, it's *much* smaller and easier to digest. I understand the URLT is doing a lot of other stuff with TH and the like, but I'm trying to look at the core of URL dispatch here. I would imagine having a system like this:
The essence of URLT is pretty darn small. The TH, generics, and all that other stuff is not required. This is the essence of URLT:
newtype URLT url m a = URLT { unURLT :: ReaderT (url -> String) m a }
    deriving (Functor, Monad, MonadFix, MonadPlus, MonadIO, MonadTrans, MonadReader (url -> String))
showURL :: (Monad m) => url -> URLT url m String
showURL u =
  do mkAbs <- ask
     return (mkAbs u)
-- |used to embed a URLT into a larger parent url
nestURL :: (Monad m) => (url2 -> url1) -> URLT url2 m a -> URLT url1 m a
nestURL b = withURLT (. b)
It's just one newtype wrapper around ReaderT and two very simple functions. No classes, no generics, no nothing.
To 'run' the URLT monad transformer (i.e. go from 'URLT url m a' to 'm a') we simply supply a function of the type (url -> String). That is all that is required.
To dispatch the incoming url we need a function that goes from (String -> url). And then we just write a plain old function that takes the type 'url' as an argument. So for the dispatch portion we don't require any classes or anything from the library itself.
The Template Haskell, Generics, etc, are just there to provide various ways of automatically deriving the (url -> String) and (String -> url) functions.
* An underlying typeclass/datatype/whatever for defining a set of URLs. For lack of a better term, let's call it a WebSubapp.
This would be the URLT monad parameterized with a url type, e.g., URLT WebURL m a.
* The ability to embed a WebSubapp within another WebSubapp.
the nestURL function.
* The ability to convert a resource in a WebSubapp into a relative path, and vice-versa.
showURL converts a relative path to an absolute.
* Dispatch a request to a specific resource, passing also (either via explicit argument or through a Reader monad) a function to convert a resource into an absolute path.
To dispatch a url you simply call the top-level handling function and pass in the url. The URLT environment holds the function to convert a resource into an absolute path.
* Convert a WebSubapp into an Application. I'll assume for the moment that would be a Network.Wai.Application.
In the current URLT I have a function that does this for happstack. (that is the entire reason why URLT depends on happstack, and why it would be easy to split out). I can write a similar module for Wai tomorrow.
Once we had that skeleton, we could dress it up however we want. For Yesod, I would change the mkResources quasi-quoter to produce an instance of WebSubapp. Others may wish to use the regular package, some might use TH, and others still may code it all directly.
In URLT mkResources would just need to return the two functions (String -> url) and (url -> String).
However, if we keep the same skeleton, then all of these will operate with each other seemlessly.
Yes. TH and Regular already operate seamlessly in URLT. If you add mkResource it would as well.
Let's examine WebPlug more closely.
class WebPlug a where
  toRelPath :: a -> RelPath
  fromRelPath :: RelPath -> Maybe a
  dispatch :: a -> (a -> AbsPath) -> Application
Now, I think that having dispatch be part of the WebPlug class itself is a problem, because it assumes that your dispatch function needs no other arguments besides the URL. I find that is often not the case. For an image gallery library, the dispatch function might need to take a FilePath argument which specifies where the image directory is. So I think it is better that you write a dispatch handler with a unique name for each url type and call it by its unique name. Then there is no problem if you want to add other arguments.
So, now you have a function like:
dispatchBlog :: a -> (a -> AbsPath) -> Application
Now in my function I might want to generate a url that I will use as an href value. It can't be a relative url (obviously), it needs to be an absolute url. So I need to do something like this:
dispatchBlog Foo mkAbs = let u = mkAbs (BlogPost 1) in <a href=u>Blog Post 1</a>
Well, it can be a bit annoying to have to explicitly pass that extra mkAbs argument on every pattern. So we could just wrap it up in the Reader monad if we wanted:
dispatchBlog :: a -> Reader (a -> AbsPath) Application
and mkAbs can be:
mkAbs :: a -> Reader (a -> AbsPath) AbsPath
mkAbs url =
  do f <- ask
     return (f url)
and we can use it like:
dispatchBlog Foo =
  do u <- mkAbs (BlogPost 1)
     <a href=u>Blog Post 1</a>
and where you currently have this:
dispatch (MyBlog b) toAbsPath req = dispatch b (toAbsPath . MyBlog) req
we would have something like:
dispatchMyBlog (MyBlog b) = withReader (. MyBlog) $ dispatchBlog b
we can rename withReader to make its intentions more clear:
dispatchSub c = withReader (. c)
and just write:
dispatchMyBlog (MyBlog b) = dispatchSub MyBlog $ dispatchBlog b
Since we got rid of the dispatch function in WebPlug we now have:
class WebPlug a where
  toRelPath :: a -> RelPath
  fromRelPath :: RelPath -> Maybe a
Personally, I think it should return Failing a instead of Maybe a, because we can include information about why it failed:
class WebPlug a where
  toRelPath :: a -> RelPath
  fromRelPath :: RelPath -> Failing a
Now, this class is useful, but not required. It is also essentially the same as AsURL.
We have a low-level function:
plugToWai' :: (a -> Reader (a -> AbsPath) Application) -- ^ the dispatch function
           -> (a -> AbsPath)                           -- ^ function to convert a url to an AbsPath
           -> (AbsPath -> a)                           -- ^ function to convert the AbsPath back to a url
           -> Application
We can call that function from the higher-level:
plugToWai :: (WebPlug a) => (a -> Reader (a -> AbsPath) Application) -> Application
If we have a dispatch function that takes arguments:
fooDispatch :: FilePath -> Int -> a -> Reader (a -> AbsPath) Application
we just do something like:
plugToWai (fooDispatch "foo" 1)
So, to summarize:
1. I don't think dispatch can be a member of the class, because the various dispatch functions may need to take extra arguments, and you can't do that if dispatch is in a class.
2. If you pass the mkAbs function via the Reader monad instead of passing it as an explicit argument then you have pretty much exactly reinvented URLT.
Hence, I think you have no option but to agree that URLT is what you wanted all along ;)
I am happy to split the happstack and HSP portions out of the core library. I would even be happy to split regular and TH so that we can have:
urlt
urlt-regular
urlt-th
urlt-mkResource
urlt-hsp
urlt-happstack
urlt-wai
urlt-all
so that you can use only the extensions you care about.
- jeremy
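Putting those pieces together, a runnable mini-version of the Reader-based wiring described above; Application and AbsPath are stand-ins, and the blog/site types are made up, but mkAbs and dispatchSub are exactly the definitions above.

import Control.Monad.Reader

type AbsPath     = String
type Application = String   -- stand-in for a real Application

data BlogURL = BlogHome | BlogPost Int
data SiteURL = Home | MyBlog BlogURL

mkAbs :: url -> Reader (url -> AbsPath) AbsPath
mkAbs url = asks ($ url)

dispatchSub :: (url2 -> url1) -> Reader (url2 -> AbsPath) a -> Reader (url1 -> AbsPath) a
dispatchSub c = withReader (. c)

dispatchBlog :: BlogURL -> Reader (BlogURL -> AbsPath) Application
dispatchBlog BlogHome     = do u <- mkAbs (BlogPost 1)
                               return ("<a href=" ++ u ++ ">Blog Post 1</a>")
dispatchBlog (BlogPost n) = return ("blog post " ++ show n)

dispatchSite :: SiteURL -> Reader (SiteURL -> AbsPath) Application
dispatchSite Home       = return "home page"
dispatchSite (MyBlog b) = dispatchSub MyBlog (dispatchBlog b)

renderSite :: SiteURL -> AbsPath
renderSite Home                  = "/"
renderSite (MyBlog BlogHome)     = "/blog"
renderSite (MyBlog (BlogPost n)) = "/blog/post/" ++ show n

-- runReader (dispatchSite (MyBlog BlogHome)) renderSite
--   == "<a href=/blog/post/1>Blog Post 1</a>"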

Very thorough breakdown, +2 ;).
As far as the ReaderT versus extra argument: I think this is a general argument that we could always have. I agree that monad stacks are more convenient; on the other hand, if we want to make something as palatable to as many people as possible, we would probably want to avoid the mtl vs transformers debate like the plague. See, for instance, what we had to do with control-monad-failure and control-monad-failure-mtl. Anyway, let's agree that that's almost irrelevant.
I also agree completely with your point about needing to pass extra parameters; I just pushed an update to my gist (http://gist.github.com/334475) which addresses just that. My gist is *incredibly* similar to urlt, so frankly, I'd be happy taking urlt exactly as you've defined it so far. But for the sake of completeness, I'd like to point out the last important difference. My new version has two typeclasses; I'll copy them in their entirety here:
class IsRelPath a where
  toRelPath :: a -> RelPath
  fromRelPath :: RelPath -> Maybe a
class IsRelPath (Routes a) => WebPlug a where
  type Routes a
  dispatch :: a -> Routes a -> (Routes a -> AbsPath) -> Application
I think we all agree that IsRelPath 1) needs to exist and 2) should be called something better than that. I would say that it's useful to have dispatch as part of a typeclass, which is what WebPlug now is. What makes this typeclass so convenient is that any instance of WebPlug is *self-contained*. There's no need to keep track of which subapps require which arguments.
Finally, regarding the ReaderT issue, I would recommend making anything that requires monad transformers be a layer on top of the low-level code. Take, for instance, a comparison of the CGI monad versus the WAI Request -> IO Response; I find the latter much nicer to deal with, and if I ever want to, I could package up the Request in a ReaderT at some later stage.*
Michael
* Yes, I know we could unwrap the ReaderT and go in the reverse direction.
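To illustrate, a sketch of what an instance of those two classes might look like. RelPath, AbsPath and Application are stand-ins here (the gist presumably has its own definitions), and the Blog type and its routes are made up:

{-# LANGUAGE TypeFamilies #-}

type RelPath     = [String]
type AbsPath     = String
type Application = String   -- stand-in for a real WAI Application

class IsRelPath a where
  toRelPath   :: a -> RelPath
  fromRelPath :: RelPath -> Maybe a

class IsRelPath (Routes a) => WebPlug a where
  type Routes a
  dispatch :: a -> Routes a -> (Routes a -> AbsPath) -> Application

-- a blog sub-application: the Blog value carries its own arguments, so
-- callers never have to track them separately
data Blog       = Blog { blogTitle :: String }
data BlogRoutes = BlogHome | BlogPost Int

instance IsRelPath BlogRoutes where
  toRelPath BlogHome      = []
  toRelPath (BlogPost n)  = ["post", show n]
  fromRelPath []          = Just BlogHome
  fromRelPath ["post", n] = case reads n of
                              [(i, "")] -> Just (BlogPost i)
                              _         -> Nothing
  fromRelPath _           = Nothing

instance WebPlug Blog where
  type Routes Blog = BlogRoutes
  dispatch blog BlogHome     mkAbs =
    blogTitle blog ++ ": see " ++ mkAbs (BlogPost 1)
  dispatch blog (BlogPost n) mkAbs =
    blogTitle blog ++ ", post " ++ show n ++ " (back to " ++ mkAbs BlogHome ++ ")"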

On Tue, Mar 16, 2010 at 11:05 PM, Michael Snoyman wrote:
My new version has two typeclasses, I'll copy them in their entirety here:
class IsRelPath a where
  toRelPath :: a -> RelPath
  fromRelPath :: RelPath -> Maybe a
class IsRelPath (Routes a) => WebPlug a where
  type Routes a
  dispatch :: a -> Routes a -> (Routes a -> AbsPath) -> Application
I think we all agree that IsRelPath 1) needs to exist and 2) should be called something better than that. I would say that it's useful to have dispatch as part of a typeclass, which is what WebPlug now is. What makes this typeclass so convenient is that any instance of WebPlug is *self contained*. There's no need to keep track of which subapps require which arguments.
I am not really clear what the benefit of WebPlug is -- it seems to me that it is just adding more boilerplate.
I added two new modules to URLT, namely URLT.Wai and URLT.Dispatch. I implemented your little blog demo twice: once where I didn't use dispatch, and once where I did. The code for that is here:
http://src.seereason.com/urlt/WaiExample.hs
It seemed like using dispatch did not get rid of or simplify anything; it just added more boilerplate, type classes, and used extensions (type families) that a lot of people don't understand yet.
And instead of writing something short and straightforward like:
handleWai mkAbs fromAbs (mySite now)
I had to write the longer:
handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs now)))
And on the handler end, it hid useful information in the type signature and required more constructor matching. Instead of:
myBlog :: UTCTime -> (BlogURL -> String) -> BlogURL -> Application
myBlog now mkAbs BlogHome _request =
I have:
myBlogD :: BlogArgs -> (BlogURL -> String) -> BlogURL -> Application
myBlogD (BlogArgs now) mkAbs BlogHome _request =
In order to know the type of 'now' I have to go look somewhere else. In 'myBlog' it was right there in the type signature.
So, I guess I do not yet see the value of Dispatch. On the plus side, it doesn't seem like I have to use it if I don't like it. But I am curious if I am missing something useful here.
One advantage is that I can do:
:info Dispatch
in GHCi, and see all the Dispatch instances that are available. But I'm not sure that really makes it worth the effort.
- jeremy
p.s. The WaiExample does not use AsURL / IsRelPath, because that is really a completely orthogonal issue, and I wanted to cut out anything that was not relevant.

On Wed, Mar 17, 2010 at 12:48 PM, Jeremy Shaw
On Tue, Mar 16, 2010 at 11:05 PM, Michael Snoyman
wrote: My new version has two typeclasses, I'll copy them in their entirety here:
class IsRelPath a where
    toRelPath :: a -> RelPath
    fromRelPath :: RelPath -> Maybe a

class IsRelPath (Routes a) => WebPlug a where
    type Routes a
    dispatch :: a -> Routes a -> (Routes a -> AbsPath) -> Application
I think we all agree that IsRelPath 1) needs to exist and 2) should be called something better than that. I would say that it's useful to have dispatch as part of a typeclass, which is what WebPlug now is. What makes this typeclass so convenient is that any instance of WebPlug is *self contained*. There's no need to keep track of which subapps require which arguments.
I am not really clear what the benefit of WebPlug is -- it seems to me that it is just adding more boilerplate.
I added two new modules to URLT, namely URLT.Wai and URLT.Dispatch.
I think I implemented your little blog demo twice: once where I didn't use dispatch, and once where I did. The code for that is here:
http://src.seereason.com/urlt/WaiExample.hs
It seemed like using dispatch did not get rid of or simplify anything; it just added more boilerplate, type classes, and extensions (type families) that a lot of people don't understand yet.
And instead of writing something short and straightforward like:
handleWai mkAbs fromAbs (mySite now)
I had to write the longer:
handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs now)))
And on the handler end, it hid useful information in the type signature and required more constructor matching. Instead of:
myBlog :: UTCTime -> (BlogURL -> String) -> BlogURL -> Application
myBlog now mkAbs BlogHome _request =
I have:
myBlogD :: BlogArgs -> (BlogURL -> String) -> BlogURL -> Application
myBlogD (BlogArgs now) mkAbs BlogHome _request =
In order to know the type of 'now' I have to go look somewhere else. In 'myBlog' it was right there in the type signature.
So, I guess I do not yet see the value of Dispatch. On the plus side, it doesn't seem like I have to use it if I don't like it. But I am curious if I am missing something useful here.
One advantage is that I can do:
:info Dispatch
in GHCi, and see all the Dispatch instances that are available. But I'm not sure that really makes it worth the effort.
- jeremy
p.s. The WaiExample does not use AsURL / IsRelPath, because that is really a completely orthogonal issue, and I wanted to cut out anything that was not relevant.
Firstly, I think the most valid concern about my approach is that it uses TypeFamilies. I grant that 100%.

Now, as far as your concerns about boilerplate and hiding of types: you're correct on the small scale. When dealing with simple examples, it makes perfect sense to just pass in the 2 or 3 arguments directly instead of having a datatype declared. I see the advantage of having a unified typeclass/dispatch function for dealing with large, nested applications.

That said, your example and my example are not exactly the same. I find the final line of mine to be *much* more concise than your Dispatch version. Let's compare them directly:

Mine:
run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"

Your dispatch version:
run 3000 $ handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs now)))

Your handleWai version:
run 3000 $ handleWai mkAbs fromAbs (mySite now)

I think a lot of the boilerplate you experienced comes from your implementation of my idea, not the idea itself.

However, let's try to deal with some of the other important issues. Firstly, Failing versus Maybe: I can't really see a case when you'd need to specify why the path is not a valid URL. It would seem that either it's a theoretically valid path, or it's not. Issues like "that object doesn't exist" usually wouldn't be handled at the dispatch level.

I still think we need to reconsider relying on one or the other monad transformer library. I notice now that you're using mtl; Yesod uses transformers. I don't really have a strong preference on this, but it's immediately divisive.

There's one other major difference between URLT and my gist: my gist splits a path into pieces and hands that off for parsing. Your code allows each function to handle that itself. In your example, you use the default Read instance (I assume for simplicity). Splitting into pieces the way I did allowed for easy pattern matching; what would URLT code look like that handled "real" URLs?

Michael

On Wed, Mar 17, 2010 at 5:47 PM, Michael Snoyman
Now, as far as your concerns about boilerplate and hiding of types: you're correct on the small scale. When dealing with simple examples, it makes perfect sense to just pass in the 2 or 3 arguments directly instead of having a datatype declared. I see the advantage of having a unified typeclass/dispatch function for dealing with large, nested applications.
I can see how declaring a datatype (typically a record) can be useful when you are passing a larger number of arguments to a subhandler. In fact, I already have real code based on URLT where I do that. In the existing example, I can call the version with the wrapped-up arguments just fine without dispatch:

run 3000 $ handleWaiU (mySiteD (SiteArgs (BlogArgs now))) "http://localhost:3000"

If I call it using dispatch, then it is one token shorter:

run 3000 $ handleWaiD (SiteArgs (BlogArgs now)) "http://localhost:3000"

except I also am forced to add all these tokens:

instance Dispatch SiteArgs where
    type Routes SiteArgs = SiteURL
    type App SiteArgs = Application
    dispatch = mySiteD

even though I am only going to call dispatch on SiteArgs in one place in my code.

So, without dispatch you get the option of using data-types to bundle up arguments if you want to. I don't see how dispatch improves on that portion. With dispatch you are forced to, whether you want to or not. The reason you are forced to is that dispatch requires a uniquely named type so it can determine which function to call.

One advantage of Dispatch is that you can write polymorphic functions that call dispatch:

myFunc :: (Dispatch a) => a -> ...

Is that something we are likely to exploit?
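For what it's worth, here is a hedged sketch of the kind of generic helper such a class would make possible. The class below only approximates the Dispatch being discussed (associated Routes and App types plus a dispatch method); it is not URLT's actual definition.

{-# LANGUAGE TypeFamilies #-}

class Dispatch a where
    type Routes a
    type App a
    dispatch :: a -> (Routes a -> String) -> Routes a -> App a

-- A generic wrapper: any bundle of arguments with a Dispatch instance can
-- have its generated links prefixed, without knowing which handler it is.
withPrefix :: Dispatch a => String -> a -> (Routes a -> String) -> Routes a -> App a
withPrefix prefix args mkAbs = dispatch args (\route -> prefix ++ mkAbs route)

Whether helpers like withPrefix ever get written is, of course, exactly the open question being asked here.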
That said, your example and my example are not exactly the same. I find the final line of mine to be *much* more concise than your Dispatch version. Let's compare them directly:
Mine:
run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"

Your dispatch version:
run 3000 $ handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs now)))

Your handleWai version:
run 3000 $ handleWai mkAbs fromAbs (mySite now)
True. If I had a version of handleWai that used AsURL (similar to how plugToWai works), then we would have:

Yours:
run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"

Mine (no dispatch):
run 3000 $ handleWaiU (mySite now) "http://localhost:3000"

Mine (dispatch):
run 3000 $ handleWaiD (MySite $ Blog now) "http://localhost:3000"

which are essentially the same. Without dispatch, mine could potentially be one token longer, though in this case it is one token shorter. The version with dispatch is, of course, the same length.

I think a lot of the boilerplate you experienced comes from your
implementation of my idea, not the idea itself.
I guess at this point I just feel like it is easier and more straightforward to call the handlers by unique names than to create an instance of Dispatch so I can call the handler using a general name. So, I am looking for some compelling examples where I am going to benefit from having a function like

dispatch :: (Dispatch a) => a -> (Routes a -> String) -> Routes a -> App a

hanging around. Though, as I also mentioned, I don't mind having the Dispatch class in the library as long as I am not required to use it.
However, let's try to deal with some of the other important issues. Firstly, Failing versus Maybe: I can't really see a case when you'd need to specify why the path is not a valid URL. It would seem that either it's a theoretically valid path, or it's not. Issues like "that object doesn't exist" wouldn't be handled at the dispatch level usually.
I have found Failing to be very useful when using URLT for implementing a REST API. The links within my Haskell app won't fail, but links generated by non-Haskell clients can fail. For example, if some php programmer accidentally tries to get /mysite/myblog/foobar/bolg/1, they are going to be a lot happier to see:

expecting 'blog', 'images', 'foo', but got 'bolg'

than they would be if they just got 'invalid url'. (Even better would be if it gave the character offset of the bogus path component.)

Also, if you are writing the toURL / fromURL functions by hand instead of deriving them automatically somehow, then you are going to get it wrong sometimes (in my experience, often). I provide a QuickCheck function that can be used to ensure that your toURL / fromURL functions are inverses. But when the test fails, it is nice to get a more specific error message.

I still think we need to reconsider relying on one or the other monad
transformer library. I notice now that you're using mtl; Yesod uses transformers. I don't really have a strong preference on this, but it's immediately divisive.
I refactored so that it does not really depend on either now. I did this by basically reimplementing URLT as a native Reader-like monad instead of wrapping around ReaderT. I added URLT.MTL and URLT.Transformers which contain the MonadTrans and MonadIO instances. But they are not used by any of the code.

Happstack is currently mtl based. I think I like transformers better, though I am saddened to see they do not have the classes like MonadReader, MonadWriter, etc.
There's one other major difference between URLT and my gist: my gist splits a path into pieces and hands that off for parsing. Your code allows each function to handle that itself. In your example, you use the default Read instance (I assume for simplicity). Splitting into pieces the way I did allowed for easy pattern matching; what would URLT code look like that handled "real" URLs?
I like the String over the [String] because it is the most general form of representing a URL. If you wanted to use URLT to handle both the pathInfo and the query string parameters, then [String] isn't really the correct type. Though there could be something better than String as well...

As for handling "real" URLs, there are a variety of solutions. If you don't care too much about the prettiness of the URLs you can use template haskell to generate AsURL instances:

$(deriveAsURL ''BlogURL)
$(deriveAsURL ''SiteURL)

main1b :: IO ()
main1b =
    do now <- getCurrentTime
       run 3000 $ handleWaiU (mySite now) "http://localhost:3000"

Or if you prefer Regular over TH you can do something like this (it can probably be cleaned up a little):

$(deriveAll ''BlogURL "PFBlogURL")
type instance PF BlogURL = PFBlogURL

instance AsURL BlogURL where
    toURLS = gtoURLS . from
    fromURLC = fmap (fmap to) gfromURLC

$(deriveAll ''SiteURL "PFSiteURL")
type instance PF SiteURL = PFSiteURL

instance AsURL SiteURL where
    toURLS = gtoURLS . from
    fromURLC = fmap (fmap to) gfromURLC

that should also work with main1b.

Or you could do it without AsURL at all using syb:

gtoURL :: (Data url) => url -> String
gfromURL :: (Data url) => String -> Failing url

run 3000 $ handleWai gtoURL gfromURL (mySite now) "http://localhost:3000"

Or you could add an AsURL instance that just called gtoURL / gfromURL, and then you could use handleWaiU.

If you want to write parsers by hand, you could do it using parsec:

main1c :: IO ()
main1c =
    do now <- getCurrentTime
       run 3000 $ handleWai toSiteURL (fromURLP pSiteURL) (mySite now) "http://localhost:3000"
    where
      pBlogURL :: Parser BlogURL
      pBlogURL =
          do char '/'
             (BlogPost <$> many1 (noneOf "/")) <|> pure BlogHome

      pSiteURL :: Parser SiteURL
      pSiteURL =
          do char '/'
             MyBlog <$> (string "blog" *> pBlogURL) <|> pure MyHome

toBlogURL :: BlogURL -> String
toBlogURL BlogHome = ""
toBlogURL (BlogPost title) = title

toSiteURL :: SiteURL -> String
toSiteURL MyHome = ""
toSiteURL (MyBlog blogURL) = "blog/" ++ toBlogURL blogURL

In this example, I call handleWai. But I could also create AsURL instances and call handleWaiU.

Parsec is perhaps not the best choice of parser combinators. A more specialized URL parser combinator library might be nice. We could also add a helper function so that it is easier to do things via straight pattern matching. But I think straight pattern matching may prove tedious rather quickly?

In general though, I am not a big fan of writing the converters by hand, because there is no assurance that they are inverses of each other, and it's annoying to have to basically express the same structure twice -- once to parse it, and once to print it. But there does need to be some way where you can very explicitly map how the datatype and string representation of the URL are related. It would be much better if there was a DSL that simultaneously expressed how to parse and how to print. I have not worked out how to do that yet though -- it is somewhat tricky. However, the quasiquote stuff looks potentially promising as a way of expressing the parsing and printing in a single step...

- jeremy
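Since the QuickCheck check Jeremy mentions keeps coming up, here is a small self-contained sketch of that kind of round-trip property. The URL type, the hand-written printer/parser, and the property name are illustrative only, not URLT's actual code.

import Test.QuickCheck

data BlogURL = BlogHome | BlogPost String
    deriving (Eq, Show)

instance Arbitrary BlogURL where
    arbitrary = oneof [ pure BlogHome
                      , BlogPost <$> listOf1 (elements ['a'..'z']) ]

renderBlogURL :: BlogURL -> String
renderBlogURL BlogHome         = ""
renderBlogURL (BlogPost title) = "BlogPost/" ++ title

parseBlogURL :: String -> Maybe BlogURL
parseBlogURL "" = Just BlogHome
parseBlogURL s  = case break (== '/') s of
                    ("BlogPost", '/' : title) | not (null title) -> Just (BlogPost title)
                    _                                            -> Nothing

-- parsing a rendered URL should give back the original value
prop_roundTrip :: BlogURL -> Bool
prop_roundTrip u = parseBlogURL (renderBlogURL u) == Just u

Running quickCheck prop_roundTrip exercises the pair; with Failing instead of Maybe the failure report could also carry the parser's own error message.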

I should note that I pushed updates to WaiExample and the URLT library in
regards to this post.
- jeremy

Jeremy Shaw
Happstack is currently mtl based. I think I like transformers better, though I am saddened to see they do not have the classes like MonadReader, MonadWriter, etc.
http://hackage.haskell.org/packages/archive/monads-fd/0.0.0.1/doc/html/Contr...
Transformers is intended to be used with either "monads-fd" (to get
those instances using fundeps) or "monads-tf" (similar formulations with
type families.)
Monads-fd should be backwards-compatible w/ mtl.
G
--
Gregory Collins

On Thu, Mar 18, 2010 at 2:07 PM, Jeremy Shaw
On Wed, Mar 17, 2010 at 5:47 PM, Michael Snoyman
wrote: Now, as far as your concerns about boilerplate and hiding of types: you're correct on the small scale. When dealing with simple examples, it makes perfect sense to just pass in the 2 or 3 arguments directly instead of having a datatype declared. I see the advantage of having a unified typeclass/dispatch function for dealing with large, nested applications.
I can see how declaring a datatype (typically a record) can be useful when you are passing a larger number of arguments to a subhandler. In fact, I already have real code based on URLT where I do that. In the existing example, I can call the version with the wrapped-up arguments just fine without dispatch:
run 3000 $ handleWaiU (mySiteD (SiteArgs (BlogArgs now))) "http://localhost:3000"
If I call it using dispatch, then it is one token shorter:
run 3000 $ handleWaiD (SiteArgs (BlogArgs now)) "http://localhost:3000"
except I also am forced to add all these tokens:
instance Dispatch SiteArgs where
    type Routes SiteArgs = SiteURL
    type App SiteArgs = Application
    dispatch = mySiteD
even though I am only going to call dispatch on SiteArgs one place in my code.
So, without dispatch you get the option of using data-types to bundle up arguments if you want to. I don't see how dispatch improves on that portion.
With dispatch you are forced to whether you want to or not. The reason you are forced to is because dispatch requires a uniquely named type so it can determine which function to call.
One advantage of Dispatch, is that you can write polymorphic functions that call dispatch:
myFunc :: (Dispatch a) => a -> ...
Is that something we are likely to exploit?
That said, your example and my example are not exactly the same. I find the final line of mine to be *much* more concise than your Dispatch version. Let's compare them directly:
Mine:
run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"

Your dispatch version:
run 3000 $ handleWai mkAbs fromAbs (dispatch (SiteArgs (BlogArgs now)))

Your handleWai version:
run 3000 $ handleWai mkAbs fromAbs (mySite now)

True. If I had a version of handleWai that used AsURL (similar to how plugToWai works), then we would have:

Yours:
run 3000 $ plugToWai (MySite $ Blog now) "http://localhost:3000/"

Mine (no dispatch):
run 3000 $ handleWaiU (mySite now) "http://localhost:3000"

Mine (dispatch):
run 3000 $ handleWaiD (MySite $ Blog now) "http://localhost:3000"
which are essentially the same. Without dispatch, mine could potentially be one token longer. Though in this case it is one token shorter. The version with dispatch is, of course, the same length.
I think a lot of the boilerplate you experienced comes from your
implementation of my idea, not the idea itself.
I guess at this point I just feel like it is easier and more straightforward to call the handlers by unique names than to create an instance of Dispatch so I can call the handler using a general name. So, I am looking for some compelling examples where I am going to benefit from having a function like, dispatch :: (Dispatch a) => a -> (Routes a -> String) -> Routes a -> App a, hanging around.
Though, as I also mentioned, I don't mind having the Dispatch class in the library as long as I am not required to use it.
Based on everything you've said, and some thought I've had on my own, I agree that the base function should involve no typeclasses and not break up the path into pieces. Here's a proposal for the entire core:

newtype AbsPath = AbsPath { unAbsPath :: String }
newtype PathInfo = PathInfo { unPathInfo :: String }

handleWai :: (PathInfo -> Failing url)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
handleWai parsePI buildPI buildAbsPath dispatch req = do
    let pi = PathInfo $ S.unpack $ pathInfo req
    case parsePI pi of
        Success url -> dispatch url (buildAbsPath . buildPI) req
        Failure errors -> return $ Response Status404 [] $ Right $ fromLBS $ L.pack $ unlines errors

I've gone ahead and gotten my previous plugToWai function to work on top of this (available in the gist), which should be enough of a proof-of-concept that this core is solid enough. I think it makes a lot of sense to define the two newtypes to keep a clear distinction between the two categories of "URLs". We could augment this further with a "[String] -> IO Response" failure handling function. If we *really* want to go overboard, we could even redefine it as this:

handleWai :: (PathInfo -> Either err url)
          -> (err -> Application)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application

However, let's try to deal with some of the other important issues.
Firstly, Failing versus Maybe: I can't really see a case when you'd need to specify why the path is not a valid URL. It would seem that either it's a theoretically valid path, or it's not. Issues like "that object doesn't exist" wouldn't be handled at the dispatch level usually.
I have found Failing to be very useful when using URLT for implementing a REST API. The links within my Haskell app won't fail, but links generated by non-Haskell clients can fail. For example, if some php programmer accidentally tries to get /mysite/myblog/foobar/bolg/1, they are going to be a lot happier to see:
expecting, 'blog', 'images', 'foo', but got 'bolg', than they would be if they just got 'invalid url'. (Even better would be if it gave the character offset to the bogus path component).
Also, if you are writing the toURL / fromURL functions by hand instead of deriving them automatically somehow, then you are going to get it wrong sometimes (in my experience, often). I provide a QuickCheck function that can be used to ensure that your toURL / fromURL functions are inverses. But when the test fails, it is nice to get a more specific error message.
I still think we need to reconsider relying on one or the other monad
transformer library. I notice now that you're using mtl; Yesod uses transformers. I don't really have a strong preference on this, but it's immediately divisive.
I refactored so that it does not really depend on either now. I did this by basically reimplementing URLT as a native Reader-like monad instead of wrapping around ReaderT. I added URLT.MTL and URLT.Transformers which contain the MonadTrans and MonadIO instances. But they are not used by any of the code.
Happstack is currently mtl based. I think I like transformers better, though I am saddened to see they do not have the classes like MonadReader, MonadWriter, etc.
I see that Gregory already responded on monads-fd and monads-tf, which only further splits the community, unfortunately.
There's one other major difference between URLT and my gist: my gist splits
a path into pieces and hands that off for parsing. Your code allows each function to handle that itself. In your example, you use the default Read instance (I assume for simplicity). Splitting into pieces the way I did allowed for easy pattern matching; what would URLT code look like that handled "real" URLs?
I like the String over the [String] because it is the most general form of representing a URL. If you wanted to use URLT to handle both the pathInfo and the query string parameters, then [String] isn't really the correct type. Though there could be something better than String as well...
In some ways, ByteString is more appropriate, since that *is* the actual data available. But I doubt this really makes much of a difference, especially if we just internally use Char8 unpacking.
As for handling, "real" URLs, there are a variety of solutions. If you don't care too much about the prettiness of the URLs you can use template haskell to generate AsURL instances:
$(deriveAsURL ''BlogURL)
$(deriveAsURL ''SiteURL)
main1b :: IO ()
main1b =
    do now <- getCurrentTime
       run 3000 $ handleWaiU (mySite now) "http://localhost:3000"
Or if you prefer Regular over TH you can do something like this (it can probably be cleaned up a little):
$(deriveAll ''BlogURL "PFBlogURL")
type instance PF BlogURL = PFBlogURL

instance AsURL BlogURL where
    toURLS = gtoURLS . from
    fromURLC = fmap (fmap to) gfromURLC

$(deriveAll ''SiteURL "PFSiteURL")
type instance PF SiteURL = PFSiteURL

instance AsURL SiteURL where
    toURLS = gtoURLS . from
    fromURLC = fmap (fmap to) gfromURLC
that should also work with main1b.
Or you could do it without AsURL at all using syb:
gtoURL :: (Data url) => url -> String
gfromURL :: (Data url) => String -> Failing url
run 3000 $ handleWai gtoURL gfromURL (mySite now) "http://localhost:3000"
Or you could add an AsURL instance that just called gtoURL / gfromURL, and then you could use handleWaiU.
If you want to write parsers by hand, you could do it using parsec:
main1c :: IO ()
main1c =
    do now <- getCurrentTime
       run 3000 $ handleWai toSiteURL (fromURLP pSiteURL) (mySite now) "http://localhost:3000"
    where
      pBlogURL :: Parser BlogURL
      pBlogURL =
          do char '/'
             (BlogPost <$> many1 (noneOf "/")) <|> pure BlogHome

      pSiteURL :: Parser SiteURL
      pSiteURL =
          do char '/'
             MyBlog <$> (string "blog" *> pBlogURL) <|> pure MyHome
toBlogURL :: BlogURL -> String
toBlogURL BlogHome = ""
toBlogURL (BlogPost title) = title

toSiteURL :: SiteURL -> String
toSiteURL MyHome = ""
toSiteURL (MyBlog blogURL) = "blog/" ++ toBlogURL blogURL
In this example, I call handleWai. But I could also create AsURL instances and call handleWaiU.
Parsec is perhaps not the best choice of parser combinators. A more specialized URL parser combinator library might be nice.
We could also add a helper function so that it is easier to do things via straight pattern matching. But I think straight pattern matching may prove tedious rather quickly?
In general though, I am not a big fan of writing the converters by hand, because there is no assurance that they are inverses of each other, and it's annoying to have to basically express the same structure twice -- once to parse it, and once to print it.
But there does need to be some way where you can very explicitly map how the datatype and string representation of the URL are related.
It would be much better if there was a DSL that simultaneously expressed how to parse and how to print. I have not worked out how to do that yet though -- it is somewhat tricky.
However, the quasiquote stuff looks potentially promising as a way of expressing the parsing and printing in a single step...
- jeremy
I'm glad to hear someone else finds writing the same data twice to be error-prone and redundant. If we get this core out there, I'll happily split my mkResources quasi-quoter from Yesod and make it available as a standalone package.

By the way, should we think of something more descriptive than urlt?

Michael
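To see how the handleWai core proposed earlier in this message hangs together, here is a minimal, self-contained sketch of wiring a tiny site through a function of that shape. The WAI types are replaced by a String stand-in for Application and the Request argument by the PathInfo itself, so this is an illustration of the shape only, not the gist's or wai's actual code.

newtype AbsPath  = AbsPath  { unAbsPath  :: String }
newtype PathInfo = PathInfo { unPathInfo :: String }
data Failing a   = Failure [String] | Success a
type Application = String   -- stand-in for wai's Request -> IO Response

handleWai :: (PathInfo -> Failing url)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> PathInfo
          -> Application
handleWai parsePI buildPI buildAbsPath dispatch pinfo =
    case parsePI pinfo of
      Success url    -> dispatch url (buildAbsPath . buildPI)
      Failure errors -> "404: " ++ unlines errors

data SiteURL = Home | Echo String

mySite :: SiteURL -> (SiteURL -> AbsPath) -> Application
mySite Home     mkAbs = "home; echo lives at " ++ unAbsPath (mkAbs (Echo "hi"))
mySite (Echo s) _     = "you said: " ++ s

main :: IO ()
main = putStrLn $ handleWai parse render mkAbs mySite (PathInfo "/echo/hello")
  where
    mkAbs (PathInfo p) = AbsPath ("http://localhost:3000" ++ p)
    parse (PathInfo "/")                          = Success Home
    parse (PathInfo ('/':'e':'c':'h':'o':'/':s)) = Success (Echo s)
    parse _                                       = Failure ["unrecognized path"]
    render Home     = PathInfo "/"
    render (Echo s) = PathInfo ("/echo/" ++ s)

The core itself stays typeclass-free; parse, render, and the absolute-URL builder are ordinary arguments, which is the whole point of the proposal.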

As an example of both a unified URL creation framework and persistence framework, I've put together a little example of how we could create an "authentication plugin." For the purposes of our discussion here, we could ignore the persistence piece for now, though I would like to eventually discuss how we could make that better.

I wrote a small blog post[1] describing the system. The code relevant for our discussion is broken into two files: WebPlug.hs[2] defines the interface and auth-example.hs[3] is the actual example.

In this version of WebPlug.hs, I've included WebPlug as a datatype instead of a typeclass. I don't actually *use* that datatype here, but I think it would be very useful for higher-level utilities like the quasi-quoter to be able to access the three related functions together.

Michael

[1] http://www.snoyman.com/blog/entry/persistent-plugs/
[2] http://github.com/snoyberg/persistent/blob/master/WebPlug.hs
[3] http://github.com/snoyberg/persistent/blob/master/auth-example.hs
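For a sense of what "WebPlug as a datatype" means in practice, here is a hedged, stand-alone sketch; it is not the WebPlug.hs linked above, and the type parameters stand in for the PathInfo/AbsPath/Application types used elsewhere in this thread.

-- A sketch only: the three functions a sub-application needs (parsing its
-- routes, rendering them, and dispatching on them) bundled as one value
-- that higher-level tools such as a quasi-quoter can pass around together.
data WebPlug path abs app url = WebPlug
    { plugParse    :: path -> Maybe url
    , plugRender   :: url -> path
    , plugDispatch :: url -> (url -> abs) -> app
    }

A record like this and a typeclass like the earlier WebPlug carry the same information; the datatype just makes the bundle a first-class value.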

On Fri, Mar 19, 2010 at 4:22 PM, Michael Snoyman
As an example of both a unified URL creation framework and persistence framework, I've put together a little example of how we could create an "authentication plugin." For the purposes of our discussion here, we could ignore the persistence piece for now, though I would like to eventually discuss how we could make that better.
Yeah, I gotta finish the urlt stuff first before I think about something else ;)
I wrote a small blog post[1] describing the system. The code relevant for our discussion is broken into two files: WebPlug.hs[2] defines the interface and auth-example.hs[3] is the actual example.
In this version of WebPlug.hs, I've included WebPlug as a datatype instead of a typeclass. I don't actually *use* that datatype here, but I think it would be very useful for higher-level utilities like the quasi-quoter to be able to access the three related functions together.
Right. I already have a similar datatype in URLT.HandleT. My type also includes a 'defaultPage' which can be used to specify what value "/" should be mapped to. Though, in mine, the dispatch / handleLink function is based on URLT, but that can probably be generalized. As a bonus you also get a Functor instance, and a runSite function that uses the type... You should really check out URLT sometime :p

I am not going to have time to look at this again until Saturday or Sunday. There are a few minor details that have been swept under the rug that need to be addressed. For example, when exactly should url encoding / decoding take place? It's not good if that happens twice or not at all.

- jeremy

On Fri, Mar 19, 2010 at 4:42 PM, Jeremy Shaw
Right. I already have a similar datatype in URLT.HandleT. My type also includes a 'defaultPage' type which can be used to specify what value "/" should be mapped to. Though, in mine, the dispatch / handleLink function is based on URLT, but that can probably be generalized. As a bonus you also get a Functor instance, and a runSite function that uses the type..
Actually, you already supply handleWebPlug, so the only real bonus is the Functor instance.. which is only there because it is based on URLT. - jeremy

On Fri, Mar 19, 2010 at 2:42 PM, Jeremy Shaw
On Fri, Mar 19, 2010 at 4:22 PM, Michael Snoyman
wrote: As an example of both a unified URL creation framework and persistence framework, I've put together a little example of how we could create an "authentication plugin." For the purposes of our discussion here, we could ignore the persistence piece for now, though I would like to eventually discuss how we could make that better.
Yeah, I gotta finish the urlt stuff first before I think about something else ;)
I'd prefer to do that too in general, but I'm going to be doing a project next that involves a lot of DB code. I already did one site that used direct SQL generation, and I'd really rather avoid going that route again.

I wrote a small blog post[1] describing the system. The code relevant for
our discussion is broken into two files: WebPlug.hs[2] defines the interface and auth-example.hs[3] is the actual example.
In this version of WebPlug.hs, I've included WebPlug as a datatype instead of a typeclass. I don't actually *use* that datatype here, but I think it would be very useful for higher-level utilities like the quasi-quoter to be able to access the three related functions together.
Right. I already have a similar datatype in URLT.HandleT. My type also includes a 'defaultPage' type which can be used to specify what value "/" should be mapped to. Though, in mine, the dispatch / handleLink function is based on URLT, but that can probably be generalized. As a bonus you also get a Functor instance, and a runSite function that uses the type..
You should really check out URLT sometime :p
I thought I had, just didn't realize that HandleT was relevant to the low-level bit. I'll try to get through the rest of urlt today.
I am not going to have time to look at this again until Saturday or Sunday. There are a few minor details that have been swept under the rug that need to be addressed. For example, when exactly should url encoding / decoding take place? It's not good if that happens twice or not at all.
Just to confuse the topic even more: if we do real URL encoding/decoding, I believe we would have to assume a certain character set. I had to deal with a site that was encoded in non-UTF8 just a bit ago, and dealing with query parameters is not fun.
That said, perhaps we should consider making the type of PathInfo "PathInfo ByteString" so we make it clear that we're doing no character encoding.

Another issue in the same vein is dealing with leading and trailing slashes, though I think this is fairly simple in practice: the web app knows what to do about the trailing slashes, and each plugin should always pass a leading slash.

Michael

On Fri, Mar 19, 2010 at 5:22 PM, Michael Snoyman
I am not going to have time to look at this again until Saturday or Sunday.
There are a few minor details that have been swept under the rug that need to be addressed. For example, when exactly should url encoding / decoding take place? It's not good if that happens twice or not at all.
Just to confuse the topic even more: if we do real URL encoding/decoding, I believe we would have to assume a certain character set. I had to deal with a site that was encoded in non-UTF8 just a bit ago, and dealing with query parameters is not fun.
That said, perhaps we should consider making the type of PathInfo "PathInfo ByteString" so we make it clear that we're doing no character encoding.
Yeah. I dunno. I just know it needs to be solved :)
Another issue in the same vein is dealing with leading and trailing slashes, though I think this is fairly simple in practice: the web app knows what to do about the trailing slashes, and each plugin should always pass a leading slash.
I am not quite sure what you mean by 'each plugin should always pass a leading slash'. Pass to whom?

If we have:

MySite = MyHome | MyBlog Blog
MyBlog = BlogHome | BlogPost String

then I would expect something like this:

formatMySite MyHome = "MyHome"
formatMySite (MyBlog blog) = "MyBlog/" ++ formatMyBlog blog

formatMyBlog BlogHome = "BlogHome"
formatMyBlog (BlogPost title) = "BlogPost/" ++ title

mkAbs = ("http://localhost:3000/" ++)

(ignoring any escaping that needs to happen in title, and ignoring any AbsPath / PathInfo stuff).

But we could, of course, do it the other way:

formatMySite MyHome = "/MyHome"
formatMySite (MyBlog blog) = "/MyBlog" ++ formatMyBlog blog

formatMyBlog BlogHome = "/BlogHome"
formatMyBlog (BlogPost title) = "/BlogPost/" ++ title

mkAbs = ("http://localhost:3000" ++)

There definitely needs to be some policy.

- jeremy

On Fri, Mar 19, 2010 at 2:41 PM, Jeremy Shaw
On Fri, Mar 19, 2010 at 5:22 PM, Michael Snoyman
wrote: I am not going to have time to look at this again until Saturday or
Sunday. There are a few minor details that have been swept under the rug that need to be addressed. For example, when exactly should url encoding / decoding take place? It's not good if that happens twice or not at all.
Just to confuse the topic even more: if we do real URL encoding/decoding, I believe we would have to assume a certain character set. I had to deal with a site that was encoded in non-UTF8 just a bit ago, and dealing with query parameters is not fun.
That said, perhaps we should consider making the type of PathInfo "PathInfo ByteString" so we make it clear that we're doing no character encoding.
Yeah. I dunno. I just know it needs to be solved :)
Another issue in the same vein is dealing with leading and trailing slashes, though I think this is fairly simple in practice: the web app knows what to do about the trailing slashes, and each plugin should always pass a leading slash.
I am not quite sure what you mean 'each plugin should always pass a leading slash'. Pass to whom?
If we have:
MySite = MyHome | MyBlog Blog
MyBlog = BlogHome | BlogPost String
Then I would expect something like this:
formatMySite MyHome = "MyHome" formatMySite (MyBlog blog) = "MyBlog/" ++ formatMyBlog blog
formatMyBlog BlogHome = "BlogHome" formatMyBlog (BlogPost title) = "BlogPost/" ++ title
mkAbs = ("http://localhost:3000/" ++)
(ignoring any escaping that needs to happen in title, and ignoring an AbsPath / PathInfo stuff).
But we could, of course, do it the other way:
formatMySite MyHome = "/MyHome" formatMySite (MyBlog blog) = "/MyBlog" ++ formatMyBlog blog
formatMyBlog BlogHome = "/BlogHome" formatMyBlog (BlogPost title) = "/BlogPost/" ++ title
mkAbs = ("http://localhost:3000" ++)
There definitely needs to be some policy.
- jeremy
Then here's a proposal for both issues at once:

* PathInfo is a ByteString
* handleWai strips the leading slash from the path-info
* every component parses and generates URLs without a leading slash. Trailing slash is the application's choice.

Regarding URL encoding, let me point out that the following are two different URLs (just try clicking on them):

http://www.snoyman.com/blog/entry/persistent-plugs/
http://www.snoyman.com/blog/entry%2Fpersistent-plugs/

In other words, if we ever URL-decode the string before it reaches the application, we will have conflated unique URLs. I see two options here:

* We specify that PathInfo contains URL-encoded values. Any fromUrl/toUrl functions must be aware of this fact.
* We change the type of PathInfo to [ByteString], where we split the PathInfo by slashes, and specify that the pieces are *not* URL-encoded. In order to preserve the original value perfectly, we should not combine adjacent delimiters. In other words:

/foo/bar/baz/   -> ["foo", "bar", "baz", ""]  -- note the trailing empty string
/foo/bar/baz    -> ["foo", "bar", "baz"]      -- we don't need a leading empty string; *every* pathinfo begins with a slash
/foo%2Fbar/baz/ -> ["foo/bar", "baz", ""]
/foo//bar/baz   -> ["foo", "", "bar", "baz"]

I'm not strongly attached to any of this. Also, my original motivation for breaking up the pieces (easier pattern matching) will be mitigated by the usage of ByteStrings.

Michael
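A quick sketch of the second option's splitting rule, just to check that it is implementable as stated; this uses String instead of ByteString for brevity and unEscapeString from Network.URI, so it is an illustration of the rule rather than proposed library code.

import Network.URI (unEscapeString)

-- Split a raw, still-percent-encoded path-info on literal '/' characters,
-- then decode each piece.  Adjacent slashes are not collapsed, so the
-- original path can be reconstructed exactly.
splitPathInfo :: String -> [String]
splitPathInfo = map unEscapeString . pieces . dropLeadingSlash
  where
    dropLeadingSlash ('/' : rest) = rest
    dropLeadingSlash s            = s
    pieces s = case break (== '/') s of
                 (p, [])       -> [p]
                 (p, _ : rest) -> p : pieces rest

-- splitPathInfo "/foo/bar/baz/"   == ["foo","bar","baz",""]
-- splitPathInfo "/foo%2Fbar/baz/" == ["foo/bar","baz",""]
-- splitPathInfo "/foo//bar/baz"   == ["foo","","bar","baz"]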

ok, here is what I have found out so far. First, I tested 3 html generation libraries to see if they do any escaping on the arguments passed to href (Text.Html, Text.XHtml, and HSP):

{-# OPTIONS -F -pgmFtrhsx #-}
module Main where

import System.IO
import qualified Text.Html as H
import qualified Text.XHtml as X
import HSP
import HSP.Identity
import HSP.HTML

main :: IO ()
main =
    do hSetEncoding stdout utf8
       let nihongo = "日本語"
       putStrLn nihongo
       putStrLn $ H.renderHtml $ H.anchor H.! [H.href nihongo] H.<< (H.toHtml "nihongo")
       putStrLn $ X.renderHtml $ X.anchor X.! [X.href nihongo] X.<< (X.toHtml "nihongo")
       putStrLn $ renderAsHTML $ evalIdentity $ <a href=nihongo>nihongo</a>

The output produced was:

*Main Text.Html System.IO> main
日本語
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 FINAL//EN">
<!--Rendered using the Haskell Html Library v0.2-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" " http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> http://www.w3.org/1999/xhtml"
So, none of them attempted to convert the String into a valid URL. The
XHtml library did make an attempt to encode the string, but that encoding
does not really make it a valid URL. (And the other two utf-8 encoded the
string, because they utf-8 encoded the whole document -- which is the
correct thing to do).
The behavior of these libraries seems correct -- if they attempted to do
more url encoding, I think that would just make things worse.
Next there is the question of what are you supposed to do with non-ASCII
characters in a URI? This is described in section 2.1 of RFC 2396:
http://www.ietf.org/rfc/rfc2396.txt
The relationship between URI and characters has been a source of
confusion for characters that are not part of US-ASCII. To describe
the relationship, it is useful to distinguish between a "character"
(as a distinguishable semantic entity) and an "octet" (an 8-bit
byte). There are two mappings, one from URI characters to octets, and
a second from octets to original characters:
URI character sequence->octet sequence->original character sequence
A URI is represented as a sequence of characters, not as a sequence
of octets. That is because URI might be "transported" by means that
are not through a computer network, e.g., printed on paper, read over
the radio, etc.
So a URI is a character sequence (of a restricted set of characters that are
found in ASCII). A URI does not have a 'binary representation', because it
could be transmitted via non-binary forms, such as a business card, etc. It
is the characters that matter. A uri that has been utf-8 encoded and utf-16
encoded is still the same uri because the characters represented by those
encodings are the same.
So, there is actually another little piece missing in that sequence when
data is transmitted via the computer. Namely, extracting the URI from the
raw octets.
raw octets for uri -> URI character sequence -> octet sequence -> original
character sequence
For example, let's pretend a web page was sent as: Content-type: text/html;
charset=utf-32
The utf-32 octets representing the uri must first be decoded to characters
(aka the uri character sequence). That seems outside the scope of URLT ..
that stage of decoding should be done before URLT gets the data because it
requires looking at HTTP headers, the meta-equiv tag, etc. Next we can
convert the uri sequence into a new sequence of octets representing 8-bit
encoded data. That is done by converting normal ascii characters to their
8-bit ascii equivalent, and by converting % encoded values to their
equivalent 8-bit values. so the character 'a' in the URI would be converted
to 0x61, and the sequence %35 would be converted to 0x35. Next the binary
data is converted to the original character sequence.
There are a few things that make this tricky.
1. the encoding of the octet sequence in the middle is not specified in the
uri. So when you are converting back to the original character sequence you
don't know if octet sequence represents ascii, utf-8, or something else.
2. normalization and reserved characters
Every character *can* be percent encoded, though you are only supposed to
percent encode a limited set. URL normalization dictates that the following
three URIs are equivalent:
http://example.com:80/~smith/home.html
http://EXAMPLE.com/%7Esmith/home.html
http://EXAMPLE.com:/%7esmith/home.html
The %7E and ~ are equal, because ~ is *not* a reserved character. But
/foo/bar/baz/
/foo%2Fbar/baz/
are *not* equal because / is a reserved character.
RFC3986 has this to say about when to encode and decode:
2.4. When to Encode or Decode
Under normal circumstances, the only time when octets within a URI
are percent-encoded is during the process of producing the URI from
its component parts. This is when an implementation determines which
of the reserved characters are to be used as subcomponent delimiters
and which can be safely used as data. Once produced, a URI is always
in its percent-encoded form.
When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, as otherwise the data may be
mistaken for component delimiters. The only exception is for
percent-encoded octets corresponding to characters in the unreserved
set, which can be decoded at any time. For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing implementations; the "%7E" can be replaced by
"~" without changing its interpretation.
Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI. Implementations must not
percent-encode or decode the same string more than once, as decoding
an already decoded string might lead to misinterpreting a percent
data octet as the beginning of a percent-encoding, or vice versa in
the case of percent-encoding an already percent-encoded string.
It also has this to say about encoding Unicode data:
When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set [UCS],
the data should first be encoded as octets according to the UTF-8
character encoding [STD63]; then only those octets that do not
correspond to characters in the unreserved set should be percent-
encoded. For example, the character A would be represented as "A",
the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
as "%C3%80", and the character KATAKANA LETTER A would be represented
as "%E3%82%A2".
I can't find an official stamp of approval, but I believe the http scheme
now specifies that the octets in the middle step are utf-8 encoded.
So, here is a starting example of what I think should happen for encoding,
and then decoding.
1. We start with a list of path components ["foo/bar","baz"]
2. We then convert the sequence to a String containing the utf-8 encoded
octets (a String not a bytestring)
3. We percent encode everything that is not an unreserved character
4. We add the delimiters
We now have a proper URI. Note that we have a String and that the URI is
made up of the characters in that String. The final step happens when the
URI is actually used:
5. the URI is inserted into an HTML document (etc). The document is then
encoded according to whatever encoding the document is supposed to have
(could be anything), converting the URI into some encoding.
So a URI is actually encoded twice. We use a similar process to decode the
URI. Here is some code that does what I described:
import Codec.Binary.UTF8.String (encodeString, decodeString)
import Network.URI
import System.FilePath.Posix (joinPath, splitDirectories)

encodeUrl :: [String] -> String
encodeUrl paths =
    let step1 = map encodeString paths                    -- utf-8 encode the data characters in the path components (we have not added any delimiters yet)
        step2 = map (escapeURIString isUnreserved) step1  -- percent encode the characters
        step3 = joinPath step2                            -- add in the delimiters
    in step3

decodeUrl :: String -> [String]
decodeUrl str =
    let step1 = splitDirectories str      -- split path on delimiters
        step2 = map unEscapeString step1  -- decode any percent encoded characters
        step3 = map decodeString step2    -- decode octets
    in step3

f = encodeString "日本語"

test =
    let p = ["foo/bar", "日本語"]
        e = encodeUrl p
        d = decodeUrl e
    in (d == p, p, e, d)
The problem with using [String] is that it assumes the only delimiter we
care about is '/'. But we might also want to deal with the other delimiters
such as : # ?. (For example, if we want to use the urlt system to generate
the query string as well as the path..). But [String] does not give us a way
to do that. Instead it seems like we would need a type that would allow us
to specify the path, the query string, the fragment, etc. namely a real uri
type? Perhaps there is something on hackage we can leverage.
I think that having each individual set of toUrl / fromUrl functions deal
with the encoding / decoding is not a good way to go. Makes it too easy to
get it wrong. Having it all done correctly in one place makes life easier
for people adding new instances or methods of generating instances.
I think that urlt dealing with ByteString or [ByteString] is never the right
thing. The only time that the URI is a 'byte string' is when it is encoded
in an html document, or encoded in the http headers. But at the URLT level,
we don't know what encoding that is. Instead we want the bytestring decoded,
and we want to receive a 'URI character sequence.' Or we want to give a 'URI
character sequence' to a the html library, and let it worry about the
encoding of the document.
At present, I think I am still ok with the fromURL and toURL functions
producing and consuming String values. But, what we need is an intermediate
URL type like:
data URL = URL { paths :: [String], queryString :: String, frag :: String }
and functions that properly do, encodeURL :: URL -> String, decodeURL ::
String -> URL.
The AsURL class would look like:
class AsURL u where
toURLC :: u -> URL
fromURLC :: URL -> Failing u
instance AsURL URL where
toURLC = id
fromURLC = Success
And then toURL / fromURL would be like:
toURL :: (AsURL u) => u -> String
toURL = encodeURL . toURLC
fromURL :: (AsURL u) => String -> Failing u
fromURL = fromURLC . decodeURL
The Strings in the URL type would not require any special encoding/decoding.
The encoding / decoding would be handled by the encodeURL / decodeURL
functions.
In other words, when the user creates a URL type by hand, they do not have
to know anything about url encoding rules, it just happens like magic. That
should make it much easier to write AsURL instances by hand.
Does this makes sense to you?
The key now is seeing if someone has already created a suitable URL type that
we can use...
- jeremy
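As a sanity check on that plan, here is a sketch of what encodeURL might look like for the proposed URL record, reusing the encoding steps from encodeUrl above. The query string and fragment handling is a simplification, and none of this is meant as the final API.

import Codec.Binary.UTF8.String (encodeString)
import Data.List (intercalate)
import Network.URI (escapeURIString, isUnreserved)

data URL = URL { paths :: [String], queryString :: String, frag :: String }

-- Path components are utf-8 encoded and then percent encoded individually,
-- so a '/' inside a component survives as %2F; the query string and
-- fragment are assumed to be already in their encoded form here.
encodeURL :: URL -> String
encodeURL u =
    "/" ++ intercalate "/" (map enc (paths u))
        ++ (if null (queryString u) then "" else '?' : queryString u)
        ++ (if null (frag u)        then "" else '#' : frag u)
  where
    enc = escapeURIString isUnreserved . encodeString

decodeURL would invert these steps (split on '/', percent-decode, then utf-8 decode), which is exactly the order the decodeUrl sketch above uses.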
On Fri, Mar 19, 2010 at 5:55 PM, Michael Snoyman
http://www.ietf.org/rfc/rfc2396.txt
On Fri, Mar 19, 2010 at 2:41 PM, Jeremy Shaw
wrote: On Fri, Mar 19, 2010 at 5:22 PM, Michael Snoyman
wrote: I am not going to have time to look at this again until Saturday or
Sunday. There are a few minor details that have been swept under the rug that need to be addressed. For example, when exactly does should url encoding / decoding take place. It's not good if that happens twice or not at all.
Just to confuse the topic even more: if we do real URL encoding/decoding, I believe we would have to assume a certain character set. I had to deal with a site that was encoded in non-UTF8 just a bit ago, and dealing with query parameters is not fun.
That said, perhaps we should consider making the type of PathInfo "PathInfo ByteString" so we make it clear that we're doing no character encoding.
Yeah. I dunno. I just know it needs to be solved :)
Another issue in the same vein is dealing with leading and trailing slashes, though I think this is fairly simple in practice: the web app knows what to do about the trailing slashes, and each plugin should always pass a leading slash.
I am not quite sure what you mean 'each plugin should always pass a leading slash'. Pass to whom?
If we have:
MySite = MyHome | MyBlog Blog MyBlog = BlogHome | BlogPost String
Then I would expect something like this:
formatMySite MyHome = "MyHome" formatMySite (MyBlog blog) = "MyBlog/" ++ formatMyBlog blog
formatMyBlog BlogHome = "BlogHome" formatMyBlog (BlogPost title) = "BlogPost/" ++ title
mkAbs = ("http://localhost:3000/" ++)
(ignoring any escaping that needs to happen in title, and ignoring an AbsPath / PathInfo stuff).
But we could, of course, do it the other way:
formatMySite MyHome = "/MyHome"
formatMySite (MyBlog blog) = "/MyBlog" ++ formatMyBlog blog
formatMyBlog BlogHome = "/BlogHome"
formatMyBlog (BlogPost title) = "/BlogPost/" ++ title
mkAbs = ("http://localhost:3000" ++)
There definitely needs to be some policy.
- jeremy
Then here's a proposal for both issues at once:
* PathInfo is a ByteString
* handleWai strips the leading slash from the path-info
* every component parses and generates URLs without a leading slash. Trailing slash is application's choice.
Regarding URL encoding, let me point out that the following are two different URLs (just try clicking on them):
http://www.snoyman.com/blog/entry/persistent-plugs/
http://www.snoyman.com/blog/entry%2Fpersistent-plugs/
In other words, if we ever URL-decode the string before it reaches the application, we will have conflated unique URLs. I see two options here:
* We specify that PathInfo contains URL-encoded values. Any fromUrl/toUrl functions must be aware of this fact.
* We change the type of PathInfo to [ByteString], where we split the PathInfo by slashes, and specify that the pieces are *not* URL-encoded. In order to preserve perfectly the original value, we should not combine adjacent delimiters. In other words:
/foo/bar/baz/ -> ["foo", "bar", "baz", ""] -- note the trailing empty string
/foo/bar/baz -> ["foo", "bar", "baz"] -- we don't need a leading empty string; *every* pathinfo begins with a slash
/foo%2Fbar/baz/ -> ["foo/bar", "baz", ""]
/foo//bar/baz -> ["foo", "", "bar", "baz"]
I'm not strongly attached to any of this. Also, my original motivation for breaking up the pieces (easier pattern matching) will be mitigated by the usage of ByteStrings.
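A minimal sketch of the splitting rule described above, using String instead of ByteString for brevity and leaving out the percent-decoding step (splitPathInfo is a made-up name, not part of any package discussed here):
splitPathInfo :: String -> [String]
splitPathInfo s =
    case breakSlashes s of
      ("" : rest) -> rest  -- every path-info begins with '/', so drop the single leading empty piece
      rest        -> rest
  where
    breakSlashes str =
        case break (== '/') str of
          (piece, [])       -> [piece]
          (piece, _ : more) -> piece : breakSlashes more
-- splitPathInfo "/foo/bar/baz/" ==> ["foo","bar","baz",""]
-- splitPathInfo "/foo//bar/baz" ==> ["foo","","bar","baz"]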
Michael

ok, here is what I have found out so far. First, I tested 3 html generation libraries to see if they do any escaping on the arguments passed to href (Text.Html, Text.XHtml, and HSP):
{-# OPTIONS -F -pgmFtrhsx #-}
module Main where
import System.IO
import qualified Text.Html as H
import qualified Text.XHtml as X
import HSP
import HSP.Identity
import HSP.HTML
main :: IO ()
main =
  do hSetEncoding stdout utf8
     let nihongo = "日本語"
     putStrLn nihongo
     putStrLn $ H.renderHtml $ H.anchor H.! [H.href nihongo] H.<< (H.toHtml "nihongo")
     putStrLn $ X.renderHtml $ X.anchor X.! [X.href nihongo] X.<< (X.toHtml "nihongo")
     putStrLn $ renderAsHTML $ evalIdentity $ <a href=nihongo>nihongo</a>
The output produced was:
*Main Text.Html System.IO> main
日本語
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 FINAL//EN">
<!--Rendered using the Haskell Html Library v0.2-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
http://www.w3.org/1999/xhtml"
So, none of them attempted to convert the String into a valid URL. The XHtml library did make an attempt to encode the string, but that encoding does not really make it a valid URL. (And the other two utf-8 encoded the string, because they utf-8 encoded the whole document -- which is the correct thing to do).
The behavior of these libraries seems correct -- if they attempted to do more url encoding, I think that would just make things worse.
Next there is the question of what you are supposed to do with non-ASCII characters in a URI. This is described in section 2.1 of RFC 2396:
http://www.ietf.org/rfc/rfc2396.txt
The relationship between URI and characters has been a source of confusion for characters that are not part of US-ASCII. To describe the relationship, it is useful to distinguish between a "character" (as a distinguishable semantic entity) and an "octet" (an 8-bit byte). There are two mappings, one from URI characters to octets, and a second from octets to original characters:
URI character sequence->octet sequence->original character sequence
A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be "transported" by means that are not through a computer network, e.g., printed on paper, read over the radio, etc.
So a URI is a character sequence (of a restricted set of characters that are found in ASCII). A URI does not have a 'binary representation', because it could be transmitted via non-binary forms, such as a business card, etc. It is the characters that matter. A uri that has been utf-8 encoded and utf-16 encoded is still the same uri because the characters represented by those encodings are the same.
So, there is actually another little piece missing in that sequence when data is transmitted via the computer. Namely, extracting the URI from the raw octets.
raw octets for uri -> URI character sequence -> octet sequence -> original character sequence
For example, let's pretend a web page was sent as: Content-type: text/html; charset=utf-32
The utf-32 octets representing the uri must first be decoded to characters (aka the uri character sequence). That seems outside the scope of URLT .. that stage of decoding should be done before URLT gets the data because it requires looking at HTTP headers, the meta-equiv tag, etc. Next we can convert the uri sequence into a new sequence of octets representing 8-bit encoded data. That is done by converting normal ascii characters to their 8-bit ascii equivalent, and by converting % encoded values to their equivalent 8-bit values. so the character 'a' in the URI would be converted to 0x61, and the sequence %35 would be converted to 0x35. Next the binary data is converted to the original character sequence.
There are a few things that make this tricky.
1. the encoding of the octet sequence in the middle is not specified in the uri. So when you are converting back to the original character sequence you don't know if octet sequence represents ascii, utf-8, or something else.
2. normalization and reserved characters
Every character *can* be percent encoded, though you are only supposed to percent encode a limited set. URL normalization dictates that the following three URIs are equivalent:
http://example.com:80/~smith/home.html http://EXAMPLE.com/%7Esmith/home.html http://EXAMPLE.com:/%7esmith/home.html
The %7E and ~ are equal, because ~ is *not* a reserved character. But
/foo/bar/baz/ /foo%2Fbar/baz/
are *not* equal because / is a reserved character.
RFC3986 has this to say about when to encode and decode:
2.4. When to Encode or Decode
Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. Once produced, a URI is always in its percent-encoded form.
When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI. Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string.
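As a quick ghci illustration of that "never encode or decode twice" rule, using escapeURIString from Network.URI (the uppercase hex is what that function produces):
Prelude Network.URI> escapeURIString isUnreserved "100%"
"100%25"
Prelude Network.URI> escapeURIString isUnreserved (escapeURIString isUnreserved "100%")
"100%2525"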
It also has this to say about encoding Unicode data:
When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent- encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".
I can't find an official stamp of approval, but I believe the http scheme now specifies that the octets in the middle step are utf-8 encoded.
So, here is a starting example of what I think should happen for encoding, and then decoding.
1. We start with a list of path components ["foo/bar","baz"]
2. We then convert the sequence to a String containing the utf-8 encoded octets (a String not a bytestring)
3. We percent encode everything that is not an unreserved character
4. We add the delimiters
We now have a proper URI. Note that we have a String and that the URI is made up of the characters in that String. The final step happens when the URI is actually used:
5. the URI is inserted into an HTML document (etc). The document is then encoded according to whatever encoding the document is supposed to have (could be anything), converting the URI into some encoding.
So a URI is actually encoded twice. We use a similar process to decode the URI. Here is some code that does what I described:
import Codec.Binary.UTF8.String (encodeString, decodeString)
import Network.URI
import System.FilePath.Posix (joinPath, splitDirectories)
encodeUrl :: [String] -> String
encodeUrl paths =
    let step1 = map encodeString paths -- utf-8 encode the data characters in path components (we have not added any delimiters yet)
        step2 = map (escapeURIString isUnreserved) step1 -- percent encode the characters
        step3 = joinPath step2 -- add in the delimiters
    in step3
decodeUrl :: String -> [String]
decodeUrl str =
    let step1 = splitDirectories str -- split path on delimiters
        step2 = map unEscapeString step1 -- decode any percent encoded characters
        step3 = map decodeString step2 -- decode octets
    in step3
f = encodeString "日本語"
test = let p = ["foo/bar", "日本語"]
           e = encodeUrl p
           d = decodeUrl e
       in (d == p, p, e, d)
The problem with using [String] is that it assumes the only delimiter we care about is '/'. But we might also want to deal with the other delimiters such as : # ?. (For example, if we want to use the urlt system to generate the query string as well as the path..). But [String] does not give us a way to do that. Instead it seems like we would need a type that would allow us to specify the path, the query string, the fragment, etc. namely a real uri type? Perhaps there is something on hackage we can leverage.
I think that having each individual set of toUrl / fromUrl functions deal with the encoding / decoding is not a good way to go. Makes it too easy to get it wrong. Having it all done correctly in one place makes life easier for people adding new instances or methods of generating instances.
I think that urlt dealing with ByteString or [ByteString] is never the right thing. The only time that the URI is a 'byte string' is when it is encoded in an html document, or encoded in the http headers. But at the URLT level, we don't know what encoding that is. Instead we want the bytestring decoded, and we want to receive a 'URI character sequence.' Or we want to give a 'URI character sequence' to the html library, and let it worry about the encoding of the document.
At present, I think I am still ok with the fromURL and toURL functions producing and consuming String values. But, what we need is an intermediate URL type like:
data URL = URL { paths :: [String], queryString :: String, frag :: String }
and functions that properly do, encodeURL :: URL -> String, decodeURL :: String -> URL.
The AsURL class would look like:
class AsURL u where
  toURLC :: u -> URL
  fromURLC :: URL -> Failing u
instance AsURL URL where
  toURLC = id
  fromURLC = Success
And then toURL / fromURL would be like:
toURL :: (AsURL u) => u -> String
toURL = encodeURL . toURLC
fromURL :: (AsURL u) => String -> Failing u
fromURL = fromURLC . decodeURL
The Strings in the URL type would not require any special encoding/decoding. The encoding / decoding would be handled by the encodeURL / decodeURL functions.
In other words, when the user creates a URL type by hand, they do not have to know anything about url encoding rules, it just happens like magic. That should make it much easier to write AsURL instances by hand.
Does this make sense to you?
The key now is seeing if someone has already created a suitable URL type that we can use...
That made perfect sense, thank you for doing such thorough research on this.
2010/3/20 Jeremy Shaw

On Sun, Mar 21, 2010 at 12:04 AM, Michael Snoyman
That made perfect sense, thank you for doing such thorough research on this.
I've attached two files; test1.html is UTF-8 encoded, test3.html is windows-1255 (Hebrew). On my system, both links point to the same location, implying to me that you are spot on that UTF-8 should always be used for URLs. I had made a mistake with my test on Friday; apparently we only have the encoding issue with the query string.
Hmm. Those files do not contain valid urls. The strings in the hrefs contain characters that are not in the limited set allowed by the URI spec. The part that is true is that even though the files have different encodings (utf-8 vs windows-1255) the characters in the strings are the same, so the urls are the same. I guess maybe the reason you put in invalid characters is because it is hard to test whether different encodings matter if you are only testing characters that are represented by the same octets in both encodings.
Regarding your encoding issue with the query string, I believe there may have been 'nothing wrong'. At the URI level there is no specification as to how the query string is to be interpreted, or what underlying charset it should be associated with. It does have the requirement that it can only contain a limited set of characters, and that other characters must be converted to octets and then percent encoded.
Now, things get interesting when you look at forms and application/x-www-form-urlencoded. When you create a form you have a form element that looks something like this:
<form action="/submit" method=POST enctype="application/x-www-form-urlencoded;charset=utf-8">...</form>
Except internet explorer, and a bunch of servers, get stupid if you actually set the charset=utf-8. So the de facto standard is that the form is submitted using the same character encoding as the page it came from. So if the <head> contains <meta charset="windows-1255">, then the form data will be encoded as windows-1255, converted to octets, and then percent encoded, plus the other things that url encoding does (such as + for spaces). You can also add accept-charset="utf-8" if you want to override the default and have the form submit some other character encoding. Not sure how widely supported that is.
Now, if we were to change the method=POST to method=GET, then the urlencoded data would be passed as a query string, with its windows-1255 encoded payload. And that is perfectly valid.
So, the choice of how to encode the pathInfo and query string is pretty much application specific. For the URLT stuff we are both generating and parsing the path components, so we can choose whatever encoding we want -- with utf-8 being a good choice.
Now, back to your point: I'm not sure why you want to include the query string and fragment as part of the URL. Regarding the fragment: it will never be passed to the server, so it's *impossible* to consider it for parsing URLs. I understand that you might want to generate URLs with a fragment, but we would then need to have parse and render functions which do not parallel each other properly.
Right. I forgot about how fragments actually work.
Regarding the query string, I can see more of an argument being made to include it, but it feels wrong to me. Precedence in most places does not allow you to route requests based on the query string, and this seems like a Good Idea. I know it would be nice to be guaranteed that there is a certain GET parameter present, but I really think this should be dealt with at the handler level.
What do you mean by 'precedence'?
Including a query string in urlt is certainly nice for some contexts. For example:
data UserURL = AllUsers SortOrder
data SortOrder = Asc | Desc
Here the sort order is required. But the sort order does not really add hierarchy to the system, so it belongs more in the query string and less in the path. We might want a URL like:
/allusers?sortOrder=asc
Now let's say we wrap that up in a larger site:
data SiteURL = Users UserURL
The Users constructor is adding hierarchy, so it shouldn't be modifying the query string. So it will just add something like:
/users/allusers?sortOrder=asc
So only the last component gets to add a query string.
The big trip up would be forms with method GET. The form submission is handled by taking the form data set, encoding it as application/x-www-form-urlencoded, and then appending ? and the encoded data to the end of the action. If the action already contained a ?, that would not work out. So, the toUrl / fromUrl instances would have to know if the url was going to be used as the target for an action and prohibit the use of a query string. That could be tricky :-/
Also, in my example, I am handling parameters that are url specific. But many sites might have some sort of global parameters that can be tacked on to every query string. Not really sure how that would work out either.
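To make the nesting idea concrete, here is a minimal sketch (hypothetical function names, not urlt code) in which only the innermost handler contributes a query string and the outer constructor only adds a path segment:
data SortOrder = Asc | Desc
data UserURL = AllUsers SortOrder
data SiteURL = Users UserURL
renderUser :: UserURL -> ([String], String)   -- (path segments, query string)
renderUser (AllUsers o) = (["allusers"], "sortOrder=" ++ case o of { Asc -> "asc"; Desc -> "desc" })
renderSite :: SiteURL -> String
renderSite (Users u) =
    let (segs, q) = renderUser u
        path      = concatMap ('/' :) ("users" : segs)   -- "/users/allusers"
    in path ++ (if null q then "" else '?' : q)
-- renderSite (Users (AllUsers Asc)) ==> "/users/allusers?sortOrder=asc"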
If we can agree on this, I don't see a necessity to rely on an external package to provide the URL datatype (since we would just be using [String]). I can provide the encodeURL/decodeURL functions in web-encodings if that's acceptable- your implementation seems correct to me. However, since it does not function on fully-qualified URLs, perhaps we should call it encodePathInfo/decodePathInfo?
encodePathInfo / decodePathInfo is probably a good choice of names. Adding them to web-encodings is likely useful, but I will just use local copies in urlt, because web-encodings brings in too many extra dependencies that I don't want at that level. I don't think I will export them though, so it should not cause a conflict.
Also, my implementation is not quite right. It escapes more characters than is strictly required. Path segments have the following ABNF:
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
Also, . and .. are allowed in a path segment, but have special meaning. Not sure what we want to do about those. I like the property that *any* String value is automatically escaped and has no special meaning. So the same should be true for '.' and '..'. But if you do need to use '.' and '..' for some reason, there is no mechanism to do it in the current system. Though I am not sure what a compelling use case would be, so I am ok with just not allowing them for now.
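For reference, a sketch of an escape function that only escapes what the ABNF above requires for a single path segment (escapeSegment is a made-up name; it uses escapeURIString and isUnreserved from Network.URI, and deliberately leaves '/' out of the allowed set so a slash inside a segment still gets escaped):
import Network.URI (escapeURIString, isUnreserved)
escapeSegment :: String -> String
escapeSegment = escapeURIString isSegmentChar
  where
    -- pchar from the ABNF: unreserved plus these extra characters
    isSegmentChar c = isUnreserved c || c `elem` ":@&=+$,"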

On Sun, Mar 21, 2010 at 12:04 AM, Michael Snoyman
wrote: That made perfect sense, thank you for doing such thorough research on this.
I've attached two files; test1.html is UTF-8 encoded, test3.html is windows-1255 (Hebrew). On my system, both links point to the same location, implying to me that you are spot on that UTF-8 should always be used for URLs. I had made a mistake with my test on Friday; apparently we only have the encoding issue with the query string.
Hmm. Those files do not contain value urls. The strings in the hrefs contain characters that are not in the limited set allowed by the URI spec. The part that is true is that even though the files have different encodings (utf-8 vs windows-1255) the characters in the strings are the same, so the urls are the same. I guess maybe the reason you put in invalid characters is because it is hard to test whether different encodings matter if you are only testing characters that are represented by the same octets in both encodings.
Well, you guessed correctly at my reason for constructing the files as I did. Not that this is actually relevant to the discussion at hand, but I believe that it is valid HTML to put values in the HREF fields that are not in the appropriate character range and assume the web browser will take care of things. </off-topic>
On Mon, Mar 22, 2010 at 4:20 PM, Jeremy Shaw
Regarding your encoding issue with the query string. I believe there may have been 'nothing wrong'. At the URI level there is no specification as to how the query string is to be interpreted, or what underlying charset it should be associated with. It does have the requirement that it can only contain a limited set up characters, and that other characters must be converted to octets and then percent encoded.
Now, things get interesting when you look at forms and application/x-www-form-urlencoded. When you create a form you have a form element that looks something like this:
<form action="/submit" method=POST enctype="application/x-www-form-urlencoded;charset=utf-8">...</form>
Except internet explorer, and a bunch of servers get stupid if you actually set the charset=utf-8. So the de facto standard is that the form is submitted using the same character encoding as the page it came from. So if the <head> contains <meta charset="windows-1255">, then the form data will be encoded as windows-1255, converted to octets, and then percent encoded, plus the other things that url encoding does (such as + for spaces). You can also add the, accept-charset="utf-8" if you want to override the default and have the form submit some other character encoding. Not sure how widely supported that is.
Now, if we were to change the method=POST to method=GET, then the urlencoded data would be passed as a query string, with its windows-1255 encoded payload. And that is perfectly valid.
So, the choice of how to encode the pathInfo and query string is pretty much application specific. For the URLT stuff we are both generating and parsing the path components, so we can choose whatever encoding we want -- with utf-8 being a good choice.
I agree; the issue of query-string encoding not being under our control is further reason to discourage its inclusion in URLT.
Now, back to your point: I'm not sure why you want to include the query
string and fragment as part of the URL. Regarding the fragment: it will never be passed to the server, so it's *impossible* to consider it for parsing URLs. I understand that you might want to generate URLs with a fragment, but we would then need to have parse and render functions which do not parallel each other properly.
Right. I forgot about how fragments actually work.
Regarding the query string, I can see more of an argument being made to include it, but it feels wrong to me. Precedence in most places does not allow you to route requests based on the query string, and this seems like a Good Idea. I know it would be nice to be guaranteed that there is a certain GET parameter present, but I really think this should be dealt with at the handler level.
What do you mean by 'precedence' ?
I mean I've never seen a system that allows routing based on the query string. In PHP, you create files that match the pathinfo; in Django, you match regexs on the path info; I believe the same is true for Rails. This isn't a proof that this is the Right Thing, merely an observation.
Including query string in urlt is certainly nice for some contexts. For
example:
data UserURL = AllUsers SortOrder
data SortOrder = Asc | Desc
Here the sort order is required. But the sort order does not really add hiearchy to the system, so it belongs more in the query string and less in the path. We might want a URL like:
/allusers?sortOrder=asc
On the other hand, those two possible URLs are not really *unique resources* (to use more RESTful terminology). The sortOrder is not really specifying *what* to return, just *how* to return it. Most well-designed URL schemes would work that way. The badly designed ones, like /user.php?id=5&name=michael&... shouldn't really be considered I think.
Now let's say we wrap that up in a larger site:
data SiteURL = Users UserURL
The Users constructor is adding hierarchy, so it shouldn't be modifying the query string. So it will just add something like:
/users/allusers?sortOrder=asc
So only the last component gets to add a query string.
Not quite sure how we should enforce something like that.
The big trip up would be forms with method GET. The form submission is handled by taking the form set data, encoding it as application/x-www-form-urlencoded, and then append ? and the encoded data to the end of the action. If the action already contained a ?, that would not work out.
You can't have a URL containing a ?; the closest you can come is a URL containing an *escaped* ?, which will simply be absorbed by the [String] piece of the URL. Unless I'm missing your point here.
So, the toUrl / fromUrl instances would have to know if the url was going to be used as the target for an action and prohibit the use of a query string. That could be tricky :-/
Also, in my example, I am handling parameters that are url specific. But many sites might have some sort of global parameters that can be tacked on to every query string. Not really sure how that would work out either.
If we can agree on this, I don't see a necessity to rely on an external package to provide the URL datatype (since we would just be using [String]). I can provide the encodeURL/decodeURL functions in web-encodings if that's acceptable- your implementation seems correct to me. However, since it does not function on fully-qualified URLs, perhaps we should call it encodePathInfo/decodePathInfo?
encodePathInfo / decodePathInfo is probably a good choice of names. Adding them to web-encodings is likely useful, but I will just use local copies in urlt, because web-encodings brings in too many extra dependencies that I don't want at that level. I don't think I will export them though, so it should not cause a conflict.
I have no problem with that decision, but out of curiosity which dependencies are problematic? The only non-HP packages are failure, safe, text and wai. The only ones which could in theory be eliminated are failure and safe; if there is desire for me to do so, I'll look into it.
Also, my implementation is not quite right. It escapes more characters than is strictly required. path segments have the following ABNF:
path_segments = segment *( "/" segment ) segment = *pchar *( ";" param ) param = *pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
Also, . and .. are allowed in a path segment, but have special meaning. Not sure what we want to do about those. I like the property that *any* String value is automatically escaped and has no special meaning. So the same should be true for '.' and '..'. But if you do need to use '.' and '..' for some reason, there is no mechanism to do it in the current system. Though I am not sure what a compelling use case would be, so I am ok with just not allowing them for now.
I'm not sure if they have meaning at the HTTP level. At the HTML level, they specify relative paths, but I don't think they mean anything once it enters HTTP.
Michael

On Mon, Mar 22, 2010 at 9:11 PM, Michael Snoyman
On Mon, Mar 22, 2010 at 4:20 PM, Jeremy Shaw
wrote: On Sun, Mar 21, 2010 at 12:04 AM, Michael Snoyman
wrote: That made perfect sense, thank you for doing such thorough research on this.
I've attached two files; test1.html is UTF-8 encoded, test3.html is windows-1255 (Hebrew). On my system, both links point to the same location, implying to me that you are spot on that UTF-8 should always be used for URLs. I had made a mistake with my test on Friday; apparently we only have the encoding issue with the query string.
Hmm. Those files do not contain value urls. The strings in the hrefs contain characters that are not in the limited set allowed by the URI spec. The part that is true is that even though the files have different encodings (utf-8 vs windows-1255) the characters in the strings are the same, so the urls are the same. I guess maybe the reason you put in invalid characters is because it is hard to test whether different encodings matter if you are only testing characters that are represented by the same octets in both encodings.
Well, you guessed correctly at my reason for constructing the files as I did. Not this is actually relevant to the discussion at hand, I believe that it is valid HTML to put values in the HREF fields that are not in the appropriate character range and assume the web browser will take care of things. </off-topic>
I believe the html 4.0 spec explicitly states that it is illegal here (though it recommends that user agents do something sensible anyway):
http://www.w3.org/TR/REC-html40/appendix/notes.html#non-ascii-chars
The big trip up would be forms with method GET. The form submission is
handled by taking the form set data, encoding it as application/x-www-form-urlencoded, and then append ? and the encoded data to the end of the action. If the action already contained a ?, that would not work out.
You can't have a URL containing a ?; the closest you can come is a URL containing an *escaped* ?, which will simply be absorbed by the [String] piece of the URL. Unless I'm missing your point here.
What I meant is that if the url supplied to the action already had a query string, then something undesirable would probably happen.
encodePathInfo / decodePathInfo is probably a good choice of names. Adding
them to web-encodings is likely useful, but I will just use local copies in urlt, because web-encodings brings in too many extra dependencies that I don't want at that level. I don't think I will export them though, so it should not cause a conflict.
I have no problem with that decision, but out of curiosity which dependencies are problematic? The only non-HP packages are failure, safe, text and wai. The only ones which could in theory be eliminated are failure and safe; if there is desire for me to do so, I'll look into it.
Well, I see no reason to make all of urlt depend on failure, safe, text, wai, and web-encodings when two small local functions would do the trick. Using the functions from web-encodings would not really increase compatibility / interoperability in any way, and I don't expect a lot of bug fixes that will have to be applied to multiple locations.
Remember that I plan to split urlt up into a few pieces soon. I don't want happstack users complaining they have to install wai, or wai users complaining they have to install happstack. Even if happstack is ported to wai, there are extra layers that happstack adds which might benefit from some extra functions in urlt.
Also, my implementation is not quite right. It escapes more characters than is strictly required. Path segments have the following ABNF:
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
Also, . and .. are allowed in a path segment, but have special meaning. Not sure what we want to do about those. I like the property that *any* String value is automatically escaped and has no special meaning. So the same should be true for '.' and '..'. But if you do need to use '.' and '..' for some reason, there is no mechanism to do it in the current system. Though I am not sure what a compelling use case would be, so I am ok with just not allowing them for now.
I'm not sure if they have meaning at the HTTP level. At the HTML level, they specify relative paths, but I don't think they mean anything once it enters HTTP.
I'm not sure what to do with this information. It is true that they may be normalized by the browser before they are passed to the server. But urlt is being used primarily to create URLs that will be used in HTML pages. So, I think we will still have to decide what to do with them.. Also, we shouldn't assume that the client normalized the .. stuff. Perhaps a malicious client won't, in the hopes that it can retrieve http://example.com/../../../../etc/passwd or something.
- jeremy
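As an aside, one way to handle the '..' concern above is to reject it outright after decoding, before any dispatch happens. This is purely illustrative, not urlt code (safeSegments is a made-up name):
safeSegments :: [String] -> Either String [String]
safeSegments segs
    | any (`elem` [".", ".."]) segs = Left "'.' and '..' are not allowed in path segments"
    | otherwise                     = Right segs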

On Mon, Mar 22, 2010 at 7:37 PM, Jeremy Shaw
On Mon, Mar 22, 2010 at 9:11 PM, Michael Snoyman
wrote: On Mon, Mar 22, 2010 at 4:20 PM, Jeremy Shaw
wrote: On Sun, Mar 21, 2010 at 12:04 AM, Michael Snoyman
wrote: That made perfect sense, thank you for doing such thorough research on this.
I've attached two files; test1.html is UTF-8 encoded, test3.html is windows-1255 (Hebrew). On my system, both links point to the same location, implying to me that you are spot on that UTF-8 should always be used for URLs. I had made a mistake with my test on Friday; apparently we only have the encoding issue with the query string.
Hmm. Those files do not contain value urls. The strings in the hrefs contain characters that are not in the limited set allowed by the URI spec. The part that is true is that even though the files have different encodings (utf-8 vs windows-1255) the characters in the strings are the same, so the urls are the same. I guess maybe the reason you put in invalid characters is because it is hard to test whether different encodings matter if you are only testing characters that are represented by the same octets in both encodings.
Well, you guessed correctly at my reason for constructing the files as I did. Not this is actually relevant to the discussion at hand, I believe that it is valid HTML to put values in the HREF fields that are not in the appropriate character range and assume the web browser will take care of things. </off-topic>
I believe the html 4.0 explicitly states that it is illegal here (though it recommends that user agents do something sensible anyway):
http://www.w3.org/TR/REC-html40/appendix/notes.html#non-ascii-chars
The big trip up would be forms with method GET. The form submission is
handled by taking the form set data, encoding it as application/x-www-form-urlencoded, and then append ? and the encoded data to the end of the action. If the action already contained a ?, that would not work out.
You can't have a URL containing a ?; the closest you can come is a URL containing an *escaped* ?, which will simply be absorbed by the [String] piece of the URL. Unless I'm missing your point here.
What I meant is that if the url supplied to the action already had a query string, then something undesirable would probably happen.
encodePathInfo / decodePathInfo is probably a good choice of names. Adding
them to web-encodings is likely useful, but I will just use local copies in urlt, because web-encodings brings in too many extra dependencies that I don't want at that level. I don't think I will export them though, so it should not cause a conflict.
I have no problem with that decision, but out of curiosity which dependencies are problematic? The only non-HP packages are failure, safe, text and wai. The only ones which could in theory be eliminated are failure and safe; if there is desire for me to do so, I'll look into it.
Well, I see no reason to make all of urlt depend on failure, safe, text, wai, and web encodings when two small local functions would do the trick. Using the functions from web-encodings would not really increase compatibility / interoperability in any way, and I don't expect a lot a bug fixes that will have to be applied to multiple locations.
Remember that I plan to split urlt up into a few pieces soon. I don't want happstack users complaining they have to install wai, or wai users complaining they have to install happstack. Even if happstack is ported to wai, there are extra layers that happstack adds which might benefit from some extra functions in urlt.
I was asking more in general if people took issue with the dependency list. I agree that URLT should not depend on web-encodings.
Also, my implementation is not quite right. It escapes more characters than
is strictly required. path segments have the following ABNF:
path_segments = segment *( "/" segment ) segment = *pchar *( ";" param ) param = *pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
Also, . and .. are allowed in a path segment, but have special meaning. Not sure what we want to do about those. I like the property that *any* String value is automatically escaped and has no special meaning. So the same should be true for '.' and '..'. But if you do need to use '.' and '..' for some reason, there is no mechanism to do it in the current system. Though I am not sure what a compelling use case would be, so I am ok with just not allowing them for now.
I'm not sure if they have meaning at the HTTP level. At the HTML level, they specify relative paths, but I don't think they mean anything once it enters HTTP.
I'm not sure what to do with this information. It is true that they may be normalized by the browser before they are passed to the server. But urlt is being used primarily to create URLs that will be used in HTML pages. So, I think will still have to decide what to do with them.. Also, we shouldn't assume that the client normalized the .. stuff. Perhaps a malicious client won't in the hopes that it can retrieve http://example.com/../../../../etc/passwd or something.
- jeremy
What I meant to say is I think we should just leave the . and .. in the data and let the client deal with it, which I *think* is what you're saying. If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :). Michael

On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman
If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about. I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff). The next steps are to: 1. change the names of any functions or types that we do not currently like 2. add the haddock documentation 3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use 4. turn the WaiExample into a literate tutorial / blog post 5. add a (simple) happstack example as well So take a look and let me know what you think. Especially in regards to #1. Then we can also look into how to extend the yesod mkResources stuff to work with this new code. from a parsing point of view, we almost don't have to do anything, we could just do: [mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |] or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation. - jeremy

On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman
wrote: If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about.
I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff).
The next steps are to:
1. change the names of any functions or types that we do not currently like
2. add the haddock documentation
3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use
4. turn the WaiExample into a literate tutorial / blog post
5. add a (simple) happstack example as well
So take a look and let me know what you think. Especially in regards to #1.
Then we can also look into how to extend the yesod mkResources stuff to work with this new code.
from a parsing point of view, we almost don't have to do anything, we could just do:
[mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |]
or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation.
I don't have time right now to look at the code, but I will soon (I'm in
On Wed, Mar 24, 2010 at 3:36 PM, Jeremy Shaw

On Thu, Mar 25, 2010 at 6:48 AM, Michael Snoyman
On Wed, Mar 24, 2010 at 3:36 PM, Jeremy Shaw
wrote: On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman
wrote: If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about. I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff). The next steps are to: 1. change the names of any functions or types that we do not currently like 2. add the haddock documentation 3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use 4. turn the WaiExample into a literate tutorial / blog post 5. add a (simple) happstack example as well So take a look and let me know what you think. Especially in regards to #1. Then we can also look into how to extend the yesod mkResources stuff to work with this new code. from a parsing point of view, we almost don't have to do anything, we could just do: [mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |] or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation.
I don't have time right now to look at the code, but I will soon (I'm in the middle of traveling *again*). Regarding the mkResource issue, I think it's slightly more complicated, and what I was thinking of was a combination of TemplateHaskell and QuasiQuotes to address it:
* The quasi-quoted content will follow mostly the same syntax, though allowing you to specify the name of the constructor you wish to assign to each resource pattern.
* Since we now will need to have multiple top-level definitions being generated, quasi-quoting alone won't solve the issue. So I will have a quasi-quote function to convert the YAML syntax to a StringObject, and then a TH function to convert the StringObject to:
1) A datatype for the URL.
2) A pair of to/from functions.
3) A dispatch function.
This is the point at which having WebPlug/HandleT becomes useful.
If that was too vague, just let me know and I'll clarify ;).
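Purely as an illustration of what the generated pieces might look like for a single pattern such as "/foo/:int/:int" (hand-written here; the actual TH output and names would differ):
import Text.Read (readMaybe)
data MyRoute = Foo Int Int deriving Show
renderMyRoute :: MyRoute -> [String]
renderMyRoute (Foo i j) = ["foo", show i, show j]
parseMyRoute :: [String] -> Maybe MyRoute
parseMyRoute ["foo", i, j] = Foo <$> readMaybe i <*> readMaybe j
parseMyRoute _             = Nothing
The point of generating both directions from one description is that renderMyRoute and parseMyRoute stay inverses of each other by construction.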
Hello, everybody. I haven't been following the entire thread. But I think that typed URLs are great. When I started thinking about typed URLs, I thought about using only the data type definition and dropping quasi-quotation entirely in favour of template haskell. I wanted to write some template haskell function to work on a data-type declaration; it would use data-constructor names and arguments and build path components from their names. So we will need no new notation to describe application resources, we will never be concerned with the URL generator and parser, and we will work with our own, carefully defined data-type as it IS a URL.
-- Victor Nazarov

Hello Victor,
The current version of urlt already has template haskell code for
automatically generating the url from the data type. There is an example of
that in WaiExample.hs.
I find that the TH is great when developing the application, because it
'just works'. But when the app gets closer to release, I sometimes want to
customize the way the urls look. (for seo, etc). The nice part is that I can
just write some custom instances of PathInfo instead of deriving them, and
all of the other code just works.
Is there something more that you wanted the TH code to do?
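To make that concrete, a rough sketch of what a hand-customized instance might look like, assuming a PathInfo class roughly of this shape (the real class in urlt may differ in names and in how parse failure is reported):
class PathInfo a where
  toPathSegments   :: a -> [String]
  fromPathSegments :: [String] -> (Either String a, [String])
data BlogURL = BlogHome | BlogPost String
-- derived code would probably render BlogPost as ["BlogPost", title];
-- a hand-written instance can pick prettier segments for SEO:
instance PathInfo BlogURL where
  toPathSegments BlogHome         = []
  toPathSegments (BlogPost title) = ["post", title]
  fromPathSegments ("post" : title : rest) = (Right (BlogPost title), rest)
  fromPathSegments rest                    = (Right BlogHome, rest)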
- jeremy
On Thu, Mar 25, 2010 at 12:11 PM, Victor Nazarov
On Thu, Mar 25, 2010 at 6:48 AM, Michael Snoyman
wrote: On Wed, Mar 24, 2010 at 3:36 PM, Jeremy Shaw
On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman
wrote: If I'm not mistaken, I think that addresses all the issues on the
is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about. I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff). The next steps are to: 1. change the names of any functions or types that we do not currently like 2. add the haddock documentation 3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use 4. turn the WaiExample into a literate tutorial / blog post 5. add a (simple) happstack example as well So take a look and let me know what you think. Especially in regards to #1. Then we can also look into how to extend the yesod mkResources stuff to work with this new code. from a parsing point of view, we almost don't have to do anything, we could just do: [mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |] or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation.
I don't have time right now to look at the code, but I will soon (I'm in
middle of traveling *again*). Regarding the mkResource issue, I think it's slightly more complicated, and what I was thinking of was a combination of TemplateHaskell and QuasiQuotes to address it: * The quasi-quoted content will follow mostly the same syntax, though allowing you to specify the name of the constructor you wish to assign to each resource pattern. * Since we now will need to have multiple top-level definitions being generated, quasi-quoting alone won't solve the issue. So I will have a quasi-quote function to convert the YAML syntax to a StringObject, and
a TH function to convert the StringObject to: 1) A datatype for the URL. 2) A pair to to/from functions. 3) A dispatch function. This is the point at which having WebPlug/HandleT becomes useful. If that was too vague, just let me know and I'll clarify ;).
Hello, everybody.
I haven't been following the entire thread. But I think that typed URLs are great. When I started thinking about typed URLs, I thought about using only data type definition and dropping quosi-quotation entirely in favour of template haskell. I wanted to write some template haskell function to work on data-type declaration, it would use data-constructors names and arguments and build path components from their names. So we will need no new notation to describe application resources, we will never be conserned with URL generator and parser, we will work with our own, carefully defined data-type as it IS a URL.
-- Victor Nazarov

On Thu, Mar 25, 2010 at 11:31 PM, Jeremy Shaw
Hello Victor, The current version of urlt already has template haskell code for automatically generating the url from the data type. There is an example of that in WaiExample.hs. I find that the TH is great when developing the application, because it 'just works'. But when the app gets closer to release, I sometimes want to customize the way the urls look. (for seo, etc). The nice part is that I can just write some custom instances of PathInfo instead of deriving them, and all of the other code just works. Is there something more that you wanted the TH code to do?
It sounds just like what I want. Where is the latest code to check it out?
On Thu, Mar 25, 2010 at 12:11 PM, Victor Nazarov
wrote: On Thu, Mar 25, 2010 at 6:48 AM, Michael Snoyman
wrote: On Wed, Mar 24, 2010 at 3:36 PM, Jeremy Shaw
wrote: On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman
wrote: If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about. I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff). The next steps are to: 1. change the names of any functions or types that we do not currently like 2. add the haddock documentation 3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use 4. turn the WaiExample into a literate tutorial / blog post 5. add a (simple) happstack example as well So take a look and let me know what you think. Especially in regards to #1. Then we can also look into how to extend the yesod mkResources stuff to work with this new code. from a parsing point of view, we almost don't have to do anything, we could just do: [mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |] or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation.
I don't have time right now to look at the code, but I will soon (I'm in the middle of traveling *again*). Regarding the mkResource issue, I think it's slightly more complicated, and what I was thinking of was a combination of TemplateHaskell and QuasiQuotes to address it: * The quasi-quoted content will follow mostly the same syntax, though allowing you to specify the name of the constructor you wish to assign to each resource pattern. * Since we now will need to have multiple top-level definitions being generated, quasi-quoting alone won't solve the issue. So I will have a quasi-quote function to convert the YAML syntax to a StringObject, and then a TH function to convert the StringObject to: 1) A datatype for the URL. 2) A pair to to/from functions. 3) A dispatch function. This is the point at which having WebPlug/HandleT becomes useful. If that was too vague, just let me know and I'll clarify ;).
Hello, everybody.
I haven't been following the entire thread. But I think that typed URLs are great. When I started thinking about typed URLs, I thought about using only data type definition and dropping quosi-quotation entirely in favour of template haskell. I wanted to write some template haskell function to work on data-type declaration, it would use data-constructors names and arguments and build path components from their names. So we will need no new notation to describe application resources, we will never be conserned with URL generator and parser, we will work with our own, carefully defined data-type as it IS a URL.
-- Victor Nazarov
-- Victor Nazarov

The latest version of the code is now at:
http://src.seereason.com/web-routes/
- jeremy
On Fri, Mar 26, 2010 at 5:36 AM, Victor Nazarov
On Thu, Mar 25, 2010 at 11:31 PM, Jeremy Shaw
wrote: Hello Victor, The current version of urlt already has template haskell code for automatically generating the url from the data type. There is an example of that in WaiExample.hs. I find that the TH is great when developing the application, because it 'just works'. But when the app gets closer to release, I sometimes want to customize the way the urls look. (for seo, etc). The nice part is that I can just write some custom instances of PathInfo instead of deriving them, and all of the other code just works. Is there something more that you wanted the TH code to do?
It sounds just like what I want. Where is the latest code to check it out?
On Thu, Mar 25, 2010 at 12:11 PM, Victor Nazarov <
asviraspossible@gmail.com>
wrote:
On Thu, Mar 25, 2010 at 6:48 AM, Michael Snoyman
wrote: On Wed, Mar 24, 2010 at 3:36 PM, Jeremy Shaw
wrote: On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman <
michael@snoyman.com>
wrote:
If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about. I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff). The next steps are to: 1. change the names of any functions or types that we do not currently like 2. add the haddock documentation 3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use 4. turn the WaiExample into a literate tutorial / blog post 5. add a (simple) happstack example as well So take a look and let me know what you think. Especially in regards to #1. Then we can also look into how to extend the yesod mkResources stuff to work with this new code. from a parsing point of view, we almost don't have to do anything, we could just do: [mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |] or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation.
I don't have time right now to look at the code, but I will soon (I'm in the middle of traveling *again*). Regarding the mkResource issue, I think it's slightly more complicated, and what I was thinking of was a combination of TemplateHaskell and QuasiQuotes to address it: * The quasi-quoted content will follow mostly the same syntax, though allowing you to specify the name of the constructor you wish to assign to each resource pattern. * Since we now will need to have multiple top-level definitions being generated, quasi-quoting alone won't solve the issue. So I will have a quasi-quote function to convert the YAML syntax to a StringObject, and then a TH function to convert the StringObject to: 1) A datatype for the URL. 2) A pair to to/from functions. 3) A dispatch function. This is the point at which having WebPlug/HandleT becomes useful. If that was too vague, just let me know and I'll clarify ;).
Hello, everybody.
I haven't been following the entire thread. But I think that typed URLs are great. When I started thinking about typed URLs, I thought about using only data type definition and dropping quosi-quotation entirely in favour of template haskell. I wanted to write some template haskell function to work on data-type declaration, it would use data-constructors names and arguments and build path components from their names. So we will need no new notation to describe application resources, we will never be conserned with URL generator and parser, we will work with our own, carefully defined data-type as it IS a URL.
-- Victor Nazarov
-- Victor Nazarov

OK, here are my initial code comments:
* Do we want to move everything into Web.URLT? More to the point, I'm not
sure I see the point of calling this URLT, since it doesn't really require
any monad transformers; maybe we should call it web-routes and then the
module would be Web.Routes?
* I like the PathInfo class and to/fromPathSegments. Perhaps we should
bundle that with the decode/encodePathInfo into a single module?
* I'd like to minimize dependencies as much as possible for the basic
package. The two dependencies I've noticed are Consumer and
applicative-extras. I think the type signatures would be clearer *without*
those packages included, eg:
fromPathSegments :: [String] -> Either ErrMsg a
I'm not certain what exactly the type of ErrMsg should be here; I don't
really have a problem using [String], which would be close to the definition
of Failing.
* I think it's very important to allow users to supply customized 404 pages.
Essentially, we need to augment handleWai (possibly others) with a (ErrMsg
-> Application) parameter.
* It might be nice to have "type WaiSite url = Site url String Application".
By the way, are you certain you want to allow parameterization over the
pathInfo type?
The only packages that I feel qualified to speak about then are urlt and
urlt-wai, and my recommendation would be:
urlt contains decode/encodePathInfo, PathInfo class and related functions,
Site and related functions. If you agree on allowing the parameterization of
404 errors, then also provide a default 404 error.
urlt-wai contains WaiSite, handleWai and related functions.
I have not actually tested the code to make sure it's doing the right thing,
but I'm sure it's perfect and bug-free ;). I'll do thorough testing when I
have more than 10 minutes at the computer.
Michael
PS: In case you're wondering, we're visiting my in-laws in northern
California right now and are driving down to my parents in southern
California in a few hours, thus the erratic schedule...
On Wed, Mar 24, 2010 at 3:36 PM, Jeremy Shaw
On Mon, Mar 22, 2010 at 9:41 PM, Michael Snoyman
wrote: If I'm not mistaken, I think that addresses all the issues on the table; is there anything left to decide? I look forward to seeing a sample URLT :).
There were other issues that came up, but nothing exciting enough to talk about.
I have pushed a patch which I think brings the code up to date in terms of functionality. See WaiExample for a detail of everything that is currently supported (aside from the happstack / hsp stuff).
The next steps are to:
1. change the names of any functions or types that we do not currently like
2. add the haddock documentation
3. split the package into separate packages so that you don't have to pull in extra dependencies that you aren't going to use
4. turn the WaiExample into a literate tutorial / blog post
5. add a (simple) happstack example as well
So take a look and let me know what you think. Especially in regards to #1.
Then we can also look into how to extend the yesod mkResources stuff to work with this new code.
from a parsing point of view, we almost don't have to do anything, we could just do:
[mkResource| "/foo/:int/:int" = \i j -> mySite (Foo i j) |]
or whatever the syntax is. But that does not solve the issue of how to go from (Foo 1 2) back to /foo/1/2 and ensure that it is the inverse operation.
- jeremy

On Thu, Mar 25, 2010 at 12:29 PM, Michael Snoyman
OK, here are my initial code comments:
* Do we want to move everything into Web.URLT? More to the point, I'm not sure I see the point of calling this URLT, since it doesn't really require any monad transformers; maybe we should call it web-routes and then the module would be Web.Routes?
I think Web.Routes is a fine name. I'll make it happen. In the rest of this post I refer to things by the old names, but I do intend to change the module names and rename the package to web-routes.
* I like the PathInfo class and to/fromPathSegments. Perhaps we should bundle that with the decode/encodePathInfo into a single module?
I put PathInfo in a separate module because I am a little dubious of classes these days. I find it a bit annoying that you can only have one PathInfo instance per type. And I think it helps show that using PathInfo is not actually required. But, in practice, I think having fewer modules is probably a good thing in this case, since it does not affect the dependency chain at all. Just because I *can* put every function in its own module doesn't mean I should. ;) Also, we probably do want people to provide PathInfo instances, even if they don't have to..
* I'd like to minimize dependencies as much as possible for the basic package. The two dependencies I've noticed are Consumer and applicative-extras. I think the type signatures would be clearer *without* those packages included, eg:
fromPathSegments :: [String] -> Either ErrMsg a
Except that is not a usable type. fromPathSegments may consume some, but not all, of the path segments. Consider the type:
data SiteURL = Foo Int Int
fromPathSegments is going to receive the path segments:
["Foo","1","2"]
If you wrote a parser by hand, you would want it to look a little something like:
do string "Foo"
   slash
   i <- fromPathSegments
   slash
   j <- fromPathSegments
   eol
   return (Foo i j)
The key concept here is that when you call fromPathSegments to get the first argument of Foo you need to know how many of the path segments were consumed / are remaining, so you can pass only those segments to the second fromPathSegments. So you really need a type like:
fromPathSegments :: [String] -> (Either ErrMsg a, [String])
which outputs the unconsumed path segments. But this is obviously a ripe target for a monad of some sort -- trying to keep track of the unconsumed portions by hand seems like it would be asking for trouble... The Consumer monad takes care of that and provides the functions you would expect such as next, peek, and poke. And it seems nice to be able to use Monad, MonadPlus, Applicative, Alternative, etc, for composing fromPathSegments into larger parsers. But perhaps there is a better choice of monad, or a better way of dealing with the problem? Or maybe it's not really a problem?
I think Failing is a pretty nifty data type for dealing with errors. But perhaps it is not a big win here. The #1 thing that makes Failing better than (Either [String] a) is its Applicative instance. Specifically, Failing will accumulate and return all the errors which have occurred, not just the first failure (which is the behavior of Applicative (Either e)). So for example, let's say you are trying to look up a bunch of keys from the query string. The key / value pairs in the query string are typically independent of each other. So let's say you do:
(,) <$> lookup "foo" <*> lookup "bar"
but neither of those keys exist. With Either you will only get the error 'could not find "foo"'. But with Failing you will get the error 'could not find "foo". could not find "bar"'. It is nice to get a report of all the things that are broken, instead of getting only one error at a time, fixing it, and then getting another error, etc.
However, I am not sure if this property is all that useful with urlt. If you are trying to parse a url like:
(string "Foo" *> Foo) <$> fromPathSegments <*> fromPathSegments
and the parsing of "Foo" fails, then there is no use in finding out if the other segments parse ok -- because they are likely to be garbage. Maybe it failed because it got the string "FOo" instead of "Foo", but more likely it got something completely unrelated like /bar/c/2.4. So, perhaps Either is a better choice even without considering dependencies... I think that Applicative / Alternative instances for Either are only defined in transformers in the Control.Monad.Error module -- which is a bit annoying. But we don't actually need those to implement urlt itself.
This brings up another detail though. The fromPathSegments / Consumer stuff is basically implementing a parser. Except, unlike something like parsec, we do not keep track of the current position for reporting errors. I wonder if we should perhaps use a slightly richer parser environment. Within a web app, once you have got your to/from instances debugged, you will never get a parse error, so having great error messages is not essential. But, for other people linking to your site it could be potentially helpful. Though, it seems like the current error messages ought to be sufficient given how short the urls are..
I'm not certain what exactly the type of ErrMsg should be here; I don't really have a problem using [String], which would be close to the definition of Failing.
* I think it's very important to allow users to supply customized 404 pages. Essentially, we need to augment handleWai (possibly others) with a (ErrMsg -> Application) parameter.
Yeah, there are (at least) two possibilities: add an extra param for the handler, or bubble the error up to the top:

handleWai_1 :: (url -> String)
            -> (String -> Failing url)
            -> String
            -> ([ErrorMsg] -> Application)
            -> ((url -> String) -> url -> Application)
            -> Application
handleWai_1 fromUrl toUrl approot handleError handler =
  \request ->
     do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo request
        case fUrl of
          (Failure errs) -> handleError errs request
          (Success url)  -> handler (showString approot . fromUrl) url request

handleWai_2 :: (url -> String)
            -> (String -> Failing url)
            -> String
            -> ((url -> String) -> url -> Application)
            -> (Request -> IO (Failing Response))
handleWai_2 fromUrl toUrl approot handler =
  \request ->
     do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo request
        case fUrl of
          (Failure errs) -> return (Failure errs)
          (Success url)  -> fmap Success $ handler (showString approot . fromUrl) url request

The second choice is perhaps more flexible. Which do you prefer? In the first option, the handleError function could be a Maybe value -- and if you supply Nothing you get some default 404 page?
In happstack we have a third possibility. The ServerMonad is an instance of MonadPlus so we can throw out the error message and just call mzero:

implSite :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String -> FilePath -> Site url String (m a) -> m a
implSite domain approot siteSpec =
  do r <- implSite_ domain approot siteSpec
     case r of
       (Failure _) -> mzero
       (Success a) -> return a

implSite_ :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String -> FilePath -> Site url String (m a) -> m (Failing a)
implSite_ domain approot siteSpec =
  dirs approot $
    do rq <- askRq
       let pathInfo = intercalate "/" (rqPaths rq)
           f        = runSite (domain ++ approot) siteSpec pathInfo
       case f of
         (Failure errs) -> return (Failure errs)
         (Success sp)   -> Success <$> (localRq (const $ rq { rqPaths = [] }) sp)

then we can do:

msum [ implSite "domain" "approot" siteSpec
     , default404
     ]

if implSite calls mzero, then the next handler (in this case default404) is tried.
* It might be nice to have "type WaiSite url = Site url String Application". By the way, are you certain you want to allow parameterization over the pathInfo type?
I'm not certain I don't want to allow it... I have a vague notion that I might want to use Text sometimes instead of String. Though if I were really committed to that then I should make toPathInfo and fromPathInfo parameterized over pathInfo as well... So perhaps I will axe it from Site for now. I need to change the name of that type and its record names too, I think.
The only packages that I feel qualified to speak about then are urlt and urlt-wai, and my recommendation would be:
urlt contains decode/encodePathInfo, PathInfo class and related functions, Site and related functions. If you agree on allowing the parameterization of 404 errors, then also provide a default 404 error.
urlt-wai contains WaiSite, handleWai and related functions.
Yeah, that is what I was thinking. urlt would contain what is currently in:
URLT.Base
URLT.PathInfo
URLT.HandleT
URLT.Monad
URLT.QuickCheck
The QuickCheck module does not actually depend on QuickCheck, which is nice because QC1 vs QC2 is a big problem right now. It might also be nice to include:
URLT.TH
which depends on template-haskell. But I am not sure that depending on template-haskell is an issue, because template-haskell comes with ghc6, and the code in URLT.TH already handles the breakage that happened with TH 2.4. If I switch to Either instead of Failing I believe the dependencies would be:
base, Consumer, template-haskell, network, utf8-string
urlt-wai would just include:
URLT.Wai
- jeremy
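As a side note on the Failing discussion above, here is a stand-alone toy showing the error-accumulating Applicative behaviour being described (this is not the applicative-extras implementation, just an illustration with made-up names):

-- A toy error-accumulating validation type, illustrating why Failing
-- reports all errors where Either stops at the first one.
data Failing e a = Failure [e] | Success a
  deriving Show

instance Functor (Failing e) where
  fmap _ (Failure es) = Failure es
  fmap f (Success a)  = Success (f a)

instance Applicative (Failing e) where
  pure = Success
  Failure es <*> Failure es' = Failure (es ++ es')   -- keep errors from both sides
  Failure es <*> Success _   = Failure es
  Success _  <*> Failure es  = Failure es
  Success f  <*> Success a   = Success (f a)

lookupKey :: String -> [(String, String)] -> Failing String String
lookupKey k kvs = maybe (Failure ["could not find " ++ show k]) Success (lookup k kvs)

demo :: Failing String (String, String)
demo = (,) <$> lookupKey "foo" [] <*> lookupKey "bar" []
-- demo == Failure ["could not find \"foo\"","could not find \"bar\""]
-- the Either-based equivalent would only report the "foo" failure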

On Thu, Mar 25, 2010 at 4:25 PM, Jeremy Shaw
On Thu, Mar 25, 2010 at 12:29 PM, Michael Snoyman
wrote: OK, here are my initial code comments:
* Do we want to move everything into Web.URLT? More to the point, I'm not sure I see the point of calling this URLT, since it doesn't really require any monad transformers; maybe we should call it web-routes and then the module would be Web.Routes?
I think Web.Routes is a fine name. I'll make it happen. In the rest of this post I refer to things by the old names, but I do intend to change the module names and rename the package to web-routes.
* I like the PathInfo class and to/fromPathSegments. Perhaps we should bundle that with the decode/encodePathInfo into a single module?
I put PathInfo in a separate module because I am a little dubious of classes these days. I find it a bit annoying that you can only have one PathInfo instance per type. And I think it helps show that using PathInfo is not actually required. But, in practice, I think having fewer modules is probably a good thing in this case, since it does not affect the dependency chain at all. Just because I *can* put every function in its own module doesn't mean I should. ;) Also, we probably do want people to provide PathInfo instances, even if they don't have to..
I also am beginning to share a mistrust of classes; I think I went a little too overboard on them on a few previous packages (namely, convertible-text) and am now having a reaction in the opposite direction. I'm sure one day I'll find the Golden Path...
* I'd like to minimize dependencies as much as possible for the basic
package. The two dependencies I've noticed are Consumer and applicative-extras. I think the type signatures would be clearer *without* those packages included, eg:
fromPathSegments :: [String] -> Either ErrMsg a
Except that is not a usable type. fromPathSegments may consume, some, but not all of the path segments. Consider the type:
data SiteURL = Foo Int Int
fromPathSegments is going to receive the path segments:
["Foo","1","2"]
If you wrote a parser by hand, you would want it to look a little something like:
do string "Foo" slash i <- fromPathSegments slash j <- fromPathSegments eol return (Foo i j)
The key concept here is that when you call fromPathSegments to get the first argument of Foo you need to know how many of the path segments were consumed / are remaining, so you can pass only those segments to the second fromPathSegments.
So you really need a type like:
fromPathSegments :: [String] -> (Either ErrMsg a, [String])
which outputs the unconsumed path segments.
Well, given that as a criterion, I agree with the rest of your analysis entirely. However, I think we're looking at the purpose of fromPathSegments very differently. I'm not quite certain I understand why we would want to output the unconsumed segments; if something is unconsumed, then it seems like it's an invalid URL and should fail.
In your example, if I request "/Foo/5/6/7", fromPathSegments would return (Right (Foo 5 6), ["7"]); but what is going to consume that 7 now? The use case I envisioned for something like this is:
data BlogRoutes = ...
data MySite = MyHome | MyBlog BlogRoutes
fromPathSegments ("blog":rest) = MyBlog `fmap` fromPathSegments
But this is obviously a ripe target for a monad of some sort -- trying keep track of the unconsumed portions by hand seems like it would asking for trouble...
The Consumer monad takes care of that and provides the functions you would expect such as, next, peek, and poke. And it seems nice to be able to use Monad, MonadPlus, Applicative, Alternative, etc, for composing fromPathSegments into larger parsers ?
But, perhaps there is a better choice of monad, or a better way of dealing with the problem? Or maybe it's not really a problem?
I think Failing is a pretty nifty data type for dealing with errors. But perhaps it is not a big win here.. The #1 thing that makes Failing better than (Either [String] a) is its Applicative instance. Specifically, Failing will accumulate and return all the errors which have occurred, not just the first failure (which is the behavior of Applicative (Either e)).
So for example, let's say you are trying to look up a bunch of keys from the query string. The key / value pairs in the query string are typically independent of each other. So let's say you do:
(,) <$> lookup "foo" <*> lookup "bar"
but neither of those keys exist. With Either you will only get the error 'could not find "foo"'. But with Failing you will get the error 'could not find "foo". could not find "bar"'. It is nice to get a report of all the things that are broken, instead of getting only one error at a time, fixing it, and then getting another error, etc.
However, I am not sure if this property is all that useful with urlt. If you are trying to parse a url like:
(string "Foo" *> Foo) <$> fromPathSegments <*> fromPathSegments
And the parsing of "Foo" fails.. then there is no use in finding out if the other segments parse ok -- because they are likely to be garbage. Maybe it failed because it got the string "FOo" instead of "Foo", but more likely it got something completely unrelated like, /bar/c/2.4.
So, perhaps Either is a better choice even without considering dependencies... I think that Applicative / Alternative instances for Either are only defined in transformers in the Control.Monad.Error module -- which is a bit annoying. But we don't actually need those to implement urlt itself.
This brings up another detail though.
The fromPathSegments / Consumer stuff is basically implementing a parser. Except, unlike something like parsec, we do not keep track of the current position for reporting errors. I wonder if we should perhaps use a slightly richer parser environment. Within a web app, once you have got your to/from instances debugged, you will never get a parse error, so having great error messages is not essential. But, for other people linking to your site it could be potentially helpful. Though, it seems like the current error messages ought to be sufficient given how short the urls are..
I don't think fancy error reporting will help here. More to the point: we could always layer a fancy parser on top of a simpler typeclass. For that matter, the same argument can be made for Failing and Consumer.
I'm not certain what exactly the type of ErrMsg should be here; I don't
really have a problem using [String], which would be close to the definition of Failing.
* I think it's very important to allow users to supply customized 404 pages. Essentially, we need to augment handleWai (possibly others) with a (ErrMsg -> Application) parameter.
Yeah, there are (at least) two possibilities, add an extra param for the handler. Or bubble the error up to the top:
handleWai_1 :: (url -> String)
            -> (String -> Failing url)
            -> String
            -> ([ErrorMsg] -> Application)
            -> ((url -> String) -> url -> Application)
            -> Application
handleWai_1 fromUrl toUrl approot handleError handler =
  \request ->
     do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo request
        case fUrl of
          (Failure errs) -> handleError errs request
          (Success url)  -> handler (showString approot . fromUrl) url request

handleWai_2 :: (url -> String)
            -> (String -> Failing url)
            -> String
            -> ((url -> String) -> url -> Application)
            -> (Request -> IO (Failing Response))
handleWai_2 fromUrl toUrl approot handler =
  \request ->
     do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo request
        case fUrl of
          (Failure errs) -> return (Failure errs)
          (Success url)  -> fmap Success $ handler (showString approot . fromUrl) url request
The second choice is perhaps more flexible. Which do you prefer? In the first option, the handleError function could be a Maybe value -- and if you supply Nothing you get some default 404 page?
I personally prefer the first option exactly as you describe it, but you're also correct that the second is more flexible. If anyone else reading this thread would prefer the second, speak now or forever hold your peace ;).
In happstack we have a third possiblity. The ServerMonad is an instance of MonadPlus so we can throw out the error message and just call mzero:
implSite :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String -> FilePath -> Site url String (m a) -> m a
implSite domain approot siteSpec =
  do r <- implSite_ domain approot siteSpec
     case r of
       (Failure _) -> mzero
       (Success a) -> return a

implSite_ :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String -> FilePath -> Site url String (m a) -> m (Failing a)
implSite_ domain approot siteSpec =
  dirs approot $
    do rq <- askRq
       let pathInfo = intercalate "/" (rqPaths rq)
           f        = runSite (domain ++ approot) siteSpec pathInfo
       case f of
         (Failure errs) -> return (Failure errs)
         (Success sp)   -> Success <$> (localRq (const $ rq { rqPaths = [] }) sp)
then we can do:
msum [ implSite "domain" "approot" siteSpec , default404 ]
if implSite calls mzero, then the next handler (in this case default404) is tried.
* It might be nice to have "type WaiSite url = Site url String Application". By the way, are you certain you want to allow parameterization over the pathInfo type?
I'm not certain I don't want to allow it... I have a vague notion that I might want to use Text sometimes instead of String. Though if I were really committed to that then I should make toPathInfo and fromPathInfo parameterized over pathInfo as well... So perhaps I will axe it from Site for now. I need to change the name of that type and its record names too, I think.
Referring to the fear of typeclasses mentioned above: I'd like to avoid MPTCs even more so. In fact, as I look at it, each extra parameter we add creates more potential for incompatible components. For instance, I can see an argument being made to use extensible exceptions for the fromPathSegments return type, but I'd rather keep things standard with [String] than create more division.
The only packages that I feel qualified to speak about then are urlt and urlt-wai, and my recommendation would be:
urlt contains decode/encodePathInfo, PathInfo class and related functions, Site and related functions. If you agree on allowing the parameterization of 404 errors, then also provide a default 404 error.
urlt-wai contains WaiSite, handleWai and related functions.
Yeah, that is what I was thinking. urlt would contain what is currently in;
URLT.Base URLT.PathInfo URLT.HandleT URLT.Monad URLT.QuickCheck
QuickCheck module does not actually depend on QuickCheck, which is nice because QC1 vs QC2 is a big problem right now.
It might also be nice to include:
URLT.TH
which depends on template-haskell. But I am not sure that depending on template-haskell is an issue, because template-haskell comes with ghc6, and the code in URLT.TH already handles the breakage that happened with TH 2.4.
I have a different motive for keeping the TH code out: it seems like all of
the other pieces of code should be relatively stable from early on, while the TH code (and quasi-quoting, and regular) will probably have some major changes happening for a while. It would be nice to have a consistent major release number for long periods of time on the core.
If I switch to Either instead of Failing I believe the dependencies would be:
base, Consumer, template-haskell, network, utf8-string
urlt-wai would just include:
URLT.Wai
Sounds great. Let me know when this is available for review. If you want me to do any of the merging/renaming, I have some time now (I arrived in southern California at 3:30 in the morning...).
Michael

On Fri, Mar 26, 2010 at 10:30 AM, Michael Snoyman
I also am beginning to share a mistrust of classes; I think I went a little too overboard on them on a few previous packages (namely, convertible-text) and am now having a reaction in the opposite direction. I'm sure one day I'll find the Golden Path...
Sometimes I think that module functors or Agda's module system might be part of the solution. But maybe only because I have not used those systems much ;)
* I'd like to minimize dependencies as much as possible for the basic
package. The two dependencies I've noticed are Consumer and applicative-extras. I think the type signatures would be clearer *without* those packages included, eg:
fromPathSegments :: [String] -> Either ErrMsg a
Except that is not a usable type. fromPathSegments may consume, some, but not all of the path segments. Consider the type:
data SiteURL = Foo Int Int
fromPathSegments is going to receive the path segments:
["Foo","1","2"]
If you wrote a parser by hand, you would want it to look a little something like:
do string "Foo" slash i <- fromPathSegments slash j <- fromPathSegments eol return (Foo i j)
The key concept here is that when you call fromPathSegments to get the first argument of Foo you need to know how many of the path segments were consumed / are remaining, so you can pass only those segments to the second fromPathSegments.
So you really need a type like:
fromPathSegments :: [String] -> (Either ErrMsg a, [String])
which outputs the unconsumed path segments.
Well, given that as a criterion, I agree with the rest of your analysis entirely. However, I think we're looking at the purpose of fromPathSegments very differently. I'm not quite certain I understand why we would want to output the unconsumed segments; if something is unconsumed, then it seems like it's an invalid URL and should fail.
In your example, if I request "/Foo/5/6/7", fromPathSegments would return (Right (Foo 5 6), ["7"]); but what is going to consume that 7 now? The use case I envisioned for something like this is:
data BlogRoutes = ...
data MySite = MyHome | MyBlog BlogRoutes
fromPathSegments ("blog":rest) = MyBlog `fmap` fromPathSegments
But what if you had,
data BlogRoutes = ...
data Foo = ...
data MySite = MyHome | MyBlog Foo BlogRoutes
where the MyBlog constructor has *two* arguments. In theory you want to write something like:
fromPathSegments ("MyBlog":rest) = MyBlog `fmap` fromPathSegments ?? `ap` fromPathSegments ???
The first fromPathSegments will parse the 'Foo' argument and the second fromPathSegments will parse the BlogRoutes argument. To make things more interesting, let's assume that Foo and BlogRoutes were defined in 3rd party modules that don't even know about each other or your app.
The problem is, what arguments do you pass to each fromPathSegments call? The first call to fromPathSegments is going to consume some of the path segments, and the second call will consume the remaining. But we do not have enough information here to know in advance how to split up 'rest' between the two calls. Instead we need to run the first fromPathSegments and have it tell us what part it did not consume.
If what I have said still does not make sense, then try this exercise: create 3 modules, one each for:
data BlogRoutes = BlogHome
data Foo = Foo Int | Bar Char Int
data MySite = MyBlog Foo BlogRoutes
Now create fromPathSegments instances for each of those routes. I think you will find that it is difficult to implement the instance for MySite. Finally, can you now change the Foo type to:
data Foo = Foo Int Int | Bar Int Char Int
by *only* modifying the Foo module, and without breaking the MySite module?
Regarding the type:
data Foo = Foo Int Int
attempting to parse:
"/Foo/5/6/7"
I think that should be handled in, fromPathInfo :: (PathInfo u) => String -> Failing u, when it calls fromPathSegments. At that point in time we know that all the segments should have been consumed... so if there is left over junk, something is wrong.
The latest version of the code is now at:
http://src.seereason.com/web-routes/
I did the renaming but have not made all the other changes yet.
- jeremy
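To make that exercise concrete, here is a rough, self-contained sketch using a hand-rolled segment parser in place of the real Consumer type (all names are illustrative). The point is that each component's parser reports what it did not consume, so the MySite parser can compose the other two without knowing how many segments they eat:

import Control.Monad (ap, liftM)

-- a minimal segment parser: consume some prefix of the segments and
-- return the rest (this plays the role of Consumer / fromPathSegments)
newtype SegParser a = SegParser { runSegParser :: [String] -> (Either String a, [String]) }

instance Functor SegParser where
  fmap = liftM

instance Applicative SegParser where
  pure a = SegParser (\ss -> (Right a, ss))
  (<*>)  = ap

instance Monad SegParser where
  p >>= f = SegParser $ \ss ->
    case runSegParser p ss of
      (Left err, rest) -> (Left err, rest)
      (Right a,  rest) -> runSegParser (f a) rest

segment :: SegParser String
segment = SegParser $ \ss -> case ss of
  []     -> (Left "unexpected end of path", [])
  (x:xs) -> (Right x, xs)

failWith :: String -> SegParser a
failWith err = SegParser (\ss -> (Left err, ss))

literal :: String -> SegParser ()
literal s = do x <- segment
               if x == s then return () else failWith ("expected " ++ s)

int :: SegParser Int
int = do x <- segment
         case reads x of
           [(n, "")] -> return n
           _         -> failWith ("not an Int: " ++ x)

-- module 1: the reusable blog component
data BlogRoutes = BlogHome deriving Show

blogRoutes :: SegParser BlogRoutes
blogRoutes = literal "BlogHome" >> return BlogHome

-- module 2: another third-party component
data Foo = Foo Int Int | Bar Char Int deriving Show

foo :: SegParser Foo
foo = do x <- segment
         case x of
           "Foo" -> Foo <$> int <*> int
           "Bar" -> do s <- segment
                       case s of
                         [c] -> Bar c <$> int
                         _   -> failWith "expected a single character"
           _     -> failWith ("unknown constructor: " ++ x)

-- module 3: composes the other two without knowing how many segments
-- each of their parsers consumes
data MySite = MyBlog Foo BlogRoutes deriving Show

mySite :: SegParser MySite
mySite = literal "MyBlog" >> (MyBlog <$> foo <*> blogRoutes)

-- runSegParser mySite ["MyBlog","Foo","1","2","BlogHome"]
--   == (Right (MyBlog (Foo 1 2) BlogHome), [])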

On Fri, Mar 26, 2010 at 9:30 AM, Jeremy Shaw
On Fri, Mar 26, 2010 at 10:30 AM, Michael Snoyman
wrote: Well, given that as a criterion, I agree with the rest of your analysis entirely. However, I think we're looking at the purpose of fromPathSegments very differently. I'm not quite certain I understand why we would want to output the unconsumed segments; if something is unconsumed, then it seems like it's an invalid URL and should fail.
In your example, if I request "/Foo/5/6/7", fromPathSegments would return (Right (Foo 5 6), ["7"]); but what is going to consume that 7 now? The use case I envisioned for something like this is:
data BlogRoutes = ... data MySite = MyHome | MyBlog BlogRoutes fromPathSegments ("blog":rest) = MyBlog `fmap` fromPathSegments
But what if you had,
data BlogRoutes = ... data Foo = ... data MySite = MyHome | MyBlog Foo BlogRoutes
Where the MyBlog constructor has *two* arguments. In theory you want to write something like:
fromPathSegments ("MyBlog":rest) = MyBlog `fmap` fromPathSegments ?? `ap` fromPathSegments ???
The first fromPathSegments will parse the 'Foo' argument and the second fromPathSegments will parse the BlogRoutes argument. To make things more interesting, let's assume that Foo and BlogRoutes were defined in 3rd party modules that don't even know about each other or your app.
The problem is, what arguments do you pass to each fromPathSegments call? The first call to fromPathSegments is going to consume some of the path segments, and the second call will consume the remaining. But we do not have enough information here to know in advance how to split up 'rest' between the two calls. Instead we need to run the first fromPathSegments and have it tell us what part it did not consume.
If what I have said still does not make sense, then try this exercise:
create 3 modules, one each for:
data BlogRoutes = BlogHome data Foo = Foo Int | Bar Char Int data MySite = MyBlog Foo BlogRoutes
now create fromPathSegments instances for each of those routes. I think you will find that it is difficult to implement the instance for MySite. Finally, can you now change the Foo type to:
data Foo = Foo Int Int | Bar Int Char Int
by *only* modifying the Foo module, and without breaking the MySite module?
Regarding the type:
data Foo = Foo Int Int
attempting to parse:
"/Foo/5/6/7"
I think that should be handled in, fromPathInfo :: (PathInfo u) => String -> Failing u, when it calls fromPathSegments.
At that point in time we know that all the segments should have been consumed... so if there is left over junk, something is wrong.
The latest version of the code is now at:
http://src.seereason.com/web-routes/
I did the renaming but have not made all the other changes yet.
- jeremy
Can you give me any real-world examples where you would have URL routes built up like that? It seems like this is an optimization for the abnormal case. Michael

On Fri, Mar 26, 2010 at 5:13 PM, Michael Snoyman
Can you give me any real-world examples where you would have URL routes built up like that? It seems like this is an optimization for the abnormal case.
I am not sure I would consider it an 'optimization' -- without this change the desired behavior cannot be expressed as far as I can tell.
Off the top of my head, I have the following type in my image gallery library:
data GalleryCommon = ViewImage ImageId [Transform] deriving (Eq, Ord, Show, Read, Data, Typeable)
The ImageId and Transform properties are used in other url components (not shown here). They have PathInfo instances already. The Template Haskell or Regular libraries would generate a parser that looks a bit like:
fromPathSegments = (string "ViewImage" *> ViewImage) <$> fromPathSegments <*> fromPathSegments
Without the changes to the type, the TH code would not be able to reuse the existing PathInfo ImageId instance, but would instead be forced to handle the ImageId argument explicitly. That is really unfortunate, because the path generated by the PathInfo ImageId instance is much cleaner than what TH would generate.
It seems to me that the primary purpose of the PathInfo class is to allow you to build composable parsers / generators for urls. It seems odd to limit the composability to only the last argument of a constructor..
- jeremy

A variation which is more obviously hierarchical in nature:
data User = User LastName FirstName
data Section = Recent | TopRated
data UserHomeURL = Wall Section | Profile
data UserURL = ViewHome User UserHomeURL
Which would be used to construct urls like:
/shaw/jeremy/wall/recent
/snoyman/michael/profile
etc..
- jeremy
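A hand-written sketch of just the rendering half for the hierarchical example above might look like this (illustrative only; the real PathInfo/AsURL methods have different names and also need the parsing direction):

data Section     = Recent | TopRated
data UserHomeURL = Wall Section | Profile
data User        = User String String          -- last name, first name
data UserURL     = ViewHome User UserHomeURL

userURLSegments :: UserURL -> [String]
userURLSegments (ViewHome (User lastN firstN) home) =
    [lastN, firstN] ++ homeSegments home
  where
    homeSegments (Wall Recent)   = ["wall", "recent"]
    homeSegments (Wall TopRated) = ["wall", "toprated"]
    homeSegments Profile         = ["profile"]

-- userURLSegments (ViewHome (User "shaw" "jeremy") (Wall Recent))
--   == ["shaw","jeremy","wall","recent"]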
On Fri, Mar 26, 2010 at 5:44 PM, Jeremy Shaw
On Fri, Mar 26, 2010 at 5:13 PM, Michael Snoyman
wrote: Can you give me any real-world examples where you would have URL routes built up like that? It seems like this is an optimization for the abnormal case.
I am not sure I would consider it an 'optimization' -- without this change the desired behavior cannot be expressed as far as I can tell.
Off the top of my head, I have the following type in my image gallery library:
data GalleryCommon = ViewImage ImageId [Transform] deriving (Eq, Ord, Show, Read, Data, Typeable)
The ImageId and Transform properties are used in other url components (not shown here). They have PathInfo instances already.
The Template Haskell or Regular libraries would generate a parser that looks a bit like:
fromPathSegments = (string "ViewImage" *> ViewImage) <$> fromPathSegments <*> fromPathSegments
Without the changes to the type, the TH code would not be able to reuse the existing PathInfo ImageId instance, but would instead be forced to handle the ImageId argument explicitly.. That is really unfortunate, because the path generated by the PathInfo ImageId instance is much cleaner than what TH would generate.
It seems to me that the primary purpose of the PathInfo class is to allow you to build composable parsers / generators for urls. It seems odd to limit the composability to only the last argument of a constructor..
- jeremy

On Fri, Mar 26, 2010 at 6:01 PM, Jeremy Shaw
A variation which is more obviously hierarchical in nature:
data User = User LastName FirstName
data Section = Recent | TopRated
data UserHomeURL = Wall Section | Profile
data UserURL = ViewHome User UserHomeURL
Which would be used to construct urls like:
/shaw/jeremy/wall/recent /snoyman/michael/profile
To expand on this slightly, let's pretend you first implemented a module that supported a single user:
data Section = Recent | TopRated
data UserHomeURL = Wall Section | Profile
And you release that as a library. Then in another app, which supports multiple users, you want to use it. So you create:
data User = User LastName FirstName
data UserURL = ViewHome User UserHomeURL
That seems pretty sensible. After all, half the point of this library is to be able to build reusable components. Because we are reusing a component, we cannot modify the UserHomeURL portion of the URL. But we still need some way to specify which user's home we are looking at. The above types seem sensible for that.
You have similar structures in your photo blog app:
--------
instance Yesod PB where
    resources = [$mkResources|
/: GET: indexHandler
/entries/$entryId: GET: entry
/entries/$entryId/$filename: GET: media
/feed: GET: feed
...
------
A reasonable URL type for that might be:
data EntryId = ...
data PhotoBlogURL = BlogHome | Entry EntryId | Media EntryId FileName | Feed
where Media has two arguments to its constructor. You can't really factor that out, can you? You could fake it like:
data PhotoBlogURL = BlogHome | Entry EntryId | Media (EntryId, FileName) | Feed
But that does not really buy you anything, because when you write the Media parser, you still have to know how many segments EntryId consumes, and how many FileName consumes. Perhaps entryid is:
data EntryId = EntryId { year :: Int , month :: Int ,

On Thu, Mar 18, 2010 at 5:17 PM, Michael Snoyman
Based on everything you've said, and some thought I've had on my own, I agree that the base function should involve no typeclasses and not break up the path into pieces. Here's a proposal for the entire core:
newtype AbsPath  = AbsPath  { unAbsPath  :: String }
newtype PathInfo = PathInfo { unPathInfo :: String }
Can you provide some simple examples of the types of mistakes we might make if we didn't use newtypes here? One potentially nice thing about having the function showURL :: (url -> String) instead of (url -> AbsPath) is that it works with most of the html 'templating' solutions without any extra fussing around. For example, with Text.Html:
a ! [href (showURL Foo)]
Which is kind of nice. But I also like using newtypes when it helps avoid problems.
handleWai :: (PathInfo -> Failing url)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
handleWai parsePI buildPI buildAbsPath dispatch req = do
    let pi = PathInfo $ S.unpack $ pathInfo req
    case parsePI pi of
        Success url    -> dispatch url (buildAbsPath . buildPI) req
        Failure errors -> return $ Response Status404 [] $ Right $ fromLBS $ L.pack $ unlines errors
Depends on which 'core' we are talking about. I still intend to use urlt with happstack, which does not yet have full integration with Wai. So I will need handleHappstack.. or some variant. And I can imagine other non-Wai people want to use urlt as well. So I imagine we will have:
urlt-wai
urlt-happstack
etc
so at the real core that would just leave PathInfo and AbsPath? Unless we get rid of them.. then there is nothing at the core, only optional things :p
- jeremy

On Fri, Mar 19, 2010 at 2:31 PM, Jeremy Shaw
On Thu, Mar 18, 2010 at 5:17 PM, Michael Snoyman
wrote: Based on everything you've said, and some thought I've had on my own, I agree that the base function should involve no typeclasses and not break up the path into pieces. Here's a proposal for the entire core:
newtype AbsPath  = AbsPath  { unAbsPath  :: String }
newtype PathInfo = PathInfo { unPathInfo :: String }
Can you provide some simples examples of the types of mistakes we might make if we didn't use newtypes here?
One potentially nice thing about having the function, showURL :: (url -> String) instead of (url -> AbsPath) is that it works with most of the html 'templating' solutions with out any extra fusing around. For example, with Text.Html
a ! [href (showURL Foo)]
Which is kind of nice.
But I also like using newtypes when it helps avoid problems.
I think I've said it before: I'm on the fence about this one. The newtypes are just doing what they usually do: prevent you from making silly mistakes and ensure more type safety. I have no incredibly persuasive examples.
handleWai :: (PathInfo -> Failing url)
          -> (url -> PathInfo)
          -> (PathInfo -> AbsPath)
          -> (url -> (url -> AbsPath) -> Application)
          -> Application
handleWai parsePI buildPI buildAbsPath dispatch req = do
    let pi = PathInfo $ S.unpack $ pathInfo req
    case parsePI pi of
        Success url    -> dispatch url (buildAbsPath . buildPI) req
        Failure errors -> return $ Response Status404 [] $ Right $ fromLBS $ L.pack $ unlines errors
Depends on which 'core' we are talking about. I still intend to use urlt with happstack, which does not yet have fully integration with Wai. So I will need:
handleHappstack.. or some variant. And I can imagine other non-Wai people want to use urlt as well. So I imagine we will have:
urlt-wai urlt-happstack etc
so at the real core that would just leave, PathInfo and AbsPath ? Unless we get rid of them.. then there is nothing at the core, only optional things :p
If all we're aiming for here is a method for type-safe URLs, then this would work. I'm trying to broaden the scope to including pluggable web pieces; for this, a unified request/response type are a must.
That said, we *could* make the Application type be polymorphic and then provide handleWai, handleCgi, handleHappstack, etc, and then have plugins specific for each of those systems. But I'd rather see us standardize on WAI (that is the purpose of it) and provide compatibility wrappers. Michael

Michael Snoyman
I think the only piece of the puzzle missing to combine these two together is to have mkResources output something along the lines of:
data RoutePiece = StaticPiece String | IntPiece | StringPiece

_validRoutes :: [[RoutePiece]]
_validRoutes =
    [ [StaticPiece "user"]
    , [StaticPiece "user", StaticPiece "find", IntPiece]
    , [StaticPiece "user", StaticPiece "name", StringPiece]
    ]
Yes, this approach smells a lot better to me: the types are types and
the data are data. Just a brainstorm: if each handler monad were to
carry its routing table around with it, you could define something like:
match :: ByteString
-> (WhateverYourWebMonadIsCalled a)
-> (WhateverYourWebMonadIsCalled a)
And write
handler = match "/foo/:bar/#int/" $ do ...
without the template haskell quasiquotation (i.e. do the checking at
runtime.) Is it always true that compile-time link checking is
possible/desirable? All of these solutions also imply that each handler
knows where it's hung on the routing table so that it can resolve
relative URLs, etc.
IMO no matter what a link-checker should be lightweight enough that you
can refactor easily without rewriting a bunch of static tables; you
should be able to move an entire "subtree" to another place in the
routing table in O(1) time.
This rabbit hole goes pretty deep though; if you're serious about the
bondage and discipline approach you'd want to ensure that you can check
query parameters; i.e. "'/foo' takes a mandatory 'bar' integer parameter
on the queryString and an optional 'sort' parameter which must be either
'asc' or 'desc'", etc. At some point I have to wonder: is the medicine
worse than the disease?
G
--
Gregory Collins
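For what it's worth, the runtime check that a 'match' like that could perform against the _validRoutes table is simple enough to sketch (hand-rolled, with made-up helper names; this is not code from any of the packages being discussed):

data RoutePiece = StaticPiece String | IntPiece | StringPiece

-- does one pattern piece accept the given path segment?
pieceMatches :: RoutePiece -> String -> Bool
pieceMatches (StaticPiece s) seg = s == seg
pieceMatches IntPiece        seg = case reads seg :: [(Int, String)] of
                                     [(_, "")] -> True
                                     _         -> False
pieceMatches StringPiece     seg = not (null seg)

-- does a whole route pattern accept the given path segments?
routeMatches :: [RoutePiece] -> [String] -> Bool
routeMatches pat segs = length pat == length segs && and (zipWith pieceMatches pat segs)

isValidRoute :: [[RoutePiece]] -> [String] -> Bool
isValidRoute table segs = any (`routeMatches` segs) table

validRoutes :: [[RoutePiece]]
validRoutes = [ [StaticPiece "user"]
              , [StaticPiece "user", StaticPiece "find", IntPiece]
              , [StaticPiece "user", StaticPiece "name", StringPiece]
              ]

-- isValidRoute validRoutes ["user","find","5"]  == True
-- isValidRoute validRoutes ["user","find","xy"] == False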

On 16 mrt 2010, at 17:51, Gregory Collins wrote:
Michael Snoyman
writes: I think the only piece of the puzzle missing to combine these two together is to have mkResources output something along the lines of:
data RoutePiece = StaticPiece String | IntPiece | StringPiece

_validRoutes :: [[RoutePiece]]
_validRoutes =
    [ [StaticPiece "user"]
    , [StaticPiece "user", StaticPiece "find", IntPiece]
    , [StaticPiece "user", StaticPiece "name", StringPiece]
    ]
Yes, this approach smells a lot better to me: the types are types and the data are data. Just a brainstorm: if each handler monad were to carry its routing table around with it, you could define something like:
match :: ByteString -> (WhateverYourWebMonadIsCalled a) -> (WhateverYourWebMonadIsCalled a)
And write
handler = match "/foo/:bar/#int/" $ do ...
without the template haskell quasiquotation (i.e. do the checking at runtime.) Is it always true that compile-time link checking is possible/desirable? All of these solutions also imply that each handler knows where it's hung on the routing table so that it can resolve relative URLs, etc.
IMO no matter what a link-checker should be lightweight enough that you can refactor easily without rewriting a bunch of static tables; you should be able to move an entire "subtree" to another place in the routing table in O(1) time.
This rabbit hole goes pretty deep though; if you're serious about the bondage and discipline approach you'd want to ensure that you can check query parameters; i.e. "'/foo' takes a mandatory 'bar' integer parameter on the queryString and an optional 'sort' parameter which must be either 'asc' or 'desc'", etc. At some point I have to wonder: is the medicine worse than the disease?
This would be a fairly straightforward extension of my library. I'll see if I can come up with something on the trainride to ZuriHac tomorrow! -chris

Michael Snoyman
writes: I think the only piece of the puzzle missing to combine these two together is to have mkResources output something along the lines of:
data RoutePiece = StaticPiece String | IntPiece | StringPiece

_validRoutes :: [[RoutePiece]]
_validRoutes =
    [ [StaticPiece "user"]
    , [StaticPiece "user", StaticPiece "find", IntPiece]
    , [StaticPiece "user", StaticPiece "name", StringPiece]
    ]
Yes, this approach smells a lot better to me: the types are types and the data are data. Just a brainstorm: if each handler monad were to carry its routing table around with it, you could define something like:
match :: ByteString -> (WhateverYourWebMonadIsCalled a) -> (WhateverYourWebMonadIsCalled a)
And write
handler = match "/foo/:bar/#int/" $ do ...
without the template haskell quasiquotation (i.e. do the checking at runtime.) Is it always true that compile-time link checking is possible/desirable? All of these solutions also imply that each handler knows where it's hung on the routing table so that it can resolve relative URLs, etc.
IMO no matter what a link-checker should be lightweight enough that you can refactor easily without rewriting a bunch of static tables; you should be able to move an entire "subtree" to another place in the routing table in O(1) time.
This rabbit hole goes pretty deep though; if you're serious about the bondage and discipline approach you'd want to ensure that you can check query parameters; i.e. "'/foo' takes a mandatory 'bar' integer parameter on the queryString and an optional 'sort' parameter which must be either 'asc' or 'desc'", etc. At some point I have to wonder: is the medicine worse than the disease?
I think you have some valid points here. If anyone cares to dig back
On Tue, Mar 16, 2010 at 9:51 AM, Gregory Collins

On Tue, Mar 16, 2010 at 11:51 AM, Gregory Collins
Michael Snoyman
writes:
Yes, this approach smells a lot better to me: the types are types and the data are data. Just a brainstorm: if each handler monad were to carry its routing table around with it, you could define something like:
match :: ByteString -> (WhateverYourWebMonadIsCalled a) -> (WhateverYourWebMonadIsCalled a)
And write
handler = match "/foo/:bar/#int/" $ do ...
without the template haskell quasiquotation (i.e. do the checking at runtime.)
Well, that is pretty much exactly how the ServerMonad in happstack works. That is why I wanted to free ServerMonad so that it is not happstack specific. I am not sure what :bar is supposed to mean, but let's pretend it means match on the type Bar. In the current code today you could match on "/foo/:bar/#int/" like this:
dir "foo" $ path $ \(bar :: Bar) -> path $ \(i :: Int) -> ...
Now, one issue with your runtime match function is that it returns two parameters in that example, but might return a different number of arguments in other cases. So the type would have to be something vararg-ish:
match :: (MonadPlus m, ServerMonad m, MatchArgs c) => String -> (c -> m a) -> m a
where the type 'c' hopefully matches up with the values that "/foo/:bar/#int/" returns. Despite this annoyance, this could be implemented using ServerMonad today.
One advantage of using the QuasiQuote method instead of a plain string is that the QuasiQuote method would parse the string at compile time and generate a matcher with a specific type signature. If the handler function had different arguments you would get a compile time error. I would very much like to see a QuasiQuote version of match added to the ServerMonad library. It's basically just syntactic sugar for using dir / path / etc. But it is very nice sugar.
Is it always true that compile-time link checking is possible/desirable? All of these solutions also imply that each handler knows where it's hung on the routing table so that it can resolve relative URLs, etc.
With URLT, the handler 'knows where it is hung' because it is stored transparently by the URLT monad, and accessed via showURL. So in the blog library, for example, you would just write:
u <- showURL (ViewPost 1)
showURL magically knows where it is hung. If the blog was incorporated into a master site:
data App = Blog BlogURL
it would be told like:
app (Blog blogURL) = nestURL Blog $ blogHandler blogURL
IMO no matter what a link-checker should be lightweight enough that you can refactor easily without rewriting a bunch of static tables; you should be able to move an entire "subtree" to another place in the routing table in O(1) time.
That should be easy with URLT. You could just change the type:
data App = MyBlog BlogURL
Now, if you forget to update the nestURL handler from Blog to MyBlog, you will get a compile time error. Or, you might not change the type at all. You might just change the AsURL instance so that (Blog (ViewPost 1)) generates:
/myblog/viewpost/1
So, you have the flexibility to change the way your urls look by modifying two functions and leaving everything else alone.
This rabbit hole goes pretty deep though; if you're serious about the bondage and discipline approach you'd want to ensure that you can check query parameters; i.e. "'/foo' takes a mandatory 'bar' integer parameter on the queryString and an optional 'sort' parameter which must be either 'asc' or 'desc'", etc. At some point I have to wonder: is the medicine worse than the disease?
Using URLT, the issue of the mandatory and optional arguments is simple. You would have types like:
data Sort = Asc | Desc
data WebURL = Foo Int (Maybe Sort)
instance AsURL WebURL where
  toURLS (Foo i mSort) =
      showString "Foo?" . showString "bar=" . shows i . showString ";" .
      (case mSort of
         Nothing     -> id
         (Just Asc)  -> showString "sort=asc;"
         (Just Desc) -> showString "sort=desc;")
  fromURLC = -- skipped for brevity
It seems like the medicine is quite nice here. Instead of having to remember if 'bar' is a parameter or a path component, we just do:
showURL (Foo 1 (Just Asc))
The details of how that looks are centralized in a single spot. If we decide that we want the url to instead look like:
/foo/1?sort=desc
we can change it in one spot instead of having to:
1. change it everywhere
2. tell everyone about the change and get them to remember that it changed
Additionally, with the typed version you can never forget the mandatory 'bar' parameter because the compiler will tell you. Writing the toURLS and fromURLC instances by hand gives you very precise control over how the URLs look -- though it can be a bit tedious. However I think some simple combinators could help with that. Either a DSL for constructing things by hand, or something based on a yesod style quasiquoter.
- jeremy
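To make the mandatory/optional point concrete, here is a stand-alone sketch of the query-string half, working on already-decoded key/value pairs (the helper names are made up and this is not the actual urlt code):

data Sort   = Asc | Desc              deriving (Eq, Show)
data WebURL = Foo Int (Maybe Sort)    deriving (Eq, Show)

-- render the mandatory 'bar' and optional 'sort' parameters
renderQuery :: WebURL -> [(String, String)]
renderQuery (Foo n mSort) =
    ("bar", show n) : maybe [] (\s -> [("sort", sortToString s)]) mSort
  where
    sortToString Asc  = "asc"
    sortToString Desc = "desc"

-- parse them back, enforcing that 'bar' is present and 'sort' is well-formed
parseQuery :: [(String, String)] -> Either String WebURL
parseQuery params = do
    n <- maybe (Left "missing mandatory 'bar' parameter") Right
               (lookup "bar" params >>= readInt)
    mSort <- case lookup "sort" params of
               Nothing     -> Right Nothing
               Just "asc"  -> Right (Just Asc)
               Just "desc" -> Right (Just Desc)
               Just other  -> Left ("bad sort value: " ++ other)
    return (Foo n mSort)
  where
    readInt s = case reads s of
                  [(i, "")] -> Just i
                  _         -> Nothing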

Here's a bit of an orthogonal gist to what Chris has posted. It's very tedious to program in this style, but it could form the basis for other libraries to build on top of it.
http://gist.github.com/334475
WebPlug is a typeclass that consists of three methods: one to convert to a relative path, one to convert from a relative path, and a dispatch method. The trick is that dispatch is given a function that can convert a WebPlug into an *absolute* path. Given something like this, I could write a blog WebPlug that creates all internal links via the absolute path function. They would all be guaranteed to be valid by definition, and then the blog could be plugged into another application at any point.
Some standards for loading settings and the like *might* be nice, but aren't really necessary. Also, I believe that this approach is compatible with either the regular package approach Chris uses above, or a quasi-quoting approach like in Yesod. (Obviously the Yesod code itself would have to change to accommodate this.)
Michael
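A guess at the shape of that three-method class, with a tiny instance to show how dispatch receives the absolute-path function (the names and exact types in the real gist may well differ):

class WebPlug url where
  toRelPath   :: url -> String
  fromRelPath :: String -> Maybe url
  -- dispatch is given a function turning any url of this type into an
  -- *absolute* path; String stands in here for a real response type
  dispatch    :: (url -> String) -> url -> String

-- a tiny example plug
data Counter = CounterHome | ShowCount Int

instance WebPlug Counter where
  toRelPath CounterHome   = ""
  toRelPath (ShowCount n) = "show/" ++ show n
  fromRelPath ""   = Just CounterHome
  fromRelPath path =
      case break (== '/') path of
        ("show", '/':n) -> case reads n of
                             [(i, "")] -> Just (ShowCount i)
                             _         -> Nothing
        _               -> Nothing
  dispatch mkAbs CounterHome   = "<a href=\"" ++ mkAbs (ShowCount 1) ++ "\">count</a>"
  dispatch _     (ShowCount n) = "count = " ++ show n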

Hello,
I believe this is essentially how URLT works, except the 'toAbsPath'
function is stored in the reader monad instead of being passed around
manually.
An advantage of using the reader monad is that it also ensures that you are
dispatching the right type of URL in each part. More on this in a different
post later.
- jeremy
On Tue, Mar 16, 2010 at 3:49 PM, Michael Snoyman
Here's a bit of an orthogonal gist to what Chris has posted. It's very tedious to program in this style, but it could form the basis for other libraries to build on top of it.
WebPlug is a typeclass that consists of three methods: one to convert to a relative path, one to convert from a relative path, and a dispatch method. The trick is that dispatch is given a function that can convert a WebPlug into an *absolute* path.
Given something like this, I could write a blog WebPlug that creates all internal links via the absolute path function. They would all be guaranteed to be valid by definition, and then the blog could be plugged into another application at any point.
Some standards for loading settings and the like *might* be nice, but aren't really necessary. Also, I believe that this approach is compatible with either the regular package approach Chris uses above, or a quasi-quoting approach like in Yesod. (Obviously the Yesod code itself would have to change to accommodate this.)
Michael

What happens if you have,
data UserRoute = Find Int Int
you need an instance:
instance (GToURL f, GToURL g) => GToURL (f :*: g)
I think?
But the fromURL / gfromURL functions do not have a way to indicate how much
of the input they consumed -- which I think will be a problem?
I believe the current code works because it assumes that each call to
gfromURL will consume all the remaining input. But when your constructor has
two arguments, the first one better not consume all the rest of the input..
- jeremy
On Tue, Mar 16, 2010 at 3:52 AM, Chris Eidhof
Hey everyone,
I just wrote down some of my ideas about type-safe URL handling on github, it's at http://gist.github.com/333769
I think it's similar to what Jeremy is doing with his urlt package [1].
-chris
[1]: http://src.seereason.com/~jeremy/SimpleSite1.html

My solution, by the way was to change fromURL to have a type more like:
fromURL :: [String] -> (Failing a, [String])
so that it returns the unconsumed portion of the path segments.
If you get the latest urlt from darcs:
darcs get http://src.seereason.com/urlt/
I have ported the regular stuff over already. Someone needs to add the
missing instance for :*:.
In that code I am using the Consumer monad, which is just a wrapped up
version of ([c] -> (a, [c])).
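For anyone following along who has not seen the Consumer package, a minimal stand-in looks roughly like this (a sketch of the wrapped-up type just described, not the actual package code):

-- just the ([c] -> (a, [c])) function, wrapped in a newtype
newtype Consumer c a = Consumer { runConsumer :: [c] -> (a, [c]) }

instance Functor (Consumer c) where
  fmap f (Consumer g) = Consumer $ \cs -> let (a, rest) = g cs in (f a, rest)

instance Applicative (Consumer c) where
  pure a = Consumer (\cs -> (a, cs))
  Consumer cf <*> Consumer ca = Consumer $ \cs ->
    let (f, cs')  = cf cs
        (a, cs'') = ca cs'
    in (f a, cs'')

instance Monad (Consumer c) where
  Consumer g >>= f = Consumer $ \cs ->
    let (a, rest) = g cs
    in runConsumer (f a) rest

-- grab the next item, if any
next :: Consumer c (Maybe c)
next = Consumer $ \cs -> case cs of
  []     -> (Nothing, [])
  (x:xs) -> (Just x, xs)

-- look at the next item without consuming it
peek :: Consumer c (Maybe c)
peek = Consumer $ \cs -> case cs of
  []    -> (Nothing, cs)
  (x:_) -> (Just x, cs)

-- runConsumer ((,) <$> next <*> next) ["Foo","1","2"]
--   == ((Just "Foo", Just "1"), ["2"])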
The URLT library as a whole does not build at the moment, but the parts
needed to fix URLT.Regular do.
- jeremy
On Tue, Mar 23, 2010 at 2:36 PM, Jeremy Shaw
What happens if you have,
data UserRoute = Find Int Int
you need an instance:
instance (GToURL f, GToURL g) => GToURL (f :*: g)
I think?
But the fromURL / gfromURL functions do not have a way to indicate how much of the input they consumed -- which I think will be a problem?
I believe the current code works because it assumes that each call to gfromURL will consume all the remaining input. But when your constructor has two arguments, the first one better not consume all the rest of the input..
- jeremy
On Tue, Mar 16, 2010 at 3:52 AM, Chris Eidhof
wrote: Hey everyone,
I just wrote down some of my ideas about type-safe URL handling on github, it's at http://gist.github.com/333769
I think it's similar to what Jeremy is doing with his urlt package [1].
-chris
[1]: http://src.seereason.com/~jeremy/SimpleSite1.html

Hi Jeremy,
Yes, that's why I didn't provide the instance. It should be a real (lightweight) parser. My implementation is just a sketch, not a real implementation.
-chris
On 23 mrt 2010, at 20:36, Jeremy Shaw wrote:
What happens if you have,
data UserRoute = Find Int Int
you need an instance:
instance (GToURL f, GToURL g) => GToURL (f :*: g)
I think?
But the fromURL / gfromURL functions do not have a way to indicate how much of the input they consumed -- which I think will be a problem?
I believe the current code works because it assumes that each call to gfromURL will consume all the remaining input. But when your constructor has two arguments, the first one better not consume all the rest of the input..
- jeremy
On Tue, Mar 16, 2010 at 3:52 AM, Chris Eidhof
wrote: Hey everyone, I just wrote down some of my ideas about type-safe URL handling on github, it's at http://gist.github.com/333769
I think it's similar to what Jeremy is doing with his urlt package [1].
-chris
[1]: http://src.seereason.com/~jeremy/SimpleSite1.html

I pushed a big old patch to web-routes. It splits it into sub-packages, drops the parameterization of Site over pathInfo, and switches from Failing to Either. I need to figure out how to tweak p2u in PathInfo so that it works with parsec 2 and 3. But, it's getting a lot closer to being done. I think some of the naming could still be improved. - jeremy

Jeremy,
Awesome work, sorry I haven't been more responsive for the past few days-
I'm afraid that's going to continue for a bit.
We'd been discussing the PathInfo class previously; I understand now what
you're trying to achieve with it, but I think for a lot of use cases using a
parser like that will be unnecessary. For those cases, I'd hate to introduce
a parsec dependency, especially given the 2/3 split we're dealing with right
now.
I know it's painful to split this up into even more packages, but what do you
think of adding web-routes-parser?
Also, question about the Site datatype: Did we want the formatLink and
parseLink functions to work on [String] instead of String? I think I see
advantages either way. Also, I'm not sure if I sent this previously, but I
think the defaultPage function is redundant; that should probably be
specified by 'parseLink ""'.
I'll try to get started on porting the Yesod mkResources code over to
web-routes now.
Michael
On Sun, Mar 28, 2010 at 9:03 PM, Jeremy Shaw
I pushed a big old patch to web-routes. It splits it into sub-packages, drops the parameterization of Site over pathInfo, and switches from Failing to either.
I need to figure out how to tweak p2u in PathInfo so that it works with parsec 2 and 3.
But, it's getting a lot closer to being done. I think some of the naming could still be improved.
- jeremy

On Mon, Mar 29, 2010 at 8:47 AM, Michael Snoyman
Jeremy,
We'd been discussing the PathInfo class previously; I understand now what you're trying to achieve with it, but I think for a lot of use cases using a parser like that will be unnecessary. For those cases, I'd hate to introduce a parsec dependency, especially given the 2/3 split we're dealing with right now.
Well, parsec is in the haskell platform, so it's really a question of can be it implemented so that it works with both 2 and 3.
I know it's painful to split this up into even more packages, but we do you think of adding web-routes-parser?
I would consider it, but I don't understand how that would work... So far you have been pretty unexcited about the parsing having a type similar to: [String] -> (Either String a, [String]) because it is often 'unnecessary'. But the problem with, [String] -> (Either String a), is that, as far as I can tell, it won't work in the cases where it *is* necessary. And you can't simply switch to, [String] -> (Either String a, [String]), when that happens.. But perhaps I am missing something here..
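(For what it's worth, here is a minimal sketch of the distinction being drawn: a parser type that returns its leftovers can be chained field by field, while a plain [String] -> Either String a cannot say how much it consumed. The names SegParser, nextSegment and andThen are illustrative only, not part of any of the packages discussed.)

type SegParser a = [String] -> (Either String a, [String])

nextSegment :: SegParser String
nextSegment []       = (Left "unexpected end of path", [])
nextSegment (s:rest) = (Right s, rest)

andThen :: SegParser a -> (a -> SegParser b) -> SegParser b
andThen p f segs =
  case p segs of
    (Left err, rest) -> (Left err, rest)
    (Right a,  rest) -> f a rest

-- chaining two segment parsers, as needed for a two-field constructor
parsePair :: SegParser (String, String)
parsePair =
  nextSegment `andThen` \a ->
  nextSegment `andThen` \b ->
  \rest -> (Right (a, b), rest)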
Also, question about the Site datatype: Did we want the formatLink and parseLink functions to work on [String] instead of String? I think I see advantages either way. Also, I'm not sure if I sent this previously, but I think the defaultPage function is redundant; that should probably be specified by 'parseLink ""'.
I will have to think about [String] vs String more. I don't want using Site to mean you have to use PathInfo, but switching to [String] wouldn't have that effect, and would mean that people would not have to worry about those pesky encoding issues. The only real use case I have for not using PathInfo is if you wanted to create a function like: gfromPathSegments :: (Data url) => PathInfo -> Either String url. And that can be done most easily if PathInfo = [String]. So I think I'll make that change. In the old code base, where there was no url encoding done by the library, it did not really make a difference.
Regarding defaultPage, not all methods of generating formatLink / parseLink lend themselves to handling "" very easily, so it seemed nice to have a single place that can add that case in, rather than add similar code to all the parsers. Also, I sometimes change the landing page of my site. But I think I will make 'defaultPage' a Maybe value, and bill it as a way to 'override' where "" resolves to.
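(A sketch of where the Site record seems to be heading after the changes just described: formatting and parsing over [String] path segments, and defaultPage as an optional override for "". Apart from parsePathSegments and defaultPage, which appear in this thread, the field names and signatures below are guesses, not the actual web-routes definition.)

data Site url a = Site
  { handleLink         :: (url -> String) -> url -> a    -- assumed dispatch hook
  , formatPathSegments :: url -> [String]                 -- formatLink over segments
  , parsePathSegments  :: [String] -> Either String url   -- parseLink over segments
  , defaultPage        :: Maybe url                       -- optional override for ""
  }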
I'll try to get started on porting the Yesod mkResources code over to web-routes now.
awesome! - jeremy

-chris On 29 mrt 2010, at 18:18, Jeremy Shaw wrote:
On Mon, Mar 29, 2010 at 8:47 AM, Michael Snoyman
wrote: Jeremy, We'd been discussing the PathInfo class previously; I understand now what you're trying to achieve with it, but I think for a lot of use cases using a parser like that will be unnecessary. For those cases, I'd hate to introduce a parsec dependency, especially given the 2/3 split we're dealing with right now.
Well, parsec is in the Haskell Platform, so it's really a question of whether it can be implemented so that it works with both 2 and 3.
I'm on the latest 6.12 platform, and I only have parsec-2.1 installed. Which packages use parsec-3?
So far you have been pretty unexcited about the parsing having a type similar to:
[String] -> (Either String a, [String])
because it is often 'unnecessary'. But the problem with, [String] -> (Either String a), is that, as far as I can tell, it won't work in the cases where it *is* necessary. And you can't simply switch to, [String] -> (Either String a, [String]), when that happens.. But perhaps I am missing something here..
I think you're right, Jeremy. If you want an instance for :*: (i.e., for a constructor with several fields), you definitely need a parser. Be it parsec or a hand-rolled Consumer-like parser, you do need one. -chris

On Mon, Mar 29, 2010 at 11:28 AM, Chris Eidhof
-chris
On 29 mrt 2010, at 18:18, Jeremy Shaw wrote:
On Mon, Mar 29, 2010 at 8:47 AM, Michael Snoyman
wrote: Jeremy, We'd been discussing the PathInfo class previously; I understand now what you're trying to achieve with it, but I think for a lot of use cases using a parser like that will be unnecessary. For those cases, I'd hate to introduce a parsec dependency, especially given the 2/3 split we're dealing with right now.
Well, parsec is in the Haskell Platform, so it's really a question of whether it can be implemented so that it works with both 2 and 3.
I'm on the latest 6.12 platform, and I only have parsec-2.1 installed. Which packages use parsec-3?
in web-routes, the PathInfo module now contains a function

p2u :: Parser a -> URLParser a

so that you can lift a Char parser to be a URL parser:

testp :: URLParser (Char, String, String)
testp = do segment "foo"
           st <- p2u (char 'h' >> char 'o')
           sg <- anySegment
           sg' <- anySegment
           return (st, sg, sg')

but I have not figured out how to implement p2u so that it works under parsec 2 & 3. Maybe I'll just leave it out for now, but I am doing some exploration first. - jeremy
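(One way p2u could be written against parsec 3 alone, ignoring the 2/3 compatibility problem for a moment: pull off a single path segment and run the character-level parser on it. This is only a sketch under the assumption that URLParser is a parsec parser over [String]; it is not the web-routes implementation.)

import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- assume a URL parser is a parsec-3 parser whose tokens are path segments
type URLParser a = Parsec [String] () a

p2u :: Parsec String () a -> URLParser a
p2u charParser = do
  -- take exactly one segment from the stream
  seg <- tokenPrim show (\pos _ _ -> incSourceColumn pos 1) Just
  -- run the Char-level parser on that segment
  case runParser charParser () "<segment>" seg of
    Left err -> parserFail (show err)
    Right x  -> return x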

On Mon, Mar 29, 2010 at 9:18 AM, Jeremy Shaw
On Mon, Mar 29, 2010 at 8:47 AM, Michael Snoyman
wrote: Jeremy,
We'd been discussing the PathInfo class previously; I understand now what you're trying to achieve with it, but I think for a lot of use cases using a parser like that will be unnecessary. For those cases, I'd hate to introduce a parsec dependency, especially given the 2/3 split we're dealing with right now.
Well, parsec is in the Haskell Platform, so it's really a question of whether it can be implemented so that it works with both 2 and 3.
I know it's painful to split this up into even more packages, but what do you think of adding web-routes-parser?
I would consider it, but I don't understand how that would work...
So far you have been pretty unexcited about the parsing having a type similar to:
[String] -> (Either String a, [String])
because it is often 'unnecessary'. But the problem with [String] -> Either String a is that, as far as I can tell, it won't work in the cases where it *is* necessary. And you can't simply switch to [String] -> (Either String a, [String]) when that happens. But perhaps I am missing something here.
The reason I'm unexcited is that I never would have dreamed of defining my routes that way. I don't feel like drawing out this point too much, because you clearly *would* define your routes that way. However, just to draw the distinction in how I would do things differently, I'll use an example of mine that you quoted earlier:

instance Yesod PB where
    resources = [$mkResources|
/:
    GET: indexHandler
/entries/$entryId:
    GET: entry
/entries/$entryId/$filename:
    GET: media
/feed:
    GET: feed
|]

If I were to convert this to a datatype, it would be:

data PBRoutes = Home | Entry String | File String String | Feed

I simply wouldn't nest a datatype inside any of the constructors. I understand that you want to do this in some circumstances, but I would simply "duplicate" the parsing code for the Entry and File constructors, since I find that parsing code trivial. In particular:

parsePB ["entries", eid] = Entry eid
parsePB ["entries", eid, filename] = File eid filename

I don't see a need for providing a sophisticated parser.

Also, question about the Site datatype: Did we want the formatLink and
parseLink functions to work on [String] instead of String? I think I see advantages either way. Also, I'm not sure if I sent this previously, but I think the defaultPage function is redundant; that should probably be specified by 'parseLink ""'.
I will have to think about [String] vs String more. I don't want using Site to mean you have to use PathInfo, but switching to [String] wouldn't have that effect, and would mean that people would not have to worry about those pesky encoding issues. The only real use case I have for not using PathInfo is if you wanted to create a function like: gfromPathSegments :: (Data url) => PathInfo -> Either String url. And that can be done most easily if PathInfo = [String]. So I think I'll make that change. In the old code base, where there was no url encoding done by the library, it did not really make a difference.
Regarding defaultPage, not all methods of generating formatLink / parseLink lend themselves to handling "" very easily, so it seemed nice to have a single place that can add that case in, rather than add similar code to all the parsers. Also, I sometimes change the landing page of my site. But I think I will make 'defaultPage' a Maybe value, and bill it as a way to 'override' where "" resolves to.
That sounds good.
I'll try to get started on porting the Yesod mkResources code over to
web-routes now.
awesome!
- jeremy
So, I've thought about the syntax for this, and I have this idea in mind.

$(createRoutes MyRoutes [$parseRoutes|
/:
    name: Home
    methods: [GET]
/user/#userid:
    name: User
    methods: [GET, PUT, DELETE]
/static:
    name: Static
    subsite: StaticRoutes
    dispatch: staticRoutes
|])

This would generate a datatype:

data MyRoutes = Home | User Integer | Static StaticRoutes

Handler functions would be getHome, getUser, putUser, deleteUser. Static would be a pluggable subsite; I'd have to play around with the syntax of that a bit. Also, this will allow *any* type of application, not just wai (I want this to be as general as possible).

Michael
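(A rough sketch of the kind of dispatch code such a splice might generate for those three routes. The handler names getHome/getUser/putUser/deleteUser and staticRoutes come from the message above; the Handler stand-in type, the stub bodies, and the dispatch function itself are assumptions for illustration, not actual web-routes-quasi output.)

data StaticRoutes = StaticIndex        -- placeholder for the subsite's route type
data MyRoutes = Home | User Integer | Static StaticRoutes

type Handler = IO ()                   -- stand-in for the real application type

getHome :: Handler
getHome = putStrLn "home"

getUser, putUser, deleteUser :: Integer -> Handler
getUser    uid = putStrLn ("GET user "    ++ show uid)
putUser    uid = putStrLn ("PUT user "    ++ show uid)
deleteUser uid = putStrLn ("DELETE user " ++ show uid)

staticRoutes :: StaticRoutes -> Handler
staticRoutes _ = putStrLn "static file"

-- map request method + route constructor to a handler
dispatch :: String -> MyRoutes -> Maybe Handler
dispatch method route = case (method, route) of
  ("GET",    Home)       -> Just getHome
  ("GET",    User uid)   -> Just (getUser uid)
  ("PUT",    User uid)   -> Just (putUser uid)
  ("DELETE", User uid)   -> Just (deleteUser uid)
  (_,        Static sub) -> Just (staticRoutes sub)  -- subsite handles its own methods
  _                      -> Nothing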

On Mon, Mar 29, 2010 at 12:16 PM, Michael Snoyman
The reason I'm unexcited is that I never would have dreamed of defining my routes that way. I don't feel like drawing out this point too much, because you clearly *would* define your routes that way. However, just to draw the distinction in how I would do things differently, I'll use an example of mine that you quoted earlier:
instance Yesod PB where resources = [$mkResources| /: GET: indexHandler /entries/$entryId: GET: entry /entries/$entryId/$filename: GET: media /feed: GET: feed
If I were to convert this to a datatype, it would be:
data PBRoutes = Home | Entry String | File String String | Feed
You do still have nested data-types here. Namely the String. In this case it is trivial to handle by hand, but it does pose a problem for things like TH and Regular. That is why I had PathInfo in the original code in the first place. I couldn't figure out how to write the TH code without it. I simply wouldn't nest a datatype inside any of the constructors. I
understand that you want to do this in some circumstances, but I would simply "duplicate" the parsing code for the Entry and File constructors, since I find that parsing code trivial. In particular:
parsePB ["entries", eid] = Entry eid parsePB ["entries", eid, filename] = File eid filename
I don't see a need for providing a sophisticated parser.
If you are going to duplicate the code instead of calling fromPathSegments, then you don't really need PathInfo at all, right? The current code is designed so that you are not forced to use PathInfo. We have:

data Site = Site { ... , parsePathSegments :: [String] -> Either String url }

And you can do:

Site { parsePathSegments = parsePB }

The only real reason to have PathInfo is to build composable parsers as far as I can tell. So, I guess maybe you are suggesting that PathInfo should be a separate package? I don't see a big win here since we will depend on parsec 2 anyway, and since web-routes-wai would need to depend on it anyway to provide the wai-related functions that do use PathInfo.

I did add a new parser combinator though:

patternParse :: ([String] -> Either String a) -> URLParser a

so you can do:

fromPathSegments = patternParse parsePB

patternParse consumes all the remaining segments and passes them to parsePB.
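(To make the running example concrete: here is the PBRoutes parser written out in the [String] -> Either String url shape that parsePathSegments expects, together with an inverse for formatting. Only the two "entries" equations appear in the thread; the remaining cases, the Either wrapping, and formatPB are illustrative additions, and such a parsePB could then be dropped into Site { parsePathSegments = parsePB } or lifted with patternParse.)

data PBRoutes = Home | Entry String | File String String | Feed

parsePB :: [String] -> Either String PBRoutes
parsePB []                         = Right Home
parsePB ["entries", eid]           = Right (Entry eid)
parsePB ["entries", eid, filename] = Right (File eid filename)
parsePB ["feed"]                   = Right Feed
parsePB _                          = Left "no route matches"

formatPB :: PBRoutes -> [String]
formatPB Home                = []
formatPB (Entry eid)         = ["entries", eid]
formatPB (File eid filename) = ["entries", eid, filename]
formatPB Feed                = ["feed"]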
So, I've thought about the syntax for this, and I have this idea in mind.
$(createRoutes MyRoutes [$parseRoutes| /: name: Home methods: [GET] /user/#userid: name: User methods: [GET, PUT, DELETE] /static: name: Static subsite: StaticRoutes dispatch: staticRoutes |])
This would generate a datatype:
data MyRoutes = Home | User Integer | Static StaticRoutes
So your idea is to generate the data-type from the routes, rather than try to map the routes onto an existing datatype? Your approach sounds easier. The advantage of the latter is that you could change the look of the url without having to go change all your code that uses the URL type. Not sure how doable the latter is though. Handler functions would be getHome, getUser, putUser, deleteUser. Static
would be a pluggable subsite; I'd have to play around with the syntax of that a bit. Also, this will allow *any* type of application, not just wai (I want this to be as general as possible).
right. I see no reason for it to be wai specific.

Speaking of wai, there is a bug in wai-extra in SimpleServer. It does not put a space between the status code and the status message:

~/n-heptane/projects/haskell/web-routes $ curl -v http://localhost:3000/MyHomeoeu
* About to connect() to localhost port 3000 (#0)
* Trying ::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 3000 (#0)
GET /MyHomeoeu HTTP/1.1
User-Agent: curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10 libssh2/0.18
Host: localhost:3000
Accept: */*
< HTTP/1.1 404Not Found
* no chunk, no close, no size. Assume close to signal end

Note the second to last line.
- jeremy
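(For reference, the shape of the fix being described, not the actual wai-extra source: the status line needs a space between the code and the reason phrase.)

-- illustrative only; SimpleServer's real code builds its response differently
statusLine :: Int -> String -> String
statusLine code message = "HTTP/1.1 " ++ show code ++ " " ++ message ++ "\r\n"

-- statusLine 404 "Not Found" == "HTTP/1.1 404 Not Found\r\n"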

On Mon, Mar 29, 2010 at 11:37 AM, Jeremy Shaw
On Mon, Mar 29, 2010 at 12:16 PM, Michael Snoyman
wrote: The reason I'm unexcited is that I never would have dreamed of defining my routes that way. I don't feel like drawing out this point too much, because you clearly *would* define your routes that way. However, just to draw the distinction in how I would do things differently, I'll use an example of mine that you quoted earlier:
instance Yesod PB where resources = [$mkResources| /: GET: indexHandler /entries/$entryId: GET: entry /entries/$entryId/$filename: GET: media /feed: GET: feed
If I were to convert this to a datatype, it would be:
data PBRoutes = Home | Entry String | File String String | Feed
You do still have nested data-types here. Namely the String. In this case it is trivial to handle by hand, but it does pose a problem for things like TH and Regular. That is why I had PathInfo in the original code in the first place. I couldn't figure out how to write the TH code without it.
I simply wouldn't nest a datatype inside any of the constructors. I
understand that you want to do this in some circumstances, but I would simply "duplicate" the parsing code for the Entry and File constructors, since I find that parsing code trivial. In particular:
parsePB ["entries", eid] = Entry eid parsePB ["entries", eid, filename] = File eid filename
I don't see a need for providing a sophisticated parser.
If you are going to duplicate the code instead of calling fromPathSegments, then you don't really need PathInfo at all, right? The current code is designed so that you are not forced to use PathInfo.
Right, I just didn't understand the purpose of PathInfo; you've explained it very clearly now, thank you.
We have:
data Site = Site { ... , parsePathSegments :: [String] -> Either String url }
And you can do:
Site { parsePathSegments = parsePB }
The only real reason to have PathInfo is to build composable parsers as far as I can tell. So, I guess maybe you are suggesting that PathInfo should be a separate package? I don't see a big win here since we will depend on parsec 2 anyway, and since web-routes-wai would need to depend on it anyway to provide the wai related functions that do use PathInfo..
I did add a new parser combinator though:
patternParse :: ([String] -> Either String a) -> URLParser a
so you can do:
fromPathSegments = patternParse parsePB
patternParse consumes all the remaining segments and passes them to parsePB.
So, I've thought about the syntax for this, and I have this idea in mind.
$(createRoutes MyRoutes [$parseRoutes| /: name: Home methods: [GET] /user/#userid: name: User methods: [GET, PUT, DELETE] /static: name: Static subsite: StaticRoutes dispatch: staticRoutes |])
This would generate a datatype:
data MyRoutes = Home | User Integer | Static StaticRoutes
So your idea is to generate the data-type from the routes, rather than try to map the routes onto an existing datatype?
Your approach sounds easier. The advantage of the latter is that you could change the look of the url without having to go change all your code that uses the URL type. Not sure how doable the latter is though.
Well, I've started implementing it: it can now generate the data types, but doesn't do the parsing, building and dispatching functions. Those should be fairly simple, but I'm just running out of time (Passover seder in a few hours...). Thought I'd let you have a sneak preview:
http://github.com/snoyberg/web-routes-quasi
Handler functions would be getHome, getUser, putUser, deleteUser. Static
would be a pluggable subsite; I'd have to play around with the syntax of that a bit. Also, this will allow *any* type of application, not just wai (I want this to be as general as possible).
right. I see no reason for it to be wai specific.
Speaking of wai, there is a bug in wai-extra in SimpleServer. It does not put a space between the status code and the status message
~/n-heptane/projects/haskell/web-routes $ curl -v http://localhost:3000/MyHomeoeu * About to connect() to localhost port 3000 (#0) * Trying ::1... Connection refused * Trying 127.0.0.1... connected * Connected to localhost (127.0.0.1) port 3000 (#0)
GET /MyHomeoeu HTTP/1.1 User-Agent: curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10 libssh2/0.18 Host: localhost:3000 Accept: */*
< HTTP/1.1 404Not Found * no chunk, no close, no size. Assume close to signal end
Note the second to last line.
- jeremy
Thanks for catching that, bug fix is (as you can imagine) just one line; I'll upload when I have a free moment. Michael

Minor update: I think that YAML syntax for these kinds of routes is a little
bit verbose; any thoughts on this syntax:
/ Home GET
/user/#userid User GET PUT DELETE
/static Static StaticRoutes staticRoutes
/foo/*slurp Foo
/bar/$barparam Bar
First column is the pattern, second is the constructor name, and after that
you have three possibilities:
Nothing after the constructor means a single handler function for any request method. Above, the fourth and fifth entries.
A list of request methods will allow a handler function for each request
method. Above, the first and second entries.
A datatype and function name, allowing a subsite datatype and subsite
function. Above, the third entry. I'll need to develop this one a bit more.
Michael
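(Under the conventions used earlier in the thread, where # binds an Integer and $ binds a single String segment, and assuming *slurp captures the remaining segments as [String], the five routes above would presumably map to a datatype like the following. This is a guess at the generated code, not output from web-routes-quasi.)

data StaticRoutes = StaticRoutesPlaceholder   -- stands in for the subsite's route type

data MyRoutes
  = Home                    -- /
  | User Integer            -- /user/#userid
  | Static StaticRoutes     -- /static (delegated to staticRoutes)
  | Foo [String]            -- /foo/*slurp
  | Bar String              -- /bar/$barparam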

How would this pattern be translated to the new scheme?

/entries/$entryId/$filename: GET: media

I am guessing:

/entries/$entryId/$filename Media GET

And the Media constructor would be:

| Media String String

?
- jeremy

Exactly, with the dispatch function being:

getMedia :: String -> String -> Application

(Actually, there's most likely going to be some type of argument datatype as well, which is where the complication I alluded to with subsites comes from. However, I'll address this more clearly when there's some actual code to back it up.)

Michael

I'm sorry that I didn't have time to read and reply to everything, but Michael, if you decide to go the TH way, why use an intermediate datatype at all? Why not just map the /user/#userid directly to a |handleUser| function? I guess it's really a matter of personal preference. I don't really like TH, and try to avoid it as much as possible (which is why I did the Regular stuff). However, once you go TH, you can go all the way. -chris

On Apr 3, 2010, at 3:37 PM, Chris Eidhof wrote:
I'm sorry that I didn't have time to read and reply to everything, but Michael, if you decide to go the TH way, why use an intermediate datatype at all? Why not just map the /user/#userid directly to a |handleUser| function?
That is what the version in yesod does. But there is a second part to the problem. In your code where you use the URLs, you want to be able to ensure that you are only generating valid urls. That is where the URL type / web-routes comes into play. Let me know if you want more explanation. - jeremy
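(In miniature, the point being made: once links are values of a route type, an invalid link is a compile error instead of a broken URL. renderRoute is a hypothetical rendering function, not a web-routes API.)

data MyRoutes = Home | User Integer

renderRoute :: MyRoutes -> String
renderRoute Home       = "/"
renderRoute (User uid) = "/user/" ++ show uid

homeLink, userLink :: String
homeLink = renderRoute Home        -- "/"
userLink = renderRoute (User 5)    -- "/user/5"
-- renderRoute (Uesr 5)            -- typo: rejected by the type checker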

On Mon, Mar 29, 2010 at 8:47 AM, Michael Snoyman
We'd been discussing the PathInfo class previously; I understand now what you're trying to achieve with it, but I think for a lot of use cases using a parser like that will be unnecessary. For those cases, I'd hate to introduce a parsec dependency, especially given the 2/3 split we're dealing with right now.
web-routes depends on network, and network depends on parsec 2. So we already depend (indirectly) on parsec 2 whether we want to or not. (On Debian, for example, to install libghc6-network-dev, libghc6-parsec-dev must also be installed.) So, I think that means that adding parsec 2 as a direct dependency should not be an issue in terms of adding any new dependencies? (Ignoring the question as to whether parsec is the right tool for the job in the first place.) - jeremy
participants (5): Chris Eidhof, Gregory Collins, Jeremy Shaw, Michael Snoyman, Victor Nazarov