On Thu, Mar 25, 2010 at 4:25 PM, Jeremy Shaw <jeremy@n-heptane.com> wrote:

On Thu, Mar 25, 2010 at 12:29 PM, Michael Snoyman <michael@snoyman.com> wrote:

OK, here are my initial code comments:

* Do we want to move everything into Web.URLT? More to the point, I'm not sure I see the point of calling this URLT, since it doesn't really require any monad transformers; maybe we should call it web-routes and then the module would be Web.Routes?

I think Web.Routes is a fine name. I'll make it happen. In the rest of this post I refer to things by the old names, but I do intend to change the module names and rename the package to web-routes.

* I like the PathInfo class and to/fromPathSegments. Perhaps we should bundle that with the decode/encodePathInfo into a single module?

I put PathInfo in a separate module because I am a little dubious of classes these days. I find it a bit annoying that you can only have one PathInfo instance per type. And I think it helps show that using PathInfo is not actually required. But, in practice, I think having fewer modules is probably a good thing in this case, since it does not affect the dependency chain at all. Just because I *can* put every function in its own module doesn't mean I should. ;) Also, we probably do want people to provide PathInfo instances, even if they don't have to.

I also am beginning to share a mistrust of classes; I think I went a little too overboard on them on a few previous packages (namely, convertible-text) and am now having a reaction in the opposite direction. I'm sure one day I'll find the Golden Path...

* I'd like to minimize dependencies as much as possible for the basic package. The two dependencies I've noticed are Consumer and applicative-extras. I think the type signatures would be clearer *without* those packages included, eg:

   fromPathSegments :: [String] -> Either ErrMsg a

Except that is not a usable type. fromPathSegments may consume some, but not all, of the path segments. Consider the type:

data SiteURL = Foo Int Int

fromPathSegments is going to receive the path segments:

["Foo","1","2"]

If you wrote a parser by hand, you would want it to look a little something like:

do string "Foo"
   slash
   i <- fromPathSegments
   slash
   j <- fromPathSegments
   eol
   return (Foo i j)

The key concept here is that when you call fromPathSegments to get the first argument of Foo you need to know how many of the path segments were consumed / are remaining, so you can pass only those segments to the second fromPathSegments.

So you really need a type like:

   fromPathSegments :: [String] -> (Either ErrMsg a, [String])

which outputs the unconsumed path segments.
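
To make the bookkeeping concrete, here is a minimal sketch of a hand-written parser with that type. It is only an illustration: ErrMsg is assumed to be a plain String, and intSegment and parseFoo are hypothetical names that do not come from urlt.

```haskell
-- Sketch only: ErrMsg as String, intSegment and parseFoo are assumptions
-- made for this example, not part of urlt.
type ErrMsg = String

data SiteURL = Foo Int Int
  deriving (Eq, Show)

-- | Parse one segment as an Int and return the segments left over.
intSegment :: [String] -> (Either ErrMsg Int, [String])
intSegment [] = (Left "expected an Int, but ran out of segments", [])
intSegment (s : rest) =
  case reads s of
    [(n, "")] -> (Right n, rest)
    _         -> (Left ("not an Int: " ++ show s), s : rest)

-- | Parse Foo/<Int>/<Int>, threading the unconsumed segments by hand.
parseFoo :: [String] -> (Either ErrMsg SiteURL, [String])
parseFoo ("Foo" : rest0) =
  case intSegment rest0 of
    (Left e, rest1)  -> (Left e, rest1)
    (Right i, rest1) ->
      -- The second argument must start from what the first left behind.
      case intSegment rest1 of
        (Left e, rest2)  -> (Left e, rest2)
        (Right j, rest2) -> (Right (Foo i j), rest2)
parseFoo segs = (Left "expected \"Foo\"", segs)
```

With these definitions, parseFoo ["Foo","1","2"] gives (Right (Foo 1 2), []), while parseFoo ["Foo","5","6","7"] gives (Right (Foo 5 6), ["7"]). The nested case expressions show how quickly the manual threading becomes tedious.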

Well, given that as a criterion, I agree with the rest of your analysis entirely. However, I think we're looking at the purpose of fromPathSegments very differently. I'm not quite certain I understand why we would want to output the unconsumed segments; if something is unconsumed, then it seems like it's an invalid URL and should fail.

In your example, if I request "/Foo/5/6/7", fromPathSegments would return (Right (Foo 5 6), ["7"]); but what is going to consume that 7 now? The use case I envisioned for something like this is:

data BlogRoutes = ...
data MySite = MyHome | MyBlog BlogRoutes
fromPathSegments ("blog":rest) = MyBlog `fmap` fromPathSegments rest

But this is obviously a ripe target for a monad of some sort -- trying to keep track of the unconsumed portions by hand seems like it would be asking for trouble...

The Consumer monad takes care of that and provides the functions you would expect, such as next, peek, and poke. And it seems nice to be able to use Monad, MonadPlus, Applicative, Alternative, etc., for composing fromPathSegments into larger parsers?

But, perhaps there is a better choice of monad, or a better way of dealing with the problem? Or maybe it's not really a problem?
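
As a rough illustration of what such a monad buys you, here is a minimal sketch of a segment-consuming parser. It is not the Consumer package's actual API: SegParser, failWith, segment, int and the Home constructor are names invented for this example, and only next echoes a function named above.

```haskell
import Control.Applicative (Alternative (..))

type ErrMsg = String

-- Hypothetical stand-in for a Consumer-style monad; not the real package.
newtype SegParser a = SegParser
  { runSegParser :: [String] -> (Either ErrMsg a, [String]) }

instance Functor SegParser where
  fmap f (SegParser p) = SegParser $ \segs ->
    case p segs of
      (Left e, rest)  -> (Left e, rest)
      (Right a, rest) -> (Right (f a), rest)

instance Applicative SegParser where
  pure a = SegParser $ \segs -> (Right a, segs)
  SegParser pf <*> SegParser pa = SegParser $ \segs ->
    case pf segs of
      (Left e, rest)  -> (Left e, rest)
      (Right f, rest) ->
        case pa rest of
          (Left e, rest')  -> (Left e, rest')
          (Right a, rest') -> (Right (f a), rest')

instance Monad SegParser where
  SegParser p >>= k = SegParser $ \segs ->
    case p segs of
      (Left e, rest)  -> (Left e, rest)
      (Right a, rest) -> runSegParser (k a) rest

instance Alternative SegParser where
  empty = failWith "no parse"
  SegParser p <|> SegParser q = SegParser $ \segs ->
    case p segs of
      (Left _, _) -> q segs  -- backtrack: retry from the original input
      ok          -> ok

failWith :: ErrMsg -> SegParser a
failWith e = SegParser $ \segs -> (Left e, segs)

-- | Consume and return the next path segment.
next :: SegParser String
next = SegParser $ \segs -> case segs of
  []         -> (Left "unexpected end of path", [])
  (s : rest) -> (Right s, rest)

-- | Match one literal segment.
segment :: String -> SegParser ()
segment expected = do
  s <- next
  if s == expected
    then pure ()
    else failWith ("expected " ++ show expected ++ " but got " ++ show s)

-- | Parse one segment as an Int.
int :: SegParser Int
int = do
  s <- next
  case reads s of
    [(n, "")] -> pure n
    _         -> failWith ("not an Int: " ++ show s)

data SiteURL = Home | Foo Int Int
  deriving (Eq, Show)

siteURL :: SegParser SiteURL
siteURL = (segment "Foo" *> (Foo <$> int <*> int)) <|> (Home <$ segment "home")
```

Here the remaining segments are threaded by the instances rather than by hand, so runSegParser siteURL ["Foo","1","2"] yields (Right (Foo 1 2), []) without any explicit bookkeeping in siteURL itself.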

I think Failing is a pretty nifty data-type for dealing with errors. But perhaps it is not a big win here. The #1 thing that makes Failing better than (Either [String] a) is its Applicative instance. Specifically, Failing will accumulate and return all the errors which have occurred, not just the first failure (which is the behavior for Applicative (Either e)).

So for example, let's say you are trying to look up a bunch of keys from the query string. The key / value pairs in the query string are typically independent of each other. So let's say you do:

(,) <$> lookup "foo" <*> lookup "bar"

but neither of those keys exist. With Either you will only get the error 'could not find "foo"'. But with Failing you will get the error 'could not find "foo". could not find "bar"'. It is nice to get a report of all the things that are broken, instead of getting only one error at a time, fixing it, and then getting another error, etc.
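
The difference can be sketched in a few lines. The Failing type below is a local re-definition for illustration (the real one lives in applicative-extras and may differ in detail), and Query, lookupF, lookupE, bothF and bothE are hypothetical helpers.

```haskell
-- Local sketch of a Failing-like type; an assumption for this example,
-- not a copy of the applicative-extras source.
data Failing a = Failure [String] | Success a
  deriving (Eq, Show)

instance Functor Failing where
  fmap _ (Failure es) = Failure es
  fmap f (Success a)  = Success (f a)

instance Applicative Failing where
  pure = Success
  Failure es1 <*> Failure es2 = Failure (es1 ++ es2)  -- keep both sets of errors
  Failure es  <*> Success _   = Failure es
  Success _   <*> Failure es  = Failure es
  Success f   <*> Success a   = Success (f a)

-- Hypothetical query-string representation.
type Query = [(String, String)]

lookupF :: Query -> String -> Failing String
lookupF q k = maybe (Failure ["could not find " ++ show k]) Success (lookup k q)

lookupE :: Query -> String -> Either [String] String
lookupE q k = maybe (Left ["could not find " ++ show k]) Right (lookup k q)

-- Accumulates every missing key.
bothF :: Query -> Failing (String, String)
bothF q = (,) <$> lookupF q "foo" <*> lookupF q "bar"

-- Stops at the first missing key.
bothE :: Query -> Either [String] (String, String)
bothE q = (,) <$> lookupE q "foo" <*> lookupE q "bar"
```

On an empty query, bothF [] reports both missing keys in its Failure list, whereas bothE [] carries only the error for "foo".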

However, I am not sure if this property is all that useful with urlt. If you are trying to parse a url like:

string "Foo" *> (Foo <$> fromPathSegments <*> fromPathSegments)

And the parsing of "Foo" fails, then there is no use in finding out if the other segments parse ok -- because they are likely to be garbage. Maybe it failed because it got the string "FOo" instead of "Foo", but more likely it got something completely unrelated like /bar/c/2.4.

So, perhaps Either is a better choice even without considering dependencies... I think that Applicative / Alternative instances for Either are only defined in transformers in the Control.Monad.Error module -- which is a bit annoying. But we don't actually need those to implement urlt itself.

This brings up another detail though.

The fromPathSegments / Consumer stuff is basically implementing a parser. Except, unlike something like parsec, we do not keep track of the current position for reporting errors. I wonder if we should perhaps use a slightly richer parser environment. Within a web app, once you have your to/from instances debugged, you will never get a parse error, so having great error messages is not essential. But, for other people linking to your site it could be potentially helpful. Though, it seems like the current error messages ought to be sufficient given how short the urls are.

I don't think fancy error reporting will help here. More to the point: we could always layer a fancy parser on top of a simpler typeclass. For that matter, the same argument can be made for Failing and Consumer.

I'm not certain what exactly the type of ErrMsg should be here; I don't really have a problem using [String], which would be close to the definition of Failing.

* I think it's very important to allow users to supply customized 404 pages. Essentially, we need to augment handleWai (possibly others) with a (ErrMsg -> Application) parameter.

Yeah, there are (at least) two possibilities: add an extra param for the handler, or bubble the error up to the top:

handleWai_1 :: (url -> String) -> (String -> Failing url) -> String -> ([ErrorMsg] -> Application) -> ((url -> String) -> url -> Application) -> Application

handleWai_1 fromUrl toUrl approot handleError handler =
  \request ->
    do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo request
       case fUrl of
         (Failure errs) -> handleError errs request
         (Success url)  -> handler (showString approot . fromUrl) url request

handleWai_2 :: (url -> String) -> (String -> Failing url) -> String -> ((url -> String) -> url -> Application) -> (Request -> IO (Failing Response))

handleWai_2 fromUrl toUrl approot handler =
  \request ->
    do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo request
       case fUrl of
         (Failure errs) -> return (Failure errs)
         (Success url)  -> fmap Success $ handler (showString approot . fromUrl) url request

The second choice is perhaps more flexible. Which do you prefer? In the first option, the handleError function could be a Maybe value -- and if you supply Nothing you get some default 404 page?

I personally prefer the first option exactly as you describe it, but you're also correct that the second is more flexible. If anyone else reading this thread would prefer the second, speak now or forever hold your peace ;).
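
For reference, the first option with an optional handler might look roughly like the sketch below. It deliberately avoids depending on wai: Request, Response and Application are simplified stand-ins, Failing is re-declared locally, and dropApproot, default404, Route, parseRoute, renderRoute and app are hypothetical names (dropApproot is only a crude substitute for stripOverlap).

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Simplified stand-ins so the sketch runs without wai; the real types
-- come from Network.Wai and look different.
newtype Request = Request { pathInfo :: String }
data Response = Response Int String deriving (Eq, Show)
type Application = Request -> IO Response

data Failing a = Failure [String] | Success a

-- Hypothetical default page, used when no custom handler is supplied.
default404 :: [String] -> Application
default404 errs _ = return (Response 404 (unlines ("Not found:" : errs)))

-- Crude substitute for stripOverlap: drop the approot prefix if present.
dropApproot :: String -> String -> String
dropApproot approot path = fromMaybe path (stripPrefix approot path)

handleWai
  :: (url -> String)                          -- render a URL
  -> (String -> Failing url)                  -- parse a path
  -> String                                   -- approot
  -> Maybe ([String] -> Application)          -- custom 404 handler, if any
  -> ((url -> String) -> url -> Application)  -- the site itself
  -> Application
handleWai fromUrl toUrl approot mHandleError handler request =
  case toUrl (dropApproot approot (pathInfo request)) of
    Failure errs -> fromMaybe default404 mHandleError errs request
    Success url  -> handler (showString approot . fromUrl) url request

-- A one-route site to exercise the handler.
data Route = Home deriving (Eq, Show)

parseRoute :: String -> Failing Route
parseRoute "/" = Success Home
parseRoute p   = Failure ["no route for " ++ show p]

renderRoute :: Route -> String
renderRoute Home = "/"

app :: Maybe ([String] -> Application) -> Application
app mErr = handleWai renderRoute parseRoute "/site" mErr $ \showUrl route _ ->
  return (Response 200 ("at " ++ showUrl route))
```

Passing Nothing falls back to default404, while Just a handler lets the application render its own page from the accumulated error messages.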

In happstack we have a third possibility. The ServerMonad is an instance of MonadPlus so we can throw out the error message and just call mzero:

implSite :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String -> FilePath -> Site url String (m a) -> m a

implSite domain approot siteSpec =
  do r <- implSite_ domain approot siteSpec
     case r of
       (Failure _) -> mzero
       (Success a) -> return a

implSite_ :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String -> FilePath -> Site url String (m a) -> m (Failing a)
implSite_ domain approot siteSpec =
  dirs approot $ do
    rq <- askRq
    let pathInfo = intercalate "/" (rqPaths rq)
        f        = runSite (domain ++ approot) siteSpec pathInfo
    case f of
      (Failure errs) -> return (Failure errs)
      (Success sp)   -> Success <$> (localRq (const $ rq { rqPaths = [] }) sp)

then we can do:

msum [ implSite "domain" "approot" siteSpec
   , default404
   ]

If implSite calls mzero, then the next handler (in this case default404) is tried.
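
The fall-through behaviour does not depend on happstack itself and can be sketched with any MonadPlus. In the example below Maybe stands in for the ServerMonad, Failing is re-declared locally, and failingToPlus, route and serve are hypothetical names.

```haskell
import Control.Monad (MonadPlus, msum, mzero)

-- Local stand-in for the Failing type.
data Failing a = Failure [String] | Success a

-- | Discard the error details and signal failure with mzero, so that
-- msum moves on to the next handler.
failingToPlus :: MonadPlus m => Failing a -> m a
failingToPlus (Failure _) = mzero
failingToPlus (Success a) = return a

-- Hypothetical routing function standing in for implSite_.
route :: String -> Failing String
route "/home" = Success "home page"
route p       = Failure ["no route for " ++ show p]

-- Maybe plays the role of the server monad in this sketch.
serve :: String -> Maybe String
serve path =
  msum [ failingToPlus (route path)
       , Just "default 404 page"
       ]
```

Here serve "/home" returns Just "home page", and any other path falls through to Just "default 404 page". The cost of this approach is that the error messages are thrown away before the fallback handler runs.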

* It might be nice to have "type WaiSite url = Site url String Application". By the way, are you certain you want to allow parameterization over the pathInfo type?

I'm not certain I don't want to allow it... I have a vague notion that I might want to use Text sometimes instead of String. Though if I was really committed to that then I should make toPathInfo and fromPathInfo parameterized over pathInfo as well... So perhaps I will axe it from Site for now. I need to change the name of that type and its record names too I think.

Referring to the fear of typeclasses mentioned above: I'd like to avoid MPTCs even more so. In fact, as I look at it, each extra parameter we add creates more potential for incompatible components. For instance, I can see an argument being made to use extensible exceptions for the fromPathSegments return type, but I'd rather keep things standard with [String] than create more division.

The only packages that I feel qualified to speak about then are urlt and urlt-wai, and my recommendation would be:

urlt contains decode/encodePathInfo, PathInfo class and related functions, Site and related functions. If you agree on allowing the parameterization of 404 errors, then also provide a default 404 error.

urlt-wai contains WaiSite, handleWai and related functions.

Yeah, that is what I was thinking. urlt would contain what is currently in:

URLT.Base
URLT.PathInfo
URLT.HandleT
URLT.Monad
URLT.QuickCheck

The URLT.QuickCheck module does not actually depend on QuickCheck, which is nice because QC1 vs QC2 is a big problem right now.

It might also be nice to include:

URLT.TH

which depends on template-haskell. But I am not sure that depending on template-haskell is an issue because template-haskell comes with ghc6, and the code in URLT.TH already handles the breakage that happened with TH 2.4.

I have a different motive for keeping the TH code out: it seems like all of the other pieces of code should be relatively stable from early on, while the TH code (and quasi-quoting, and regular) will probably have some major changes happening for a while. It would be nice to have a consistent major release number for long periods of time on the core.

If I switch to Either instead of Failing I believe the dependencies would be:

base, Consumer, template-haskell, network, utf8-string

urlt-wai would just include:

URLT.Wai

Sounds great. Let me know when this is available for review. If you want me to do any of the merging/renaming, I have some time now (I arrived in southern California at 3:30 in the morning...).

Michael