
On January 21, 2011 14:01:36, Ryan Ingram wrote:
> Interesting little paper, Tyson.
Hi Ryan, thanks for the comments and kind words.
> I think what a programmer actually wants from ambiguity resolution is something *predictable*; C++'s system is definitely stretching the boundaries of predictability, but any case where I have to break out a calculator to decide whether the compiler is going to choose specification A or specification B for my program seems like a failure. I'd much rather the solution wasn't always 'the most probable' but at least was easy for me to figure out without thinking too hard.
I think you really hit the nail on the head there. To be useful at all, it is absolutely critical that you don't have to reach for your calculator. Fortunately, I believe this is the case. The basic result of the paper was that, assuming

  - self-similarity (functional programming), and
  - finite length (typed programs),

you get

  p(n) = 1 / 2^(2n - 1)

as the probability of a specific shape composed of n elementary components. Note that the measure is defined on the shape: the way elementary types are composed, not what they are. Double and Char are indistinguishable from a shape's perspective; Int -> Double -> Char and (Int -> Double) -> Char are not. Some example probabilities for given types (shapes):

  Double          : p(1) = 1/2    (elementary components: Double)
  Char            : p(1) = 1/2    (elementary components: Char)
  [Int]           : p(2) = 1/8    (elementary components: [], Int)
  [Char] -> IO () : p(5) = 1/512  (elementary components: [], Char, ->, IO, ())

As p(n) is a monotonically decreasing function of n, the more elementary types a shape is composed of, the less probable it is. I really don't think there could be a more pleasing result in terms of a predictable measure. It is very simple, and I believe it corresponds well to our intuition: the more stuff in a type, the more complex it is.

It may seem strange to have gone to such lengths for such a simple result, but that is just because these simple examples don't show the depth. Specifically,

  - we gained the ability to encode the intuition into a piece of software (a small sketch of this follows below), and
  - we can now go beyond where our intuition starts to break down.

To see this last point, consider the shapes represented by Double vs Double -> Int. The formulation says the former shape will arise more frequently in programs, and I imagine we agree. But are we more likely to see the shape

  (a -> b -> a) -> a -> [b] -> a

where a and b are placeholders for any internal shape, or

  Int -> Char -> Int

Suddenly it is not so easy for us. The former is a more complex composition, but it is also a less rigid composition. The formulation has no such problem.
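To make the first point concrete, here is a minimal Haskell sketch; it is my illustration rather than the code from the paper, and the Shape encoding (a tree of named elementary components, with n taken as the node count, as in the examples above) is an assumption:

  import Data.Ratio (Ratio, (%))

  -- A shape is an elementary component applied to zero or more shapes.
  -- Only the tree structure matters to the measure, not the names.
  data Shape = Shape String [Shape]

  -- n: the number of elementary components in the shape.
  size :: Shape -> Integer
  size (Shape _ args) = 1 + sum (map size args)

  -- p(n) = 1 / 2^(2n - 1)
  p :: Shape -> Ratio Integer
  p s = 1 % 2 ^ (2 * size s - 1)

  -- The examples from above.
  double, listInt, listCharToIOUnit :: Shape
  double           = Shape "Double" []
  listInt          = Shape "[]" [Shape "Int" []]
  listCharToIOUnit = Shape "->" [ Shape "[]" [Shape "Char" []]
                                , Shape "IO" [Shape "()" []] ]

  main :: IO ()
  main = mapM_ (print . p) [double, listInt, listCharToIOUnit]

Running main prints 1 % 2, 1 % 8 and 1 % 512, matching the table above.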
> The goal is to easily know when I have to manually specify ambiguity resolution and when I can trust the compiler to do it for me. I didn't completely follow the math in your paper, so maybe it turns out simply if it was implemented, but it wasn't clear to me. At the least, I think you should add examples of the types of ambiguity resolution you'd like the compiler to figure out and what your probability measure chooses as the correct answer in each case.
The specific ambiguous situation I was looking at resolving when I came up with the framework was figuring out which application operator to use. Consider the various applications incorporating a computation context c (e.g., []):

  \f -> f                            :: (a -> b) -> a -> b
  \f -> fmap f                       :: (a -> b) -> c a -> c b
  \f -> f . pure                     :: (c a -> b) -> a -> b
  \f -> apply f                      :: c (a -> b) -> c a -> c b
  \f -> apply f . pure               :: c (a -> b) -> a -> c b
  \f -> apply f . pure . pure        :: c (c a -> b) -> a -> c b
  \f -> join . fmap f                :: (a -> c b) -> c a -> c b
  \f -> join . apply f               :: c (a -> c b) -> c a -> c b
  \f -> join . apply f . pure        :: c (a -> c b) -> a -> c b
  \f -> join . apply f . pure . pure :: c (c a -> c b) -> a -> c b

where I've listed them in order of increasing structure requirements on c (the bottom four require a monad, the prior three an applicative, etc.); a compilable sketch of these follows below. In any one situation, more than one application may be possible (such as the "under", "over", "state", "wear" example mentioned by Conor). I want a rigorous argument for ordering these that follows the principle of least surprise.

I probably should be more specific about this in the paper, but I really liked the framework, and didn't want to pigeonhole it to this specific application.

Cheers!
-Tyson
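For concreteness, here is the list above as compilable Haskell. A few hedges: apply is written as (<*>), the names app0 through app9 exist only for this sketch, and the class constraints assume today's Functor/Applicative/Monad hierarchy.

  import Control.Monad (join)

  -- Plain application: no context involved.
  app0 :: (a -> b) -> a -> b
  app0 f = f

  -- Lift a pure function over the context.
  app1 :: Functor c => (a -> b) -> c a -> c b
  app1 f = fmap f

  -- Feed a pure argument to a context-consuming function.
  app2 :: Applicative c => (c a -> b) -> a -> b
  app2 f = f . pure

  -- The three applicative variants ("apply" is (<*>)).
  app3 :: Applicative c => c (a -> b) -> c a -> c b
  app3 f = (f <*>)

  app4 :: Applicative c => c (a -> b) -> a -> c b
  app4 f = (f <*>) . pure

  app5 :: Applicative c => c (c a -> b) -> a -> c b
  app5 f = (f <*>) . pure . pure

  -- The four monadic variants.
  app6 :: Monad c => (a -> c b) -> c a -> c b
  app6 f = join . fmap f

  app7 :: Monad c => c (a -> c b) -> c a -> c b
  app7 f = join . (f <*>)

  app8 :: Monad c => c (a -> c b) -> a -> c b
  app8 f = join . (f <*>) . pure

  app9 :: Monad c => c (c a -> c b) -> a -> c b
  app9 f = join . (f <*>) . pure . pure

  -- e.g. app6 at the list context is concatMap:
  main :: IO ()
  main = print (app6 (\x -> [x, x + 1]) [10, 20])  -- [10,11,20,21]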