Thank you for highlighting these problems; I should really test my own code
more thoroughly. After reading these most recent examples, the translation to
existing monads is definitely too neat, and a lot of semantics of the monad are
'defaulted' on. In particular for the probability monad examples I see I had
the mistaken impression that it would preserve the random-world semantics of
the do-notation whereas autolifting actually imposes a random-evaluation
semantics, which would not be how I envision an autolifted probabilistic DSL.

Overall, I think I pretty much got caught up in how cool it was going to be to
use <$>, <*>, join/enter/exit as primitives to make any monad-based DSL work
'concisely' in an environment of existing typeclasses. That is kind of the
thing I want to do at a high level; implement DSLs like probabilistic
programming languages as primitives that that play transparently with existing
expressions.

But now it seems clear to me that this autolifting approach will not be useful
with any monad where it is important to control sharing and effects, which is
critical in the probability monad (and all others I can think of); in fact it
seems necessary to incur the do-notation 'overhead' (which I now see is not
overhead at all!) to work with them meaningfully at all, no matter how 'pure'
they look in other settings. Because of this we see that the Prob monad as it
is defined here is mostly unusable with autolifting. Again, thanks for the
examples; I think I now have a much better intuition for do/bind and why they
are required.

At this point, however, I would still want to see if it is possible to do the
autolifting in a more useful manner, so that the user still has some control
over effects. Things like the share combinator in the paper you linked will
probably be very useful. I will definitely go over it in detail.

From my previous experience however, this might also be accomplished by
inserting something between the autolifting and the target monad. 

I think it would be more helpful now to talk more about where I'm coming from.
Indeed, the probability monad examples feature heavily here because I'm coming
off of implementing a probabilistic programming language in Python that worked
through autolifting, so expressions in it looked like host language
expressions. It preserved the random-world semantics because it was using a
"quote"-like applicative functor to turn a function composition in the language
into an expression-tree rep of the same. I am not sure yet how to express it in
Haskell (as I need to get more comfortable with GADTs), but pure would take a
term to an abstract version of it, and fmap would take a function and abstract
term to an abstract term representing the answer. One would then have the call
graph available after doing this on lifted functions. I think of this as an
automatic way of performing the 'polymorphic embedding' referenced in 

[Hofer et al 2008] Polymorphic Embedding of DSLs.

By keeping IDs on different abstract terms, expressions like X + X (where X =
coin 0.5) would take the proper distributions under random-world semantics. 

In general for monads where the control of sharing is important, it can be seen
as limiting the re-running of effects to one per name. Each occurence of a name
using the same unwrapped value. 

I had the initial impression, now corrected, that I could just come up with an
autolifting scheme in Haskell, use it with the usual probability monad and
somehow get this random-world semantics for free. No; control of sharing and
effects is in fact critical, but could be done through using the autolifting as
a way to turn expressions into a form where control of them is possible. 

For now, though, it looks like I have a lot of things to read through. 

Again, thanks Oleg and everyone else for all the constructive feedback. This
definitely sets a personal record for misconceptions corrected / ideas
clarified per day. 

On Wed, Nov 17, 2010 at 12:08 AM, <oleg@okmij.org> wrote:

Let me point out another concern with autolifting: it makes it easy to
overlook sharing. In the pure world, sharing is unobservable; not so
when effects are involved.

Let me take the example using the code posted in your first message:

> t1 = let x = 1 + 2 in x + x

The term t1 is polymorphic and can be evaluated as an Int or as a
distribution over Int:


> t1r = t1 ::Int        -- 6
> t1p = t1 ::Prob Int   -- Prob {getDist = [(6,1.0)]}

That looks wonderful. In fact, too wonderful. Suppose later on we
modify the code to add a non-trivial choice:

> t2 = let x = coin 0.5 + 1 in x + x
> -- Prob {getDist = [(4,0.25),(3,0.25),(3,0.25),(2,0.25)]}

The result isn't probably what one expected. Here, x is a shared
computation rather than a shared value. Therefore, in (x + x)
the two occurrences of 'x' correspond to two _independent_ coin flips.
Errors like that are insidious and very difficult to find. There are
no overt problems, no exceptions are thrown, and the results might
just look right.

Thus the original code had to be translated into monadic style more
carefully: `let' should not have been translated as it was. We should
have replaced let with bind, using either of the following patterns:

> t2b1 = do x <- coin 0.5 + 1; return $ x + x
> -- Prob {getDist = [(4,0.5),(2,0.5)]}

> t2b2 = coin 0.5 + 1 >>= \xv -> let x = return xv in x + x
> -- Prob {getDist = [(4,0.5),(2,0.5)]}

After all, let is bind (with the different order of arguments): see
Moggi's original computational lambda-calculus.

Our example points out that monadifying Int->Int function as
m Int -> m Int can be quite dangerous. For example, suppose we have
a pure function fi:

> fi :: Int -> Int
> fi x = x + x

and we simple-mindedly monadified it:

> fp :: Monad m => m Int -> m Int
> fp x = liftM2 (+) x x

We can use it as follows: after all, the function accepts arguments of
the type m Int:

> tfp = fp (coin 0.5)
> -- Prob {getDist = [(2,0.25),(1,0.25),(1,0.25),(0,0.25)]}

The result shows two independent coin flips. Most of the readers of
the program will argue that in an expression (x + x), two occurrences
of x should be _correlated_. After all, that's why we use the
same name, 'x'. But they are not correlated in our translation.

Furthermore, translating let as bind is suboptimal. Consider

> t3b2 = coin 0.5 + 1 >>= \xv -> let x = return xv in (5 :: Prob Int)
> -- Prob {getDist = [(5,0.5),(5,0.5)]}

although we don't use the result of a coin flip, we performed the coin
flip nevertheless, doubling the search space. We know that such
doubling is disastrous even for toy probabilistic problems.

These issues are discussed at great length in the paper with Sebastian
Fischer and Chung-chieh Shan, ICFP 2009.

http://okmij.org/ftp/Computation/monads.html#lazy-sharing-nondet