Re: [Haskell-cafe] Help on syntactic sugar for combining lazy & strict monads?

31 Jul 2021


      On Thu, 29 Jul 2021, Benjamin Redelings wrote:
...
The idea of changing observation to look like `Observation a -> Dist a -> 
Dist a` is interesting, but I am not sure if this works in practice.  
Generally you cannot actually produce an exact sample from a distribution 
plus an observation.  MCMC, for example, produces collections of samples that 
you can average against, and the error decreases as the number of samples 
increases.  But you can't generate a single point that is a sample from the 
posterior.
Isn't that a problem of the concrete implementation of the probability 
monad (MCMC)? Certainly not something that I, as the modeller, would 
like to be concerned with. Other implementations might not have this 
limitation. 
Are we talking about the same thing, the mathematical conditional 
probability? This has the type I described, so it should not be 
an unusual design choice. Moreover, it is a partial function. What happens 
if I try to condition on an impossible observation in Bali-phy?
...
Maybe it would be possible to change use separate types for distributions 
from which you cannot directly sample?  Something like `Observation a -> 
SampleableDist -> NonsampleableDist a`.
I haven't seen types of much else in your system, so I can not provide 
meaningful insights here. My concern would be that parts of models become 
tainted as Nonsampleable too easily, a trapdoor operation. But this is 
just guesswork, I don't grasp sampling well enough, apparently. Here is my 
attempt, please correct me if this is wrong.

Sampling is a monad morphism from a "true" monad of probability 
distributions (e.g. the Giry monad) to a state monad transforming a 
source of randomness, in such a way that any infinite stream of samples 
has the property that for any measurable set U, the probability measure of 
U equals the limit of the ratio of samples inside and outside of U, as the 
prefix of the infinite stream grows to infinity. 
Conditioning on an observation O should translate to the state monad in a 
way so that the sample-producer is now forbidden to output anything 
outside O. 
Hence conditioning on an impossible observation must produce a state 
transformer that never outputs anything: the bottom function.
...
I will think about whether this would solve the problem with laziness...
-BenRI
On 7/29/21 11:35 PM, Benjamin Redelings wrote:
...
Hi Olaf,
I think you need to look at two things:
1. The Giry monad, and how it deals with continuous spaces.
2. The paper "Practical Probabilistic Programming with Monads" - 
https://doi.org/10.1145/2804302.2804317
Also, observing 2.0 from a continuous distribution is not nonsensical.
-BenRI
On 7/21/21 11:15 PM, Olaf Klinke wrote:
...
...
However, a lazy interpreter causes problems when trying to introduce
*observation* statements (aka conditioning statements) into the monad
[3].  For example,
run_lazy $ do
   x <- normal 0 1
   y <- normal x 1
   z <- normal y 1
   2.0 `observe_from` normal z 1
   return y
In the above code fragment, y will be forced because it is returned, and
y will force x.  However, the "observe_from" statement will never be
forced, because it does not produce a result that is demanded.
I'm very confused. If the observe_from statement is never demanded, then 
what semantics should it have? What is the type of observe_from? It seems 
it is
a -> m a -> m ()
for whatever monad m you are using. But conditioning usually is a function
Observation a -> Dist a -> Dist a
so you must use the result of the conditioning somehow. And isn't the 
principle of Monte Carlo to approximate the posterior by sampling from it? 
I tend to agree with your suggestion that observations and sampling can 
not be mixed (in the same do-notation) but the latter have to be collected 
in a prior, then conditioned by an observation.
What is the semantic connection between your sample and obsersvation 
monad? What is the connection between both and the semantic probability 
distributions? I claim that once you have typed everything, it becomes 
clear where the problem is.
Olaf
P.S. It has always bugged me that probabilists use elements and events 
interchangingly, while this can only be done on discrete spaces. So above 
I would rather like to write
(2.0==) `observe_from` (normal 0 1)
which still is a non-sensical statement if (normal 0 1) is a continuous 
distribution where each point set has probability zero.