Re: [Haskell-cafe] Help on syntactic sugar for combining lazy & strict monads?

30 Jul 2021

      The idea of changing observation to look like `Observation a -> Dist a 
-> Dist a` is interesting, but I am not sure if this works in practice.  
Generally you cannot actually produce an exact sample from a 
distribution plus an observation.  MCMC, for example, produces 
collections of samples that you can average against, and the error 
decreases as the number of samples increases.  But you can't generate a 
single point that is a sample from the posterior.

Maybe it would be possible to change use separate types for 
distributions from which you cannot directly sample?  Something like 
`Observation a -> SampleableDist -> NonsampleableDist a`.

I will think about whether this would solve the problem with laziness...

-BenRI

On 7/29/21 11:35 PM, Benjamin Redelings wrote:
...
Hi Olaf,
I think you need to look at two things:
1. The Giry monad, and how it deals with continuous spaces.
2. The paper "Practical Probabilistic Programming with Monads" - 
https://doi.org/10.1145/2804302.2804317
Also, observing 2.0 from a continuous distribution is not nonsensical.
-BenRI
On 7/21/21 11:15 PM, Olaf Klinke wrote:
...
...
However, a lazy interpreter causes problems when trying to introduce
*observation* statements (aka conditioning statements) into the monad
[3].  For example,
run_lazy $ do
   x <- normal 0 1
   y <- normal x 1
   z <- normal y 1
   2.0 `observe_from` normal z 1
   return y
In the above code fragment, y will be forced because it is returned, 
and
y will force x.  However, the "observe_from" statement will never be
forced, because it does not produce a result that is demanded.
I'm very confused. If the observe_from statement is never demanded, 
then what semantics should it have? What is the type of observe_from? 
It seems it is
a -> m a -> m ()
for whatever monad m you are using. But conditioning usually is a 
function
Observation a -> Dist a -> Dist a
so you must use the result of the conditioning somehow. And isn't the 
principle of Monte Carlo to approximate the posterior by sampling 
from it? I tend to agree with your suggestion that observations and 
sampling can not be mixed (in the same do-notation) but the latter 
have to be collected in a prior, then conditioned by an observation.
What is the semantic connection between your sample and obsersvation 
monad? What is the connection between both and the semantic 
probability distributions? I claim that once you have typed 
everything, it becomes clear where the problem is.
Olaf
P.S. It has always bugged me that probabilists use elements and 
events interchangingly, while this can only be done on discrete 
spaces. So above I would rather like to write
(2.0==) `observe_from` (normal 0 1)
which still is a non-sensical statement if (normal 0 1) is a 
continuous distribution where each point set has probability zero.