
The idea of changing observation to look like `Observation a -> Dist a -> Dist a` is interesting, but I am not sure if this works in practice. Generally you cannot actually produce an exact sample from a distribution plus an observation. MCMC, for example, produces collections of samples that you can average against, and the error decreases as the number of samples increases. But you can't generate a single point that is a sample from the posterior. Maybe it would be possible to change use separate types for distributions from which you cannot directly sample? Something like `Observation a -> SampleableDist -> NonsampleableDist a`. I will think about whether this would solve the problem with laziness... -BenRI On 7/29/21 11:35 PM, Benjamin Redelings wrote:
Hi Olaf,
I think you need to look at two things:
1. The Giry monad, and how it deals with continuous spaces.
2. The paper "Practical Probabilistic Programming with Monads" - https://doi.org/10.1145/2804302.2804317
Also, observing 2.0 from a continuous distribution is not nonsensical.
-BenRI
On 7/21/21 11:15 PM, Olaf Klinke wrote:
However, a lazy interpreter causes problems when trying to introduce *observation* statements (aka conditioning statements) into the monad [3]. For example,
run_lazy $ do x <- normal 0 1 y <- normal x 1 z <- normal y 1 2.0 `observe_from` normal z 1 return y
In the above code fragment, y will be forced because it is returned, and y will force x. However, the "observe_from" statement will never be forced, because it does not produce a result that is demanded.
I'm very confused. If the observe_from statement is never demanded, then what semantics should it have? What is the type of observe_from? It seems it is a -> m a -> m () for whatever monad m you are using. But conditioning usually is a function Observation a -> Dist a -> Dist a so you must use the result of the conditioning somehow. And isn't the principle of Monte Carlo to approximate the posterior by sampling from it? I tend to agree with your suggestion that observations and sampling can not be mixed (in the same do-notation) but the latter have to be collected in a prior, then conditioned by an observation.
What is the semantic connection between your sample and obsersvation monad? What is the connection between both and the semantic probability distributions? I claim that once you have typed everything, it becomes clear where the problem is.
Olaf
P.S. It has always bugged me that probabilists use elements and events interchangingly, while this can only be done on discrete spaces. So above I would rather like to write (2.0==) `observe_from` (normal 0 1) which still is a non-sensical statement if (normal 0 1) is a continuous distribution where each point set has probability zero.