
On Thu, 29 Jul 2021, Benjamin Redelings wrote:
The idea of changing observation to look like `Observation a -> Dist a -> Dist a` is interesting, but I am not sure if this works in practice. Generally you cannot actually produce an exact sample from a distribution plus an observation. MCMC, for example, produces collections of samples that you can average against, and the error decreases as the number of samples increases. But you can't generate a single point that is a sample from the posterior.
Isn't that a problem of the concrete implementation of the probability monad (MCMC)? Certainly not something that I, as the modeller, would like to be concerned with. Other implementations might not have this limitation. Are we talking about the same thing, the mathematical conditional probability? This has the type I described, so it should not be an unusual design choice. Moreover, it is a partial function. What happens if I try to condition on an impossible observation in Bali-phy?
Maybe it would be possible to change use separate types for distributions from which you cannot directly sample? Something like `Observation a -> SampleableDist -> NonsampleableDist a`.
I haven't seen types of much else in your system, so I can not provide meaningful insights here. My concern would be that parts of models become tainted as Nonsampleable too easily, a trapdoor operation. But this is just guesswork, I don't grasp sampling well enough, apparently. Here is my attempt, please correct me if this is wrong. Sampling is a monad morphism from a "true" monad of probability distributions (e.g. the Giry monad) to a state monad transforming a source of randomness, in such a way that any infinite stream of samples has the property that for any measurable set U, the probability measure of U equals the limit of the ratio of samples inside and outside of U, as the prefix of the infinite stream grows to infinity. Conditioning on an observation O should translate to the state monad in a way so that the sample-producer is now forbidden to output anything outside O. Hence conditioning on an impossible observation must produce a state transformer that never outputs anything: the bottom function.
I will think about whether this would solve the problem with laziness...
-BenRI
On 7/29/21 11:35 PM, Benjamin Redelings wrote:
Hi Olaf,
I think you need to look at two things:
1. The Giry monad, and how it deals with continuous spaces.
2. The paper "Practical Probabilistic Programming with Monads" - https://doi.org/10.1145/2804302.2804317
Also, observing 2.0 from a continuous distribution is not nonsensical.
-BenRI
On 7/21/21 11:15 PM, Olaf Klinke wrote:
However, a lazy interpreter causes problems when trying to introduce *observation* statements (aka conditioning statements) into the monad [3]. For example,
run_lazy $ do x <- normal 0 1 y <- normal x 1 z <- normal y 1 2.0 `observe_from` normal z 1 return y
In the above code fragment, y will be forced because it is returned, and y will force x. However, the "observe_from" statement will never be forced, because it does not produce a result that is demanded.
I'm very confused. If the observe_from statement is never demanded, then what semantics should it have? What is the type of observe_from? It seems it is a -> m a -> m () for whatever monad m you are using. But conditioning usually is a function Observation a -> Dist a -> Dist a so you must use the result of the conditioning somehow. And isn't the principle of Monte Carlo to approximate the posterior by sampling from it? I tend to agree with your suggestion that observations and sampling can not be mixed (in the same do-notation) but the latter have to be collected in a prior, then conditioned by an observation.
What is the semantic connection between your sample and obsersvation monad? What is the connection between both and the semantic probability distributions? I claim that once you have typed everything, it becomes clear where the problem is.
Olaf
P.S. It has always bugged me that probabilists use elements and events interchangingly, while this can only be done on discrete spaces. So above I would rather like to write (2.0==) `observe_from` (normal 0 1) which still is a non-sensical statement if (normal 0 1) is a continuous distribution where each point set has probability zero.