Re: [Haskell-cafe] Non-deterministic function/expression types in Haskell?

Long story short: You can stay inside Haskell's type system or extend it, but you end up with monads either way. I'll explain why.
(a) why are monads a perfect solution to my problem? Two reasons.

First: Probability distributions do form a monad [Giry,CS,Kock]. It's a mathematical fact. So why not exploit it? It gives you powerful combinators and powerful abstractions.

Second: Because the requirements on the type modifiers [N] and [D] describe many features monads have. First observe that you don't really need [D], because any deterministic computation can be embedded into the non-deterministic computations by making the set of possibilities a singleton. That is precisely what 'return' does in the context of monads. (Think of return for the list monad.)

> any function that takes a non-deterministic input must have a non-deterministic output.

That is common with monads, too: There is no generic function that can extract values out of monads.

> we should merge variables with identical values if the types are non-deterministic.

That is problematic on several levels. First, is "identical values" decidable by the compiler? Second, equality might depend on the implementation, which might change. For example, (normal 0 1) could contain a random number generator with an intrinsic state.

> sample :: (() -> a[N]) -> a[N]

Observe that, disregarding non-terminating computations, any type t is isomorphic to () -> t. Knowing this, your statement seems to imply that a non-deterministic computation is identified with what you can sample from it.

> iid :: Int -> (() -> a[N]) -> [a[N]]

The function Control.Monad.sequence plays this role for monads, with a slightly different type signature:

Control.Monad.sequence . Data.List.repeat :: Monad m => m a -> m [a]

Let the name not deceive you: It is not about sequential computations, rather about ordering the random variables sequentially.
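To make the list-monad analogy concrete (a small illustration, nothing more):

coin :: [Int]               -- a non-deterministic value: 0 or 1
coin = [0, 1]

five :: [Int]               -- a deterministic value, embedded as a singleton
five = return 5             -- == [5]

twoCoins :: [[Int]]         -- every outcome of two independent choices
twoCoins = sequence [coin, coin]   -- [[0,0],[0,1],[1,0],[1,1]]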
(b) can we extend the HM type system to support non-determinism directly? Yes, you can. But then you have a different language. There are many publications describing that [Draheim,Lago,Borgström,RP]. In a nutshell, start with the simply typed lambda calculus and add an operation
choose :: I -> a -> a -> a

where I is the type of real numbers 0 <= x <= 1. 'choose' makes a weighted probabilistic choice between its second and third argument.

However, in order to describe what programs in (lambda calculus + choose) actually _mean_, you need two things: (1) Define what the compiled program should do at runtime (operational semantics). (2) Define what the program means, mathematically (denotational semantics).

For (1), the prevailing approach seems to be to equate a probabilistic term with its behaviour under sampling. Markov chains are popular [MCMC]. I read it's non-trivial, however, to find a Markov chain that behaves according to a mathematically defined probability distribution. The probability that you know more about that than I do is 1.

For (2), the mathematical meaning of ordinary Haskell programs is given via the following mapping. Every Haskell type t is associated with a domain D [DOM], and each Haskell function of type t -> t' is associated with a mathematical function f: D -> D' between the associated domains. Whenever non-determinism is involved, e.g. a probabilistic computation on type t, then instead of D one uses P(D), a suitable "powerdomain". There are various P for various sorts of non-determinism (see the work of Gordon Plotkin), and each of them is a monad on the category of domains. A major problem is whether the P for probabilistic choice works well with e.g. function types. That is why many papers restrict themselves to first-order functions. Another notoriously hard problem is to combine different sorts of non-determinism. It is like combining monads in Haskell: Some monads have monad transformers associated with them, but some don't.

Finally, you might want to play with non-determinism other than the probabilistic one. For example, there is the infinite-search package on Hackage, which provides a monad of plain non-deterministic choice beyond finite lists. It is even possible to define a Haskell probability monad in the same spirit. I can provide some code if you wish.

Regards,
Olaf

References
[Giry] http://dx.doi.org/10.1007/BFb0092872
[CS] http://dx.doi.org/10.1007/s10485-013-9324-9
[Kock] http://www.tac.mta.ca/tac/volumes/26/4/26-04.pdf
[Draheim] http://www.springer.com/de/book/9783642551970
[Lago] https://arxiv.org/abs/1104.0195
[Borgström] https://arxiv.org/abs/1512.08990
[RP] http://www.cs.tufts.edu/~nr/pubs/pmonad.pdf
[MCMC] http://okmij.org/ftp/kakuritu/Hakaru10/index.html
[DOM] http://www.worldscientific.com/worldscibooks/10.1142/6284

Hi Olaf,

Thanks! This was helpful. I'll engage below:

On 01/13/2018 04:45 PM, Olaf Klinke wrote:
> Long story short: You can stay inside Haskell's type system or extend it, but you end up with monads either way. I'll explain why.

I might see what you mean about ending up with monads either way. Specifically, I think it's important to separate random distributions from their random samples, which might correspond (in a Random-like monad) to the difference between an action and performing the action. Is this something like what you are saying? I think there may still be some extra value to the type system I mentioned in performing CSE (common sub-expression elimination) safely.
> (a) why are monads a perfect solution to my problem? Two reasons. First: Probability distributions do form a monad [Giry,CS,Kock]. It's a mathematical fact. So why not exploit it? It gives you powerful combinators and powerful abstractions.
They certainly do. I've written a Haskell interpreter, and my definitions for a (very hacky) Random monad are here, with two different interpreters for it (sample and sample') below:

https://github.com/bredelings/BAli-Phy/blob/9c96374013453fe382e609cca357a2c0...

I'm currently exploiting the monadic structure. For example, given a distribution dist, I basically use 'sequence (replicate n dist)' to sample n i.i.d. values from it. So, to some extent I could see the fact that monads provide a 'sequence' function as a benefit (see the snippet below). In the framework I proposed we would have to write 'map sample (replicate n dist)', which does not seem very burdensome though.

The fact that you have to run these distributions inside of an interpreter also makes some things easy that would be difficult to do otherwise, since the interpreter (a) can carry around modifiable state and (b) creates a call stack, like in call-by-value languages. In contrast, if you say something like "let {x = normal 0 1 ()} in x*x", then it doesn't really have a call chain, since the thunk for x can get forced from multiple different contexts that were not responsible for the allocation of the thunk on the heap. Instead it seems that each heap location has a let-allocation chain, but without any interpreter state threaded through the chain. Does that make sense? Is there any literature on the let-allocation chain? It seems like this would come up during debugging.

Despite all this, it still seems to me that there might be reasons not to "exploit" the monadic structure of probability distributions, at least not in the traditional fashion. See below.
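(For reference, that idiom is exactly replicateM, so an iid combinator falls out of any monadic distribution type:)

import Control.Monad (replicateM)

-- n independent, identically distributed draws from any monadic dist.
iid :: Monad m => Int -> m a -> m [a]
iid n dist = replicateM n dist   -- == sequence (replicate n dist)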
> Second: Because the requirements on the type modifiers [N] and [D] describe many features monads have. First observe that you don't really need [D], because any deterministic computation can be embedded into the non-deterministic computations by making the set of possibilities a singleton. That is precisely what 'return' does in the context of monads. (Think of return for the list monad.)
>> any function that takes a non-deterministic input must have a non-deterministic output.

> That is common with monads, too: There is no generic function that can extract values out of monads.

Hmm.... one reason I'm hesitant to "exploit" the monadic structure of Random is that I don't want to have the [D] values outside the monad and [N] values inside it. The need to use a function like "return" to lift (if that is the right terminology) non-monadic values into the monad, and the need to run [N] terms in an interpreter to un-lift values of the monad, seems to be more of an obstruction than a benefit. It means that you can't do things that you could do with normal Haskell functions.
For example, with a normal Haskell function you could write `let {x = f 0 x} in x`. However, with a monadic object you can't write `do {x <- f 0 x; return x}`. Hmm.... I guess you could maybe do `let {x = interpret $ do {y <- f 0 x; return y}} in x` though, where 'interpret' is an interpreter for the monad. Hmm.... I don't know. This seems weird. It certainly seems more verbose than `let {x = sample $ f 0 x} in x`.

OK! So, let's say that 'unsafePerformIO' is an interpreter for the IO monad. In my non-monadic framework, I am suggesting that we define:

sample :: (() -> a[N]) -> a[N]
sample dist = dist ()

In the monadic framework, we define something like

random_interpreter :: Random a -> IO a

sample :: Random a -> a
sample dist = (unsafePerformIO . random_interpreter) dist

Then, I think we get equivalent *expressiveness* without modifying the HM type system. Furthermore, I think that (unsafePerformIO . random_interpreter) can be completely safe if we imagine that we can generate truly random variables somehow. However, I think that the extension to the HM type system is still useful in solving some problems that are created by using unsafePerformIO with CSE (see below).
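Here is a self-contained sketch of that monadic version; the Random type, random_interpreter, and normal01 are just stand-ins I am inventing for illustration:

import System.IO.Unsafe (unsafePerformIO)
import System.Random (randomRIO)

-- A stand-in "Random" type whose interpreter simply runs in IO.
newtype Random a = Random { random_interpreter :: IO a }

normal01 :: Random Double       -- placeholder draw (uniform, not a real normal)
normal01 = Random (randomRIO (0, 1))

sample :: Random a -> a         -- only "safe" given truly random sources
sample = unsafePerformIO . random_interpreter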
>> we should merge variables with identical values if the types are non-deterministic.

> That is problematic on several levels. First, is "identical values" decidable by the compiler? Second, equality might depend on the implementation, which might change.
I'm assuming we only consider expressions equivalent if they have the same syntax tree. (So that should avoid problems with overloading ==.) This leaves the second problem you mention:
> For example, (normal 0 1) could contain a random number generator with an intrinsic state.

If we have "let {x=normal 0 1; y=normal 0 1} in E", then the two normal 0 1 actions could be executed in either order. For random samples I think that is not a problem in some situations, since the distributions would be the same. However, regardless of whether we care about ordering, it is definitely wrong to merge x and y to get "let {x=normal 0 1} in E[y := x]". I think this is the problem you are talking about.
But this problem is exactly what I am proposing a solution for! The idea is that 'normal' would have type 'double -> double -> double[N]', and therefore merging the expressions would be prohibited by the rule that says we cannot merge two identical expressions of type a[N] (see my original e-mail).

Interestingly, this could maybe be used to handle cases like "let {x = unsafePerformIO $ readchar file; y = unsafePerformIO $ readchar file} in E". We could define unsafePerformIO as having type

unsafePerformIO :: IO a -> a[N]

This would NOT solve the problem that the code could perform the readchar's in either order, but it WOULD avoid merging x & y. I guess one question is: does GHC avoid this merger already? And, if so, does it avoid this merger by refusing to merge variables? If GHC refuses to merge variables with identical ASTs that call unsafePerformIO, then I would assert that it is ALREADY using the type system I am proposing.
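From what I can tell, GHC does NOT refuse this merger automatically: the System.IO.Unsafe documentation asks the user to guard such bindings with a NOINLINE pragma and to compile with -fno-cse, e.g.:

import System.IO.Unsafe (unsafePerformIO)
import Data.IORef (IORef, newIORef)

-- Without the pragma (and -fno-cse), GHC may merge two bindings with
-- this same right-hand side, which would change the program's meaning.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}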
>> sample :: (() -> a[N]) -> a[N]

> Observe that, disregarding non-terminating computations, any type t is isomorphic to () -> t. Knowing this, your statement seems to imply that a non-deterministic computation is identified with what you can sample from it.

Hmm... I'm not sure this is right. I agree that type t[D] is isomorphic with () -> t[D], because it is easy enough to come up with an isomorphism f where (f (f_inverse value)) = value, and also (f_inverse (f (f_inverse value))) = f_inverse value.
f :: forall level. (() -> t[level]) -> t[level]
f dist = dist ()

f_inverse :: forall level. t[level] -> () -> t[level]
f_inverse value = \() -> value

But the whole point of having t[N] is that () -> t[N] should not be isomorphic to t[N]. Thus, if we evaluate let {x = f (f_inverse E); y = f (f_inverse E)} in F where E is non-deterministic, then I think x and y can be different. I think this means that f is not an isomorphism when applied to non-deterministic expressions, so that you should say "disregarding non-terminating or non-deterministic computations".

There are some complications here, in that I am quantifying over [N,D] levels in the definitions of f and f_inverse, so that they take their [N,D] levels from their arguments. I am treating variables as non-values, since they stand for entire expressions that might be substituted for them. My hope is that this allows placing the [N,D] levels on the input and result types instead of on the arrow, but there could be problems with this. It is counter-intuitive, since normal 0 1 () could yield the value 2, and 2 itself is not random, but was obtained randomly. But I think that the system works if implemented, though it might not match the standard interpretation of types?
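Here is a small sketch of the failure, with unsafePerformIO standing in for real randomness (sketch only: GHC's full-laziness optimization can float the body out of the lambda, which is exactly the kind of merger I want the type system to forbid):

import System.IO.Unsafe (unsafePerformIO)
import System.Random (randomRIO)

once :: Double
once = unsafePerformIO (randomRIO (0, 1))    -- a thunk: forced once, then shared

each :: () -> Double
each () = unsafePerformIO (randomRIO (0, 1)) -- a function: may draw anew per call

main :: IO ()
main = do
  print (once, once)        -- components always equal
  print (each (), each ())  -- components typically differ (in unoptimized builds)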
>> iid :: Int -> (() -> a[N]) -> [a[N]]

> The function Control.Monad.sequence plays this role for monads, with a slightly different type signature:
>
> Control.Monad.sequence . Data.List.repeat :: Monad m => m a -> m [a]
>
> Let the name not deceive you: It is not about sequential computations, rather about ordering the random variables sequentially.
Hmm... I don't completely understand this. Are you saying that E1 >>= (\x -> E2) does not require that E1 is performed before E2? That seems possible only if E2 does not use x. But maybe I'm missing something.
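Playing with the list monad, I can at least see what "ordering the variables" means:

-- 'sequence' fixes the order in which the variables appear within each
-- outcome; it says nothing about when the computations are evaluated.
twoVars :: [[Int]]
twoVars = sequence [[1,2],[3,4]]
-- [[1,3],[1,4],[2,3],[2,4]] : the first variable always appears first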
> (b) can we extend the HM type system to support non-determinism directly? Yes, you can. But then you have a different language. There are many publications describing that [Draheim,Lago,Borgström,RP]. In a nutshell, start with the simply typed lambda calculus and add an operation
> choose :: I -> a -> a -> a
> where I is the type of real numbers 0 <= x <= 1. 'choose' makes a weighted probabilistic choice between its second and third argument.

Hmm.... I intentionally attempt to define "non-deterministic" expressions instead of "probabilistic" expressions precisely to avoid this problem :-) Thus, my version of 'choose' simply states that 'choose 1 2' can return either 1 or 2. It does not say anything about the probability of returning either choice, only that both are "valid" reductions. Therefore, my version of choose would have type:

choose :: a -> a -> a[N]

I'm not completely sure of the role of the real number here. Are we saying that 'choose 0.6 0 1' would (for example) return 0 with probability 0.6 and 1 with probability 0.4? It seems that you could maybe define a function that takes 2 random numbers:

choose :: I -> I -> a -> a -> a

where (a) choose u p 0 1 would return 0 if u < p and 1 otherwise, and (b) u would come from some source of uniform [0,1] random numbers (e.g. in IO). This makes the program deterministic if we know the value of the real number, but probabilistic if we supply a uniform 0 1 random number.

> However, in order to describe what programs in (lambda calculus + choose) actually _mean_, you need two things: (1) Define what the compiled program should do at runtime (operational semantics). (2) Define what the program means, mathematically (denotational semantics).

Hmm.... yeah I think the denotational semantics could be quite hard. I think that Chung Chieh Shan is working on some aspect of this, especially measures on R^n.
> For (1), the prevailing approach seems to be to equate a probabilistic term with its behaviour under sampling. Markov chains are popular [MCMC]. I read it's non-trivial, however, to find a Markov chain that behaves according to a mathematically defined probability distribution. The probability that you know more about that than I do is 1.

Ha ha :-) Yes, designing MCMC approaches to sample from a distribution is hard. Obviously for some expressions, like sampling from a normal, you don't need MCMC. I would imagine that one approach to this is simply to allow only primitives that you can get by transforming a Uniform[0,1] random variable, or perhaps fairly simple rejection sampling. Then maybe MCMC-type things would be implemented as probabilistic programs (e.g. not primitive operations).
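For example, a normal sample can be built from two uniform samples via the Box-Muller transform, with no MCMC at all:

-- One standard-normal draw from two independent Uniform(0,1] values.
boxMuller :: Double -> Double -> Double
boxMuller u1 u2 = sqrt (-2 * log u1) * cos (2 * pi * u2)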
> For (2), the mathematical meaning of ordinary Haskell programs is given via the following mapping. Every Haskell type t is associated with a domain D [DOM], and each Haskell function of type t -> t' is associated with a mathematical function f: D -> D' between the associated domains. Whenever non-determinism is involved, e.g. a probabilistic computation on type t, then instead of D one uses P(D), a suitable "powerdomain".

Hmm... I'm guessing (having not looked at the papers below yet, except kind of the Hakaru paper) that the powerdomain for Double is the measurable sets (aka Borel sets) for the Reals...

> There are various P for various sorts of non-determinism (see the work of Gordon Plotkin), and each of them is a monad on the category of domains. A major problem is whether the P for probabilistic choice works well with e.g. function types. That is why many papers restrict themselves to first-order functions. Another notoriously hard problem is to combine different sorts of non-determinism. It is like combining monads in Haskell: Some monads have monad transformers associated with them, but some don't.

Interesting. I'll take a look at the papers you mention. I have wondered where some of the restrictions come from.
> Finally, you might want to play with non-determinism other than the probabilistic one. For example, there is the infinite-search package on Hackage, which provides a monad of plain non-deterministic choice beyond finite lists. It is even possible to define a Haskell probability monad in the same spirit. I can provide some code if you wish.

Thanks so much for explaining things to me in detail, and for the helpful links!
-BenRI

Benjamin,

I am a domain theorist and not an expert on writing compilers/interpreters. Hence I cannot comment on your proposals of call-by-value and interpreter state. What I wanted to say with my catchphrase is that you have to do the same amount of work. But in the first case, where you have a DSL, the work (= the random monad) is visible to the programmer, whereas in the second case only the author of the compiler gets to see the monad. But the denotational semantics can be the same.

If you go for a DSL (which is specifically what you want to avoid), then a Kleisli category is how it is done. Look at Kleisli in Control.Arrow. For non-probabilistic non-determinism, take the category (Kleisli []). Every morphism a -> b then really is a Haskell function a -> [b], but the Arrow notation hides this. Then choose has the rather trivial implementation

choose :: Kleisli [] (a,a) a
choose = Kleisli (\(x,y) -> [x,y])

You could abstract this into a type class:

class (Arrow a) => Nondet a where
  choose :: a (x,x) x

Let's try this:

import Prelude hiding ((.), id)
import Control.Category
import Control.Monad

data KleisliDN m x y = D (x -> y) | N (x -> m y)

instance Monad m => Category (KleisliDN m) where
  id = D id
  (D g) . (D f) = D (g . f)
  (N g) . (N f) = N (g <=< f)
  (D g) . (N f) = N (liftM g . f)
  (N g) . (D f) = N (g . f)

data DN m x = Det x | Nondet (m x)

Observe that KleisliDN m () x is isomorphic to DN m x, so here you have explicit tagging with D and N on the value-level. If you want the compiler to reject certain combinations of D and N, then write a typeclass

class (Category c, Category d) => Subcategory c d where
  liftMorphism :: c a b -> d a b

instance Monad m => Subcategory (->) (Kleisli m) where
  liftMorphism f = Kleisli (return . f)

and use the category (->) for D and Kleisli m for N.
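For concreteness, here is the trivial choose above in action (assuming the definitions just given):

import Control.Arrow (Kleisli(..))

-- Both branches of one unweighted choice:
outcomes :: [Int]
outcomes = runKleisli choose (1, 2)   -- == [1,2]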
On 15.01.2018 at 19:17, Benjamin Redelings wrote:

> But this problem is exactly what I am proposing a solution for! The idea is that 'normal' would have type 'double -> double -> double[N]', and therefore merging the expressions would be prohibited by the rule that says we cannot merge two identical expressions of type a[N] (see my original e-mail).
> Interestingly, this could maybe be used to handle cases like "let {x = unsafePerformIO $ readchar file; y = unsafePerformIO $ readchar file} in E". We could define unsafePerformIO as having type
> unsafePerformIO :: IO a -> a[N]
> This would NOT solve the problem that the code could perform the readchar's in either order, but it WOULD avoid merging x & y. I guess one question is: does GHC avoid this merger already? And, if so, does it avoid this merger by refusing to merge variables? If GHC refuses to merge variables with identical ASTs that call unsafePerformIO, then I would assert that it is ALREADY using the type system I am proposing.
>>> sample :: (() -> a[N]) -> a[N]

>> Observe that, disregarding non-terminating computations, any type t is isomorphic to () -> t. Knowing this, your statement seems to imply that a non-deterministic computation is identified with what you can sample from it.

> Hmm... I'm not sure this is right. I agree that type t[D] is isomorphic with () -> t[D], because it is easy enough to come up with an isomorphism f where (f (f_inverse value)) = value, and also (f_inverse (f (f_inverse value))) = f_inverse value.
> f :: forall level. (() -> t[level]) -> t[level]
> f dist = dist ()
> f_inverse :: forall level. t[level] -> () -> t[level]
> f_inverse value = \() -> value
> But the whole point of having t[N] is that () -> t[N] should not be isomorphic to t[N]. Thus, if we evaluate let {x = f (f_inverse E); y = f (f_inverse E)} in F where E is non-deterministic, then I think x and y can be different.

With my code above, the types are isomorphic:
toKleisli :: DN m x -> KleisliDN m () x
toKleisli (Det x)     = D (const x)
toKleisli (Nondet mx) = N (const mx)

fromKleisli :: KleisliDN m () x -> DN m x
fromKleisli (D f) = Det (f ())
fromKleisli (N f) = Nondet (f ())

() -> t[N] takes a unit and returns a process randomly generating a t. Since there is only one unit, there is only one process described by this function. I've seen functions of type () -> t in OCaml, where they are used to bring lazy evaluation into an otherwise strict language.
> I think this means that f is not an isomorphism when applied to non-deterministic expressions, so that you should say "disregarding non-terminating or non-deterministic computations".
> There are some complications here, in that I am quantifying over [N,D] levels in the definitions of f and f_inverse, so that they take their [N,D] levels from their arguments. I am treating variables as non-values, since they stand for entire expressions that might be substituted for them. My hope is that this allows placing the [N,D] levels on the input and result types instead of on the arrow, but there could be problems with this. It is counter-intuitive, since normal 0 1 () could yield the value 2, and 2 itself is not random, but was obtained randomly. But I think that the system works if implemented, though it might not match the standard interpretation of types?

I don't get all of what you are saying here. But it reminds me of monadic bind. When you say:
do x <- normal 0 1
   if x == 2 then return True else return False

then the resulting Boolean is still in the monad, and the type of the overall expression tells you that the 2 is a random one.
>> choose :: I -> a -> a -> a
>> where I is the type of real numbers 0 <= x <= 1. 'choose' makes a weighted probabilistic choice between its second and third argument.

> I'm not completely sure of the role of the real number here. Are we saying that 'choose 0.6 0 1' would (for example) return 0 with probability 0.6 and 1 with probability 0.4?
Exactly. And that is all you need for probabilistic choice, in the following categorical sense. The probabilistic powerdomain of a domain D is the free convex cone over D. A convex cone is a structure that supports the operation choose and multiplication by real numbers from the unit interval [0,1]. (Here we allow total probabilities less than 1, otherwise we don't have a domain. The true probability distributions sit at the top of this domain. The free convex cone over the two-element set {x,y} is a triangle, with "certainly x" at the top left, "certainly y" at the top right, and the constant zero measure at the bottom corner.)
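If you want to see this structure in Haskell: here is a minimal sketch of the convex-cone operations over finite weighted lists (a toy Dist type I am making up for this message; total mass may be <= 1, as described above):

newtype Dist a = Dist { unDist :: [(a, Double)] }

choose :: Double -> Dist a -> Dist a -> Dist a   -- weighted choice, 0 <= p <= 1
choose p (Dist xs) (Dist ys) =
  Dist ([(x, p * w) | (x, w) <- xs] ++ [(y, (1 - p) * w) | (y, w) <- ys])

scale :: Double -> Dist a -> Dist a              -- multiplication by r in [0,1]
scale r (Dist xs) = Dist [(x, r * w) | (x, w) <- xs]

zero :: Dist a                                   -- the bottom corner of the cone
zero = Dist []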
> It seems that you could maybe define a function that takes 2 random numbers:
> choose :: I -> I -> a -> a -> a
> where (a) choose u p 0 1 would return 0 if u < p and 1 otherwise, and (b) u would come from some source of uniform [0,1] random numbers (e.g. in IO). This makes the program deterministic if we know the value of the real number, but probabilistic if we supply a uniform 0 1 random number.

That sounds like PRNG to me. If you know the seed, you know the outcome.
>> However, in order to describe what programs in (lambda calculus + choose) actually _mean_, you need two things: (1) Define what the compiled program should do at runtime (operational semantics). (2) Define what the program means, mathematically (denotational semantics).

> Hmm.... yeah I think the denotational semantics could be quite hard. I think that Chung Chieh Shan is working on some aspect of this, especially measures on R^n.

That was my point: With monads, the denotational semantics is easy. The operational semantics is hard.
>> For (2), the mathematical meaning of ordinary Haskell programs is given via the following mapping. Every Haskell type t is associated with a domain D [DOM], and each Haskell function of type t -> t' is associated with a mathematical function f: D -> D' between the associated domains. Whenever non-determinism is involved, e.g. a probabilistic computation on type t, then instead of D one uses P(D), a suitable "powerdomain".

> Hmm... I'm guessing (having not looked at the papers below yet, except kind of the Hakaru paper) that the powerdomain for Double is the measurable sets (aka Borel sets) for the Reals...

No, the probabilistic powerspace of D is the set of all measures on the Borel sigma-algebra of D, ordered pointwise. That is, f <= g if f(A) <= g(A) for all measurable sets A. The beautiful thing about this is that the monadic bind is precisely integration against a measure.
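For finite measures you can watch bind do the integration. Continuing the toy Dist sketch from above:

instance Functor Dist where
  fmap g (Dist xs) = Dist [(g x, w) | (x, w) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]    -- the point mass (Dirac measure) at x
  Dist gs <*> Dist xs = Dist [(g x, p * q) | (g, p) <- gs, (x, q) <- xs]

instance Monad Dist where
  -- (mu >>= f) assigns to a set A the integral of (f x)(A) d mu(x);
  -- for finite weighted lists that integral is just this weighted sum:
  Dist xs >>= f = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- unDist (f x)]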
>> There are various P for various sorts of non-determinism (see the work of Gordon Plotkin), and each of them is a monad on the category of domains. A major problem is whether the P for probabilistic choice works well with e.g. function types. That is why many papers restrict themselves to first-order functions. Another notoriously hard problem is to combine different sorts of non-determinism. It is like combining monads in Haskell: Some monads have monad transformers associated with them, but some don't.

> Interesting. I'll take a look at the papers you mention. I have wondered where some of the restrictions come from.

It's whack-a-mole of the categorical kind. There are various subcategories of domains. For example, all Haskell types map to domains that have the additional property "bifinite". One can show that the probabilistic powerdomain of any domain is a domain, but the probabilistic powerdomain of a bifinite domain is never bifinite. Hence there can be no Haskell type that faithfully represents all the random computations of a given type. It is known that the probabilistic powerdomain of a "continuous" domain is again continuous. Sadly, the continuous domains don't have function spaces that are continuous. Thus you get a semantics for a language with choice, but where functions are not first-class citizens.
Olaf