ANN: random-fu 0.1.0.0

newer
A question on existential types...

mokus＠deepbondi.net

3 Jun 2010 3 Jun '10

1:34 p.m.

Announcing the 0.1.0.0 release of the "random-fu" library for random number generation[1]. This release hopefully stabilizes the core interfaces (those exported from the base module "Data.Random"). Warning to anyone upgrading from earlier releases: 'Discrete' has been renamed 'Categorical', the entropy source classes have been redesigned, and many things are no longer exported from the root module "Data.Random" (In particular, DevRandom - this is not available on windows, so it will likely move to its own package eventually so that client code dependencies on it will be made explicit). Unfortunately, Hackage appears to have choked on some of the package's dependencies (specifically, 2 dependencies also depend on time, and were built using different versions) so its documentation (which I put quite a bit of work into) is not displayed on the Hackage site. In the past I have dealt with that by uploading new versions with hacks to make sure the thing builds, but I really would rather not continue to do so. Is there any procedure by which I can request a manual rebuild of the package so that its documentation will be generated and displayed? Incidentally, this is a recurring problem I have run into several times for several packages. Can we *please* come up with a way for sdist or similar to just include pre-built documentation? Or if I were to spend some time working on such a thing, would it be accepted (assuming it was done up to all applicable standards of quality)? For now, I have added some pre-built haddock docs to the repository so that they may be browsed online[2] (if code.haskell.org ever starts responding to my HTTP requests. It's just not my day today, I guess). [1] http://hackage.haskell.org/package/random-fu-0.1.0.0 [2] http://code.haskell.org/~mokus/random-fu/doc/haddock/index.html

Show replies by date

Alexander Solla

3 Jun 3 Jun

9:58 p.m.

On Jun 3, 2010, at 6:34 AM, mokus@deepbondi.net wrote:

...

Announcing the 0.1.0.0 release of the "random-fu" library for random number generation[1]. This release hopefully stabilizes the core interfaces (those exported from the base module "Data.Random").

Great work, I'm upgrading now. The only feature suggestion I can suggest is the addition of a convolution operator to combine distributions (reified as RVar's in this implementation, though of course the difference between a random variable over a distribution and the distribution is rather thin)

mokus＠deepbondi.net

11:19 p.m.

...

The only feature suggestion I can suggest is the addition of a convolution operator to combine distributions (reified as RVar's in this implementation, though of course the difference between a random variable over a distribution and the distribution is rather thin)

I don't think I understand. My familiarity with probability theory is fairly light. Are you referring to the fact that the PDF of the sum of random variables is the convolution of their PDFs? If so, the sum of random variables can already be computed as "liftA2 (+) :: Num a => RVar a -> RVar a -> RVar a" since RVar is an applicative functor (or using liftM2 since it's also a monad). Or perhaps you mean an operator that would take, say, 2 values of the 'Uniform' data type and return an instance of the 'Triangular' type corresponding to the convolution of the distributions?

Alexander Solla

11:59 p.m.

On Jun 3, 2010, at 4:19 PM, mokus@deepbondi.net wrote:

...

I don't think I understand. My familiarity with probability theory is fairly light. Are you referring to the fact that the PDF of the sum of random variables is the convolution of their PDFs? If so, the sum of random variables can already be computed as "liftA2 (+) :: Num a => RVar a -> RVar a -> RVar a" since RVar is an applicative functor (or using liftM2 since it's also a monad).

Or perhaps you mean an operator that would take, say, 2 values of the 'Uniform' data type and return an instance of the 'Triangular' type corresponding to the convolution of the distributions?

I think I had something like the former in mind. I didn't realize liftA2/M2 would do it. When I did this last, I just wrote a monadic action to sample values from different RVars. I should learn the higher order monad functions. On the other hand, it might be kind of nice if RVar's knew which PDF they are over. It's hard for me to see how that would be done with Haskell.

mokus＠deepbondi.net

4 Jun 4 Jun

1:40 a.m.

...

On the other hand, it might be kind of nice if RVar's knew which PDF they are over. It's hard for me to see how that would be done with Haskell.

If anyone knows a way this could be done while still allowing general functions to be mapped over RVars, I'd love to hear about it. My suspicion though is that it is not possible. It would be a very similar problem to computing the inverse of a function since the PDF is a measure of the size of the preimage of an event in the probability space (if I'm putting all those words together correctly ;)).

Alexander Solla

5:19 a.m.

On Jun 3, 2010, at 6:40 PM, mokus@deepbondi.net wrote:

...

If anyone knows a way this could be done while still allowing general functions to be mapped over RVars, I'd love to hear about it. My suspicion though is that it is not possible. It would be a very similar problem to computing the inverse of a function since the PDF is a measure of the size of the preimage of an event in the probability space (if I'm putting all those words together correctly ;)).

We don't necessarily have to compute the inverse of the distribution via sampling to do it. It can be done algebraically, in terms of the convolution operator. Since the types are enumerated, wouldn't something like... work? -- A set and binary operation. We have an algebra. I like the J for 'join'. -- With this algebra, we can use the real-complex analytical methods to -- interpret the terms later, if we want to actually reify a Distribution -- instance as a "Real" (Float, Double) function.

...

data DistributionJ a = UniformDistribution Uniform a | ... | ExponentialDistribution Exponential a | DistributionJ a `Convolve` (DistributionJ a)

-- I hope I understand the semantics for the PromptT monad.

...

newtype RVarT m a = RVarT { unRVarT :: PromptT (Prim, DistributionJ) m a }

I guess threading fst and snd in all the low level computations is inelegant, but it's a step closer.

James Andrew Cook

11:29 a.m.

On Jun 4, 2010, at 1:19 AM, Alexander Solla wrote:

...

We don't necessarily have to compute the inverse of the distribution via sampling to do it. It can be done algebraically, in terms of the convolution operator. Since the types are enumerated, wouldn't something like... work?

-- A set and binary operation. We have an algebra. I like the J for 'join'. -- With this algebra, we can use the real-complex analytical methods to -- interpret the terms later, if we want to actually reify a Distribution -- instance as a "Real" (Float, Double) function.

...
data DistributionJ a = UniformDistribution Uniform a | ... | ExponentialDistribution Exponential a | DistributionJ a `Convolve` (DistributionJ a)

-- I hope I understand the semantics for the PromptT monad.

...
newtype RVarT m a = RVarT { unRVarT :: PromptT (Prim, DistributionJ) m a }

I guess threading fst and snd in all the low level computations is inelegant, but it's a step closer.

An algebraic solution would work as long as the user only made use of a very small set of supported operations. Unfortunately, even detecting whether they have done so in any implementation supporting Monad is not really feasible. The Functor/Applicative/Monad classes allow inclusion of arbitrary functions in the construction of an RVar, which is a very big part of the usefulness of the system. Even in a simple no-op defininion such as "fmap id x", fmap has no way of discovering whether the function is "id", "negate", "primes !!", or worse, so the only way (I believe) to effectively propagate the algebraic part would be to drop the Functor, Applicative and Monad instances and work with RVars solely via some combinator library. At that point, why not just work with the symbolic form directly and choose at the end whether to sample or compute a pdf? You could make DistributionJ an instance of Distribution to make it sampleable, and do precisely that. In fact, that would probably be a pretty nifty thing to do. -- James PS - the usage of DistributionJ in the prompt wouldn't quite work, because a "DistributionJ a" would just be a standin for a computation of type "m a" where "m" is the monad in which the variable is eventually sampled. There is no way to rediscover the particular constructor that was used for the prompt, for approximately the same reasons as above.

Richard O'Keefe

12:13 a.m.

There's something in that package that I don't understand, and I feel really stupid about this. data RVarT m a type RVar = RVarT Identity class Distribution d t where rvar :: d t -> RVar t rvarT :: d t -> RVarT n t Where does "n" come from?

wren ng thornton

2:03 a.m.

Richard O'Keefe wrote:

...

There's something in that package that I don't understand, and I feel really stupid about this.

data RVarT m a

type RVar = RVarT Identity

class Distribution d t where rvar :: d t -> RVar t rvarT :: d t -> RVarT n t

Where does "n" come from?

Presumably from universal quantification in rvarT? That is, the implementation of rvarT should be polymorphic in n, in which case the particular n doesn't matter (as well it shouldn't, since if it did that'd interfere with the composability of the transformer). Though, since RVar is a synonym for RVarT, I can't imagine why rvar is a method instead of a shorthand defined outside of the class. (If RVar were primitive then I could imagine performance reasons, but since it isn't...) -- Live well, ~wren

James Andrew Cook

11 a.m.

On Jun 3, 2010, at 10:03 PM, wren ng thornton wrote:

...

Though, since RVar is a synonym for RVarT, I can't imagine why rvar is a method instead of a shorthand defined outside of the class. (If RVar were primitive then I could imagine performance reasons, but since it isn't...)

The reason for this is purely a matter of convenience combined with a bit of historical accident and/or laziness. The RVar type predates the RVarT type, and thus rvar predates rvarT as well. My personal idioms for defining distributions were invented before RVarT and have been slow to change. These idioms involve defining the 'rvar' implementations in terms of the public interface of simpler distributions, which are mostly exporting plain 'RVar's in order to reduce the need for type annotations fixing 'n' elsewhere. In particular, functions such as 'uniform' and 'normal' which directly construct RVars are very useful in defining the rvar implementation of other types. I have been reluctant to drop the rvar function from the Distribution class because it is very useful to be able to define other Distribution instances in terms of these functions instead of the uglier explicit use of rvarT - e.g. "rvarT StdUniform" in place of just 'stdUniform'. Eventually I'll probably give up this particular objection – probably sooner rather than later now that you've made me think about it – but I'm presently in the state where I know it isn't "right" to have both (for some value of "right") but I don't know yet what a better solution is, given my conflicting objectives (one of which is to refrain from changing everything at once, so users have a chance to keep up with the library's evolution). Making the convenience functions mentioned above return RVarT seems natural, but when I tried it I found it made their usage require type annotations in many other places to fix the type of the underlying monad, and I have not yet decided whether it is worth all that. I may yet introduce separate RVarT-typed convenience functions as well, but I'm not sure I want to do that either. Suggestions are welcome :) -- James

wren ng thornton

5 Jun 5 Jun

1:42 a.m.

James Andrew Cook wrote:

...

In particular, functions such as 'uniform' and 'normal' which directly construct RVars are very useful in defining the rvar implementation of other types. I have been reluctant to drop the rvar function from the Distribution class because it is very useful to be able to define other Distribution instances in terms of these functions instead of the uglier explicit use of rvarT - e.g. "rvarT StdUniform" in place of just 'stdUniform'. Eventually I'll probably give up this particular objection – probably sooner rather than later now that you've made me think about it – but I'm presently in the state where I know it isn't "right" to have both (for some value of "right") but I don't know yet what a better solution is, given my conflicting objectives (one of which is to refrain from changing everything at once, so users have a chance to keep up with the library's evolution).

Oh I'm sure writing the rvar implementation is prettier. However, since clients have to define rvarT too, they can't escape the ugliness :) If RVar were a datatype, then I could imagine there being performance reasons for wanting to define them separately. However, since it isn't, it seems like having both methods in the class is just begging for code duplication and inconsistency bugs. Without the performance benefit, I can't see a reason for having both--- ---unless, perhaps, you have a way of deriving a definition of rvarT from rvar. If so, then there could be efficiency issues in the other direction. I could see some people just giving a pretty implementation of rvar and using the inefficient rvarT, whereas other people would put up with the ugly in order to have an efficient rvarT... (I haven't looked at the package to know if this is actually a possibility)

...

Making the convenience functions mentioned above return RVarT seems natural, but when I tried it I found it made their usage require type annotations in many other places to fix the type of the underlying monad, and I have not yet decided whether it is worth all that. I may yet introduce separate RVarT-typed convenience functions as well, but I'm not sure I want to do that either.

That's what I was going to propose :) You could define the RVarT versions, but then keep the RVar versions around for their restricted type. That would allow for moving rvar out of the class (eliminating possible code duplication bugs) and still allow for using uniformT, normalT, etc when defining rvarT. -- Live well, ~wren

James Andrew Cook

8 Jun 8 Jun

12:27 a.m.

On Jun 4, 2010, at 9:42 PM, wren ng thornton wrote:

...

---unless, perhaps, you have a way of deriving a definition of rvarT from rvar. If so, then there could be efficiency issues in the other direction. I could see some people just giving a pretty implementation of rvar and using the inefficient rvarT, whereas other people would put up with the ugly in order to have an efficient rvarT... (I haven't looked at the package to know if this is actually a possibility)

This is precisely the present state of affairs, and I'm probably the worst offender. In fact, most of the distributions provided in the library just use the default implementation of rvarT, which as you guessed has a bit of a penalty associated. I've had it in the back of my mind for quite a while to revisit that design, it just hasn't been enough of a priority to actually do it. I've been having more fun mucking about in the numerical code :)

...

...
Making the convenience functions mentioned above return RVarT seems natural, but when I tried it I found it made their usage require type annotations in many other places to fix the type of the underlying monad, and I have not yet decided whether it is worth all that. I may yet introduce separate RVarT-typed convenience functions as well, but I'm not sure I want to do that either.

That's what I was going to propose :)

You could define the RVarT versions, but then keep the RVar versions around for their restricted type. That would allow for moving rvar out of the class (eliminating possible code duplication bugs) and still allow for using uniformT, normalT, etc when defining rvarT.

Yea, now that you put that in words, I like the idea and see that it wouldn't be as disruptive as I subliminally assumed. I'll probably get to it sooner rather than later - thanks! -- James

mokus＠deepbondi.net

4 Jun 4 Jun

2:08 a.m.

...

There's something in that package that I don't understand, and I feel really stupid about this.

data RVarT m a

type RVar = RVarT Identity

class Distribution d t where rvar :: d t -> RVar t rvarT :: d t -> RVarT n t

Where does "n" come from?

There's no reason to feel stupid when faced with something unfamiliar. Even if you are familiar with monad transformers, this may not be a place you expect to find them, and 'n' in this case would usually be an 'm' in other places (it is the underlying monad being extended). Since I'm not sure at which level your unfamiliarity lies, I'll just give a from-scratch crash course. Feel free to ignore as much as is necessary, and please don't take this long-winded reply as any sort of condescension :). I'll refrain from introducing monads and monad transformers, as the internet is already full enough of those sorts of introductions. RVarT is a monad transformer that adds a source of "random" data to a preexisting monad, the latter being the role the 'n' serves in rvarT's type. RVar is just the "pure" version where the underlying monad (Identity) is sort of a type-level no-op. With that background in mind, the 2 methods of Distribution, rvar and rvarT, are exactly equivalent, just specialized so that the compiler can avoid unnecessary conversions in some cases. The types are even isomorphic, I believe, due to parametricity. Both methods take the distribution in question (the "d t") and make an "RVarT n t" that has that distribution (RVar is RVarT Identity, so n == Identity). The reason the type variable is 'n' instead of 'm' as is more traditional is related to the types of the function runRVarT and similar functions for sampling the RVars:

...

runRVarT :: (Lift n m, RandomSource m s) => RVarT n a -> s -> m a

This involves 2 monads, and 'n' was used for the second of them. For consistency, 'n' is often used as the name of the corresponding variable in type signatures using RVarT. In runRVarT's type, 'n' is the monad underlying the random variable and 'm' is the monad in which it is being sampled. They are allowed to differ so that random variables can be given more general types. If they had to be the same, the RVar would have to carry around the monad in which it would eventually be sampled (and would incidentally be granted access to all its capabilities via Control.Monad.Trans.lift, which would be undesirable). It would also restrict the monads in which the RVar could be sampled. With this scheme, one RVar/RVarT can be sampled in many monads if desired (and I have used this ability more than once in real code). Finally, some may still wonder why there is a monad transformer here at all - a plain RVar would already be sampleable in any monad that can feed it some random data. Originally that's what the library had, but a kind and perceptive contributor (Reiner Pope) rectified that. As a result, the same framework supports some really nifty tricks, most importantly the ability to define random processes reusing all the existing definitions of random variables.

5503

Age (days ago)

5508

Last active (days ago)

List overview

Download

12 comments

5 participants

participants (5)

Alexander Solla
James Andrew Cook
mokus＠deepbondi.net
Richard O'Keefe
wren ng thornton