Dominic Steinitz <dominic@steinitz.org> writes:
On 12 Sep 2021, at 13:00, haskell-cafe-request@haskell.org wrote:
In particular, I am a mathematician/statistician working in evolutionary
biology. I work with multivariate distributions (hardly any of those are readily
available on Hackage), I work with a lot of random numbers (the support for
random sampling is mediocre, at best; 'splitmix' is standard by now but not
supported by the most important statistics library of Haskell), I work with
numerical optimization (I envy Pythonians for their libraries, although I still
prefer Haskell because what I do achieve, I at least get right), I work with Markov
chains (yes, I had to write my own MCMC library in order to run proper Markov
chains), I need to plot my data (there is no superb standard plotting library
available in Haskell). By now, I do maintain library packages providing answers
to some of these problems, but it was (and is) a lot of work.
I have to take issue with your statement about random sampling. I think we have a really good story with random numbers now: they are of high quality and fast. By comparison, R and possibly Python and Julia still use the Mersenne Twister, which is of lower quality, slower, and lacks a good story for generating independent sequences for parallel computations. I maintain random-fu (sampling from distributions), and with the new random number generator it is now several times (x4?) faster than it was. Conceivably it could be made even faster.
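To make the speed claim concrete: each output of SplitMix64, the algorithm behind the splitmix package, costs one addition plus three xor-shift and two multiply steps. This is a minimal base-only re-implementation of that output function with its published constants, not the splitmix package's code. It deliberately omits split, which in real SplitMix derives a fresh seed and increment for a child generator and is what provides independent streams for parallel work.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- The generator state is a single 64-bit counter.
newtype SM64 = SM64 Word64

-- Golden-ratio increment used by SplitMix64.
goldenGamma :: Word64
goldenGamma = 0x9e3779b97f4a7c15

-- Advance the counter, then scramble it with the SplitMix64 finaliser.
-- Word64 arithmetic wraps modulo 2^64, as the algorithm requires.
nextWord64 :: SM64 -> (Word64, SM64)
nextWord64 (SM64 s) = (mix z, SM64 z)
  where
    z = s + goldenGamma
    mix x0 =
      let x1 = (x0 `xor` (x0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
          x2 = (x1 `xor` (x1 `shiftR` 27)) * 0x94d049bb133111eb
       in x2 `xor` (x2 `shiftR` 31)

-- Use the top 53 bits to build a Double in [0, 1).
nextDouble :: SM64 -> (Double, SM64)
nextDouble g = (fromIntegral (w `shiftR` 11) * 2 ** (-53), g')
  where
    (w, g') = nextWord64 g

-- The first n doubles produced from a given seed.
doubles :: Int -> Word64 -> [Double]
doubles n seed = take n (go (SM64 seed))
  where
    go g = let (d, g') = nextDouble g in d : go g'
```

For real work one would use splitmix itself, or random 1.2, whose StdGen is built on it.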
Thank you for mentioning 'random-fu'. It makes me want to change from using 'statistics' to 'random-fu'. I started using 'statistics' because I liked (depended on?) the notion of a 'Distribution' which can be an instance of many classes (but I just saw that this is also the case for 'random-fu'; maybe I overlooked it). I liked that there is a distinction between discrete and continuous distributions, and that there are more statistical functions available, such as quantiles. The package 'statistics' only supports random number generation using the MWC generator from 'mwc-random'. It also does not support multivariate distributions. Right now, I am considering changing to 'random-fu'.
What also kept me from using 'random-fu' is the following sentence in the description of the package: "Quality is prioritized over speed, but performance is an important goal too." This sounds to me like 'random-fu' focuses on the generation of cryptographically secure random numbers, which is not what I need.
I think the original author meant they were not aiming for C-like speed. The library is certainly not intended to generate cryptographic-strength random numbers.
Here’s my take on what random-fu did:
- Provides an interface to “sources of entropy” so you can plug in any RNG and produce random values for various specified types.
- Provides a domain-specific language so that you can manipulate random values using an early precursor of free monads (the prompt monad)
- Provides a way of sampling from distributions.
- Provides cumulative distribution functions and probability density functions (where they exist). I think this is a bit of a later addition and I would like it to be comparable to what R provides.
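For readers unfamiliar with the idea behind the second item: a random value is described as a program that requests uniform draws, and a separate interpreter decides where those draws come from. This is a hypothetical, simplified free-monad-style RVar written with base only; random-fu's real RVar is built on the prompt monad and is considerably richer, and names such as runWith are invented for illustration.

```haskell
-- A random computation either finishes with a value or asks for one
-- uniform draw in [0, 1) and continues with it.
data RVar a = Done a | Draw (Double -> RVar a)

instance Functor RVar where
  fmap f (Done a) = Done (f a)
  fmap f (Draw k) = Draw (fmap f . k)

instance Applicative RVar where
  pure = Done
  Done f <*> rx = fmap f rx
  Draw k <*> rx = Draw (\u -> k u <*> rx)

instance Monad RVar where
  Done a >>= f = f a
  Draw k >>= f = Draw (\u -> k u >>= f)

-- The only primitive: request one uniform number.
uniform :: RVar Double
uniform = Draw Done

-- Distributions are ordinary programs in the DSL.
exponential :: Double -> RVar Double
exponential rate = do
  u <- uniform
  pure (negate (log (1 - u)) / rate)

bernoulli :: Double -> RVar Bool
bernoulli p = (< p) <$> uniform

-- One possible interpreter: feed draws from a list, returning the result
-- and the unused draws, or Nothing if the list runs out.
runWith :: [Double] -> RVar a -> Maybe (a, [Double])
runWith us (Done a) = Just (a, us)
runWith [] (Draw _) = Nothing
runWith (u : us) (Draw k) = runWith us (k u)
```

Because exponential and bernoulli never mention a generator, the same programs can be run against fixed test values, as runWith does, or against a real RNG by writing another interpreter.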
The new random interface means that the first item is no longer required. With the new random (1.2) you can plug in your favourite RNG without having to add anything to random-fu (previously this was not the case, e.g. for adding MWC).
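In random 1.2 this works because samplers are written against its generator classes (RandomGen for pure generators, StatefulGen for monadic ones) rather than against a concrete generator. The shape of that idea, with a hypothetical one-method class UniformSource standing in for RandomGen (it is not random's actual class) and a deliberately weak toy LCG, using Knuth's MMIX constants, as the plugged-in generator:

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64)

-- Hypothetical stand-in for random's RandomGen: anything that can produce
-- a Double in [0, 1) together with its next state.
class UniformSource g where
  nextUniform :: g -> (Double, g)

-- Samplers are written once, against the class only.
sampleExponential :: UniformSource g => Double -> g -> (Double, g)
sampleExponential rate g = (negate (log (1 - u)) / rate, g')
  where
    (u, g') = nextUniform g

-- Thread a generator through n draws of any sampler.
sampleMany :: Int -> (g -> (a, g)) -> g -> [a]
sampleMany n step = go n
  where
    go k g
      | k <= 0 = []
      | otherwise = let (x, g') = step g in x : go (k - 1) g'

-- Toy 64-bit LCG (MMIX constants). Low quality; illustration only.
newtype ToyLCG = ToyLCG Word64

instance UniformSource ToyLCG where
  nextUniform (ToyLCG s) = (fromIntegral (s' `shiftR` 11) * 2 ** (-53), ToyLCG s')
    where
      s' = s * 6364136223846793005 + 1442695040888963407
```

Swapping in a better generator means writing one more UniformSource instance; sampleExponential and sampleMany stay untouched.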
Please give details on where you think we can improve and better still contribute your own improvements :-)
In my opinion it would be great to:
- separate continuous from discrete distributions
Certainly possible, but I am not sure of the benefits. What would it look like concretely?
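One concrete reading of "separate continuous from discrete": give the two kinds distinct classes, so a probability mass function can never be mistaken for a density, and functions that only make sense for one kind say so in their types. The statistics package already draws this line (DiscreteDistr with probability, ContinuousDistr with density); this sketch restates a cut-down, hypothetical version of that hierarchy in plain Haskell rather than importing it, and massUpTo is an invented example of a discrete-only function.

```haskell
-- Hypothetical hierarchy, loosely modelled on statistics' classes.
class Distribution d where
  cumulative :: d -> Double -> Double -- P(X <= x)

class Distribution d => DiscreteDistr d where
  probability :: d -> Int -> Double -- P(X = k)

class Distribution d => ContinuousDistr d where
  density :: d -> Double -> Double -- f(x)

-- Exponential distribution with rate lambda.
newtype Exponential = Exponential Double

instance Distribution Exponential where
  cumulative (Exponential l) x
    | x <= 0 = 0
    | otherwise = 1 - exp (negate l * x)

instance ContinuousDistr Exponential where
  density (Exponential l) x
    | x < 0 = 0
    | otherwise = l * exp (negate l * x)

-- Geometric distribution on {1, 2, ...}: trials until the first success.
newtype Geometric = Geometric Double

instance Distribution Geometric where
  cumulative (Geometric p) x
    | x < 1 = 0
    | otherwise = 1 - (1 - p) ^ (floor x :: Int)

instance DiscreteDistr Geometric where
  probability (Geometric p) k
    | k < 1 = 0
    | otherwise = (1 - p) ^ (k - 1) * p

-- Sums P(X = k) for k = 0..n (assumes support in the non-negative
-- integers). The constraint makes massUpTo (Exponential 1) 3 a type error.
massUpTo :: DiscreteDistr d => d -> Int -> Double
massUpTo d n = sum [probability d k | k <- [0 .. n]]
```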
- have one set of type classes used by 'random-fu' and 'statistics' (and all
other packages working with distributions)
I find this harder to visualise, and I am unsure what its consequences and benefits would be.
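One way to picture a shared set of classes: a single class that a statistics library uses for evaluation (density, quantile) and a sampling library uses for generation, since inverse-transform sampling needs nothing but the quantile function and a uniform draw. ContDist, sampleFrom and sampleAll are hypothetical names; neither random-fu nor statistics exposes this exact class.

```haskell
-- Hypothetical shared class: the evaluation side (density, quantile) and
-- the sampling side (sampleFrom) live in one place.
class ContDist d where
  density :: d -> Double -> Double
  quantile :: d -> Double -> Double -- inverse CDF, for p in (0, 1)

  -- Default sampler by inverse transform: map a uniform u in (0, 1)
  -- through the quantile function. Instances may override it.
  sampleFrom :: d -> Double -> Double
  sampleFrom = quantile

-- Logistic distribution with location mu and scale s.
data Logistic = Logistic Double Double

instance ContDist Logistic where
  density (Logistic mu s) x =
    let z = exp (negate (x - mu) / s)
     in z / (s * (1 + z) ^ (2 :: Int))
  quantile (Logistic mu s) p = mu + s * log (p / (1 - p))

-- What a sampling library would do: turn uniforms into draws for any
-- instance without knowing anything about the distribution.
sampleAll :: ContDist d => d -> [Double] -> [Double]
sampleAll d = map (sampleFrom d)
```

The default sampleFrom is only a fallback: inverse transform is often not the fastest method, so an instance for, say, the normal distribution could override it with a specialised algorithm while still sharing density and quantile with the statistics side.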
- implement more and multivariate distributions (I implemented the 'dirichlet'
distribution for 'statistics'; it is available on Hackage but it is not
completely finished, and I don't consider myself able enough to contribute to
core libraries yet; there is also 'random-fu-multivariate' but it only has the
multivariate normal distribution)
I created random-fu-multivariate with the intention of adding more multivariate distributions when I needed them; I haven’t needed any thus far. It would be great if folks added to it. The reason to separate it from random-fu was that it relies on extra Haskell packages and external libraries (LAPACK for the Cholesky decomposition).
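For readers wondering why multivariate sampling pulls in LAPACK: to draw from a multivariate normal N(mu, Sigma), one factors Sigma = L L^T (Cholesky) and returns mu + L z, where z is a vector of independent standard normals; this works because Cov(L z) = L I L^T = Sigma. The sketch uses lists and a hand-written Cholesky–Banachiewicz factorisation in place of LAPACK's. It is only reasonable for tiny matrices, assumes Sigma is symmetric positive definite, and takes the standard normals as input rather than generating them.

```haskell
type Vector = [Double]

type Matrix = [[Double]] -- row-major, square

-- Cholesky–Banachiewicz: lower-triangular l with sigma = l * transpose l.
-- Assumes sigma is symmetric positive definite. Entries refer lazily to
-- earlier entries of the same result.
cholesky :: Matrix -> Matrix
cholesky sigma = rows
  where
    n = length sigma
    rows = [[entry i j | j <- [0 .. n - 1]] | i <- [0 .. n - 1]]
    l a b = rows !! a !! b
    entry i j
      | j > i = 0
      | i == j = sqrt (sigma !! i !! i - sum [l i k ^ (2 :: Int) | k <- [0 .. j - 1]])
      | otherwise = (sigma !! i !! j - sum [l i k * l j k | k <- [0 .. j - 1]]) / l j j

-- x = mu + L z: if z holds independent standard normals, x ~ N(mu, sigma).
mvnFromStandard :: Vector -> Matrix -> Vector -> Vector
mvnFromStandard mu sigma z =
  zipWith (+) mu (map (\row -> sum (zipWith (*) row z)) (cholesky sigma))
```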
Please do contribute. My approach is to look at what R / Python / Julia have already done and to read the old masters such as Devroye (http://www.eirene.de/Devroye.pdf). I no longer feel totally confident that other programming language ecosystems have optimal implementations (vide the Mersenne Twister).
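As a small taste of the methods collected in Devroye's book, the Box–Muller transform turns two independent uniforms into two independent standard normals: with r = sqrt(-2 ln u1) and theta = 2 pi u2, the pair (r cos theta, r sin theta) is standard bivariate normal. A plain sketch, not random-fu's implementation; in practice faster methods such as the ziggurat algorithm are often preferred.

```haskell
-- Box–Muller: two independent uniforms in (0, 1] give two independent
-- standard normal draws. u1 must be non-zero because of log u1; with a
-- generator producing [0, 1), pass 1 - u instead.
boxMuller :: Double -> Double -> (Double, Double)
boxMuller u1 u2 = (r * cos theta, r * sin theta)
  where
    r = sqrt (-2 * log u1)
    theta = 2 * pi * u2

-- Consume uniforms in pairs and emit normals in pairs; an odd leftover
-- uniform is dropped.
normalsFrom :: [Double] -> [Double]
normalsFrom (u1 : u2 : us) = let (a, b) = boxMuller u1 u2 in a : b : normalsFrom us
normalsFrom _ = []
```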
In terms of MCMC, I think Jared Tobin wrote some libraries but I don’t think they are maintained. I maintain an SMC library but I don’t know how much use it gets. Tom Nielsen, Henrik Nilsson and I wrote Haskell “bindings” for Stan:
https://nottingham-repository.worktribe.com/output/1151875/getting-more-out-of-stan-some-ideas-from-the-haskell-bindings. Re-creating something like Stan natively in Haskell would be a lot of work.
I am aware of Jared Tobin's packages! They are great entry points but not flexible enough for what I am doing. If you are interested, have a look at the 'mcmc' package, which I am developing.
Thank you, I didn't know about the Stan Haskell bindings and will have a look.
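For context on what an MCMC library has to get right at its core, here is a minimal random-walk Metropolis sampler. It is a hypothetical sketch, not the API of the mcmc package, of Jared Tobin's libraries or of the Stan bindings. The proposal increments and acceptance uniforms are passed in as lists so that it needs nothing beyond base, and because the proposal is symmetric, the Metropolis–Hastings acceptance ratio reduces to the ratio of target densities, compared here on the log scale.

```haskell
-- Unnormalised log density of the target; a standard normal as an example.
logTarget :: Double -> Double
logTarget x = negate (x * x) / 2

-- One random-walk Metropolis step. eps is the proposal increment (from any
-- symmetric distribution, e.g. standard normal) and u a uniform in [0, 1).
-- Accepts x' with probability min 1 (p x' / p x), compared on the log scale.
metropolisStep :: (Double -> Double) -> Double -> Double -> Double -> Double -> Double
metropolisStep logP stepSize x eps u
  | log u < logP x' - logP x = x'
  | otherwise = x
  where
    x' = x + stepSize * eps

-- Run the chain over paired noise streams, returning every visited state,
-- including the starting point.
runChain :: (Double -> Double) -> Double -> Double -> [Double] -> [Double] -> [Double]
runChain logP stepSize x0 epss us = scanl step x0 (zip epss us)
  where
    step x (e, u) = metropolisStep logP stepSize x e u
```

A real library layers much more on top: proposal tuning, multiple chains, convergence diagnostics, and a proper generator feeding the noise streams.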
I agree about plotting, but inline-r makes it possible to use ggplot in R via Haskell, which makes things like drawing maps with reasonable projections relatively straightforward.
More generally, I think we have a good set of bindings for the ODE solver library SUNDIALS and also for other numeric libraries (e.g. LAPACK and BLAS). The problem is that we do not have enough hands working on such things.
I now sadly return to programming in Julia.
Thanks for your input!
PS - there is probably more I could say on numerical stuff in Haskell but the above already looks like “stream of consciousness”.
Dominic Steinitz
dominic@steinitz.org
http://idontgetoutmuch.org
Twitter: @idontgetoutmuch
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.