[CC'ing some people involved in the relevant ticket discussions or showing up in git blame]
Hello all,
So it seems random was due for an overhaul. I would like to initiate a discussion period to talk about what changes should happen before the next major release. I think this is timely because there is
already a pending backwards-incompatible change in the API
(factoring out the SplittableGen class). We might as well make any other fixes now and make all the changes at once. For reference, I've attached a list of the open tickets I know about pertaining to System.Random at the end of this email.
There implementation has two major pieces:
(1) Sources of random bits (class RandomGen)
(2) Instances of class Random that map use random bits to create Haskell types
For now I've left (1) alone and have made heavy changes to part (2) in a branch to fix correctness and performance problems:
These new changes require a small API change to RandomGen. The issue is that the legacy RandomGen API isn't very good at generating random
bits. Rather, it
next creates numbers in an arbitrary range:
genRange. First of all, I would argue that -- completely aside from performance considerations -- this extra flexibility for RandomGens increases the likelihood of errors in the clients of the API. As supporting evidence I would cite tickets
#5278 and
#5280 -- it looks like even use of the API in System/Random.hs itself was incorrect!
My proposed API addition/restriction:
I propose that we add a "genBits" function that reports how many bits of randomness are created by a generator.
class RandomGen g where
...
genBits :: g -> Int
I added a default implementation of genBits that computes the answer based on genRange. This need not be a backwards-incompatible change.
What about generators that have a genRange which is not a power of two? We could consider making genBits return a Maybe value to distinguish these. But I would prefer that we
instead
restrict RandomGen instances, requiring that they generate a clean number of random bits. This requires that genRange be (0,2^N-1) or (-2^N,2^N-1). But we can tolerate slight deviations; for example, in the
new_api branch, stdGen has a genRange of (0,2^31 - 86). Probably close enough. (In any case, in the future I'd like to see stdGen replaced with something better anyway.)
An alternative would be to go even further and require every RandomGen instances to generate the full range of Ints. I think any future replacement for stdGen will probably meet this additional requirement, though I would be interested in important generators that do not, and in any references on the original API design of RandomGen. As long as we have genBits I don't feel strongly about it but am interested in other opinions.
Anyway, I've used genBits to reimplement the random and randomR methods. Let me know what you think. This new version should fix tickets
#5278,
#5280,
#5133. But give me a code review and help me head off future new bugs!
Regarding other tickets:
#3575 and
#3620 will have to wait for stdGen to get replaced. Whereas
#427 and
#2280 are complaints about the performance; they will also require a new stdGen to be adequately addressed. However, the current batch of changes
does change the performance of some Random instances significantly. (See appendix.) There's plenty of room for improvement, and in particular, the new random approach would work better if
#4102 were implemented in Data.Bits.
The excessive laziness problems are still there. (
#4218 is still unfixed.) Perhaps that's something we can fix in this audit.
I'd also be happy to hear other proposals for API changes and additions. System.Random can't depend on bytestring, so I doubt we want block RNG in the RandomGen class. I could, however, imagine having nextWord16, nextWord32, etc.
Something worthwhile if we're doing an API audit is to look at some of the other Hackage packages. Here are some that I am familiar with:
And some that I'm not:
Pardon the long email!
Cheers,
-Ryan
Appendix A: List of tickets
#5278 System.Random.randomIvalInteger makes invalid assumptions about RandomGen
#5280 System.Random commits (rand `mod` base) error.
#5133 Random instance for Float can generate values out of requested range
#4218 System.Random is way too lazy
#427 Random.StdGen slowness
#2280 randomR too slow
#3575 mkStdGen and split conspire to make some programs predictable
#3620 The seeds generated by split are not independent libraries/rando
#3352 Data.Word should define instances of Random for each type
Appendix B: Performance data
Here's an "S curve" showing the speedup or slowdown of the new code when generating different types of data with random or randomR (the R labeled ones are randomR). They range from from some massive speedups on floats and doubles as well as a hefty performance regression when Integer is restricted to a small range by randomR. (I'm looking into it.)
speedup of new over old:
new_api revision f85c6a55b731d5e380c1d6e880105c9933aa9d48
old, revision 130e421e912d394653a43c987be99
161.87843858371693 "Float"
120.29838090036381 "Float R"
54.66538723769602 "CDouble"
41.01849643134359 "CDouble R"
5.546651773234119 "Int"
5.250176145938512 "Integer"
2.4963231822924556 "Word16"
2.340365231384218 "Bool R"
2.2810769806267412 "Bool"
2.1576446209293216 "Double R"
2.059792819214518 "Double"
1.4283216783216783 "Integer R Big"
1.235562774008646 "Word16 R"
1.008346493029544 "stdGen/next"
0.713868706483777 "Int R"
0.6738365001610773 "Char R"
0.5974339051794343 "Char"
0.16546783206240073 "Integer R"
The baseline stdGen/next hasn't been changed. Thus it's "speedup" is 1.0.