ANN: combinatorics

-------------------------------------------- -- combinatorics 0.1.0 -------------------------------------------- The combinatorics package offers efficient *exact* computation of common combinatorial functions like the binomial coefficients and factorial. (For fast *approximations*, see the math-functions package instead.) -------------------------------------------- -- Long Description -------------------------------------------- * Math.Combinatorics.Primes: provides the prime numbers via Runciman's lazy wheel sieve algorithm. Provided here since efficient algorithms for combinatorial functions often require prime numbers. The current version memoizes the primes as an infinite list CAF, which could lead to memory leaks in long-running programs with irregular access to large primes. I'm looking into a GHC patch to allow resetting individual CAFs from within compiled programs so that you can explicitly decide when to un-memoize the primes. (In GHCi when you reload a module all the CAFs are reset. However, there's no way to access this feature from within compiled programs as yet.) * Math.Combinatorics.Binomial: offers a fast computation of the binomial coefficients based on the prime factorization of the result. As it turns out, it's easier to compute the prime factorization of the answer than it is to compute the answer directly! And you don't even need the factorial function to do so. Albeit, with a fast factorial function, the naive definition of binomial coefficients gives this algorithm a run for its money. * Math.Combinatorics.Factorial: offers a fast computation of factorial numbers. As Peter Luschny comments[1], the factorial function is often shown as a classic example of recursive functions, like addition of Peano numbers, however that naive way of computing factorials is extremely inefficient and does a disservice to those learning about recursion. The current implementation uses the split-recursive algorithm which is more than sufficient for casual use. I'm working on implementing the parallel prime-swing algorithm, which should be a bit faster still. [1] http://www.luschny.de/math/factorial/FastFactorialFunctions.htm -------------------------------------------- -- Links -------------------------------------------- Homepage: http://code.haskell.org/~wren/ Hackage: http://hackage.haskell.org/package/combinatorics Darcs: http://community.haskell.org/~wren/combinatorics Haddock (Darcs version): http://community.haskell.org/~wren/combinatorics/dist/doc/html/combinatorics -- Live well, ~wren

* wren ng thornton
* Math.Combinatorics.Primes: provides the prime numbers via Runciman's lazy wheel sieve algorithm. Provided here since efficient algorithms for combinatorial functions often require prime numbers. The current version memoizes the primes as an infinite list CAF, which could lead to memory leaks in long-running programs with irregular access to large primes. I'm looking into a GHC patch to allow resetting individual CAFs from within compiled programs so that you can explicitly decide when to un-memoize the primes. (In GHCi when you reload a module all the CAFs are reset. However, there's no way to access this feature from within compiled programs as yet.)
Why not to make it more pure? That is, return a lazy list of Ints (but not a CAF), which user can throw away by the usual GC means? The functions from the other modules that use primes would have to be put in a Reader monad. That would make it a little bit more awkward to use, but personally I'd prefer that over memory leaks. -- Roman I. Cheplyaka :: http://ro-che.info/

On 1/29/12 5:48 AM, Roman Cheplyaka wrote:
* wren ng thornton
[2012-01-28 23:06:08-0500] * Math.Combinatorics.Primes: provides the prime numbers via Runciman's lazy wheel sieve algorithm. Provided here since efficient algorithms for combinatorial functions often require prime numbers. The current version memoizes the primes as an infinite list CAF, which could lead to memory leaks in long-running programs with irregular access to large primes. I'm looking into a GHC patch to allow resetting individual CAFs from within compiled programs so that you can explicitly decide when to un-memoize the primes. (In GHCi when you reload a module all the CAFs are reset. However, there's no way to access this feature from within compiled programs as yet.)
Why not to make it more pure? That is, return a lazy list of Ints (but not a CAF), which user can throw away by the usual GC means?
The functions from the other modules that use primes would have to be put in a Reader monad. That would make it a little bit more awkward to use, but personally I'd prefer that over memory leaks.
I'd also prefer a more pure solution, but I don't think that the Reader monad is the right tool here. I played around with that approach, but it requires extremely invasive changes to client code, obfuscating what should be simple mathematical formulae. And, it's too fragile, exposing the implementation in a way that breaks client code should I change a non-prime-using algorithm to a prime-using one, or vice versa. The fragility could be partially avoided by providing both prime-using and non-prime-using algorithms, but then that forces users to decide which one they want--- and if their only concern is performance, then they'd have to go through the code-breaking refactoring anyways, just to determine which is faster for their application. One alternative I'm exploring is, rather than (only) making the primes not a CAF, instead make the prime-using functions non-CAFs as well. That is, provide a makePrimeUsingFunctions function which returns a record/tuple of all the functions, sharing a stream of primes. This way, after allocating the functions, they can be used purely just as in the current model; and when the client wants the primes to be GCed, they can drop their references to the allocated functions which use those primes (allocating new functions later, if necessary). I've used that pattern before for similar resource issues, and it worked nicely. But how well it works depends a lot on the structure of the program needing those resources. In particular it can lead to needing to use the Reader monad (or similar) in order to pass around the functions, even though the functions themselves can be used purely. Unfortunately I don't have any large combinatorial programs on hand to assess whether this would be problematic in practice or not. -- Live well, ~wren

* wren ng thornton
On 1/29/12 5:48 AM, Roman Cheplyaka wrote:
* wren ng thornton
[2012-01-28 23:06:08-0500] * Math.Combinatorics.Primes: provides the prime numbers via Runciman's lazy wheel sieve algorithm. Provided here since efficient algorithms for combinatorial functions often require prime numbers. The current version memoizes the primes as an infinite list CAF, which could lead to memory leaks in long-running programs with irregular access to large primes. I'm looking into a GHC patch to allow resetting individual CAFs from within compiled programs so that you can explicitly decide when to un-memoize the primes. (In GHCi when you reload a module all the CAFs are reset. However, there's no way to access this feature from within compiled programs as yet.)
Why not to make it more pure? That is, return a lazy list of Ints (but not a CAF), which user can throw away by the usual GC means?
The functions from the other modules that use primes would have to be put in a Reader monad. That would make it a little bit more awkward to use, but personally I'd prefer that over memory leaks.
I'd also prefer a more pure solution, but I don't think that the Reader monad is the right tool here. I played around with that approach, but it requires extremely invasive changes to client code, obfuscating what should be simple mathematical formulae. And, it's too fragile, exposing the implementation in a way that breaks client code should I change a non-prime-using algorithm to a prime-using one, or vice versa. The fragility could be partially avoided by providing both prime-using and non-prime-using algorithms, but then that forces users to decide which one they want--- and if their only concern is performance, then they'd have to go through the code-breaking refactoring anyways, just to determine which is faster for their application.
Makes sense; but doesn't making the monad abstract and putting all functions in the monad address the fragility issue? -- Roman I. Cheplyaka :: http://ro-che.info/

On 1/30/12 3:54 PM, Roman Cheplyaka wrote:
Makes sense; but doesn't making the monad abstract and putting all functions in the monad address the fragility issue?
The primary issue with monads is that the syntax is extremely cumbersome for the expected use case. It'd be like paranoid C where, since order of evaluation is unspecified, all subexpressions are floated out into let-bindings. At that point (a) the verbosity is ugly, (b) the code is much harder to follow, and (c) it's all too easy to introduce errors where you use x instead of x' or the like. The semantic model of monads just isn't a good fit for this domain. There really aren't any side effects going on, there's no sequencing of actions, there's no "little language" that's being implemented,... I love me some monads and all, but they just don't fit here. -- Live well, ~wren

Hi, On 29.01.2012, at 23:30, wren ng thornton wrote:
On 1/29/12 5:48 AM, Roman Cheplyaka wrote:
* wren ng thornton
[2012-01-28 23:06:08-0500] Why not to make it more pure? That is, return a lazy list of Ints (but not a CAF), which user can throw away by the usual GC means?
The functions from the other modules that use primes would have to be put in a Reader monad. That would make it a little bit more awkward to use, but personally I'd prefer that over memory leaks.
I'd also prefer a more pure solution, but I don't think that the Reader monad is the right tool here. I played around with that approach, but it requires extremely invasive changes to client code, obfuscating what should be simple mathematical formulae. And, it's too fragile, exposing the implementation in a way that breaks client code should I change a non-prime-using algorithm to a prime-using one, or vice versa. The fragility could be partially avoided by providing both prime-using and non-prime-using algorithms, but then that forces users to decide which one they want--- and if their only concern is performance, then they'd have to go through the code-breaking refactoring anyways, just to determine which is faster for their application.
One alternative I'm exploring is, rather than (only) making the primes not a CAF, instead make the prime-using functions non-CAFs as well. That is, provide a makePrimeUsingFunctions function which returns a record/tuple of all the functions, sharing a stream of primes. This way, after allocating the functions, they can be used purely just as in the current model; and when the client wants the primes to be GCed, they can drop their references to the allocated functions which use those primes (allocating new functions later, if necessary).
A slight variation on that approach is to use implicit parameters to parameterize your functions by the primes. Something allong the following lines:
{-# LANGUAGE ImplicitParams, Rank2Types #-}
data PrimeTable -- abstract
withPrimes :: ((?primes :: PrimeTable) => a) -> a withPrimes e = e where ?primes = ...
factorial :: (?primes :: PrimeTable) => Integer -> Integer factorial = ...
example = withPrimes $ ... factorial n ….
This has the advantage that the user doesn't have to bring all the elemnts of your tuple/record into scope. And you can have two modules with an almost identical content, one uses the implicit primes argument and the other uses a global CAF for its primes. A user would only have to change his type signatures and strategically add/remove a call to withPrimes when switching. Cheers, Jean

On 1/31/12 8:58 AM, Jean-Marie Gaillourdet wrote:
A slight variation on that approach is to use implicit parameters to parameterize your functions by the primes. Something allong the following lines:
That is another option. However, implicit parameters are GHC-only and seldom used even in GHC. The RTS-hacking I mentioned in the announcement would also be GHC-only, which is part of the reason I'd prefer to find a non-cumbersome way of dealing with the issue purely. As it stands the library is Haskell98 (with some trivial CPP to make the Haddocks pretty) and it'd be nice to stay that way. -- Live well, ~wren

-------------------------------------------- -- exact-combinatorics 0.2.0 -------------------------------------------- I've chosen to rename the combinatorics package to exact-combinatorics to make it clearer what the purpose of the package is. Anyone repackaging things for individual distros should use the new package and not bother with the old one. Sorry for the noise and confusion. Along the way I've also moved the Math.Combinatorics.* modules to Math.Combinatorics.Exact.* so as not to claim the whole Math.Combinatorics namespace. The code is otherwise unchanged so far. The original announcement is below. On 1/28/12 11:06 PM, wren ng thornton wrote:
-------------------------------------------- -- combinatorics 0.1.0 --------------------------------------------
The combinatorics package offers efficient *exact* computation of common combinatorial functions like the binomial coefficients and factorial. (For fast *approximations*, see the math-functions package instead.)
-------------------------------------------- -- Long Description --------------------------------------------
* Math.Combinatorics.Primes: provides the prime numbers via Runciman's lazy wheel sieve algorithm. Provided here since efficient algorithms for combinatorial functions often require prime numbers. The current version memoizes the primes as an infinite list CAF, which could lead to memory leaks in long-running programs with irregular access to large primes. I'm looking into a GHC patch to allow resetting individual CAFs from within compiled programs so that you can explicitly decide when to un-memoize the primes. (In GHCi when you reload a module all the CAFs are reset. However, there's no way to access this feature from within compiled programs as yet.)
* Math.Combinatorics.Binomial: offers a fast computation of the binomial coefficients based on the prime factorization of the result. As it turns out, it's easier to compute the prime factorization of the answer than it is to compute the answer directly! And you don't even need the factorial function to do so. Albeit, with a fast factorial function, the naive definition of binomial coefficients gives this algorithm a run for its money.
* Math.Combinatorics.Factorial: offers a fast computation of factorial numbers. As Peter Luschny comments[1], the factorial function is often shown as a classic example of recursive functions, like addition of Peano numbers, however that naive way of computing factorials is extremely inefficient and does a disservice to those learning about recursion. The current implementation uses the split-recursive algorithm which is more than sufficient for casual use. I'm working on implementing the parallel prime-swing algorithm, which should be a bit faster still.
[1] http://www.luschny.de/math/factorial/FastFactorialFunctions.htm
-------------------------------------------- -- Links --------------------------------------------
Homepage: http://code.haskell.org/~wren/
Hackage: http://hackage.haskell.org/package/combinatorics
Darcs: http://community.haskell.org/~wren/combinatorics
Haddock (Darcs version):
http://community.haskell.org/~wren/combinatorics/dist/doc/html/combinatorics
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Live well, ~wren
participants (3)
-
Jean-Marie Gaillourdet
-
Roman Cheplyaka
-
wren ng thornton