fastest Fibonacci numbers in the West

Inspired by a discussion on freenode #haskell, I tried to write the fastest Fibonacci number function possible, i.e. given a natural number input n, to compute F_n. mlton seems to enjoy substantially better speed despite equivalent algorithms; it may be enlightening to determine the causes of this, at least for those concerned about performance and the inner workings of the Haskell runtime system(s). In general, I am not usually very concerned about performance, nor am I in this case; it's something of a mindless microbenchmark.

The things I'd normally think of to speed this up would be primitives to find the highest bit of an integer (floor . lg) and to test a given bit of an integer (something vaguely like fromEnum . (`mod` 2) . (`div` 2^k)), both in some lightweight O(1), low-hidden-constant manner. This doesn't appear to be a factor in the testing I did, as the two programs are largely 1:1 translations of each other. Still, such primitives would be useful in various other algorithms of far greater importance than this one.

For the moment, mlton-generated binaries crash computing fib (10^8-1), and there is a 6:1 speed difference for fib (10^7-1) between the two, where mlton-generated binaries take just under 1 minute and ghc-generated binaries take just under 6 minutes.

Anyway, thoughts on how to improve all this from the programmer's point of view, or otherwise explaining what's going on or ameliorating whatever effect is at work here, would be appreciated.

-- wli
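[For concreteness, a minimal sketch of the kind of algorithm under discussion -- not wli's attached code, which is not reproduced in this archive: fast-doubling Fibonacci, using F_2k = F_k*(2*F_(k+1) - F_k) and F_(2k+1) = F_k^2 + F_(k+1)^2, with Data.Bits.testBit standing in for the bit-testing primitive wished for above.]

    import Data.Bits (testBit)

    -- fibPair n returns (F_n, F_(n+1)); the recursion consumes the bits
    -- of n from most significant to least significant.
    fibPair :: Integer -> (Integer, Integer)
    fibPair 0 = (0, 1)
    fibPair n = step (fibPair (n `div` 2))
      where
        step (a, b) =
          let c = a * (2 * b - a)      -- F_2k
              d = a * a + b * b        -- F_(2k+1)
          in if testBit n 0 then (d, c + d) else (c, d)

    fib :: Integer -> Integer
    fib = fst . fibPair

For example, fib 10 evaluates to 55 and fib 100 to 354224848179261915075.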

On Thursday, 27 January 2005, 06:08, William Lee Irwin III wrote:
Inspired by a discussion on freenode #haskell, I tried to write the fastest Fibonacci number function possible, i.e. given a natural number input n to compute F_n.
For the moment, mlton-generated binaries crash computing fib (10^8-1), and there is a 6:1 speed difference for fib (10^7-1) between the two, where mlton-generated binaries take just under 1 minute and ghc-generated binaries take just under 6 minutes.
Obviously, your machine is significantly faster than mine. On mine, fib (10^6) takes a little under two minutes; fib (10^7-1) I ^C-ed after twenty. I think, more readable than

    unfoldl f x = case f x of
                    Nothing     -> []
                    Just (u, v) -> unfoldl f v ++ [u]

    divs 0 = Nothing
    divs k = Just (uncurry (flip (,)) (k `divMod` 2))

would be

    unfoldl f x = case f x of
                    Nothing     -> []
                    Just (q, r) -> unfoldl f q ++ [r]

    divs 0 = Nothing
    divs k = Just (k `quotRem` 2)

-- this has no influence on performance, since it's optimized anyway.
Anyway, thoughts on how to improve all this from the programmer's point of view, or otherwise explaining what's going on or ameliorating whatever effect is at work here would be appreciated.
I thought I'd do it in the ring of integers of Q(sqrt(5)); code attached. This was a whiff faster for n = 700000 on my machine, a whiff slower for n = 10^6 -- any idea why that might be?
-- wli
Daniel
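
[Daniel's attachment is not included in the archive, but a minimal sketch of the idea is easy to give: the ring of integers of Q(sqrt(5)) is Z[(1+sqrt 5)/2], and phi^n = (L_n + F_n*sqrt 5)/2, so F_n can be read off as the sqrt(5)-coefficient of phi^n. Representing (a + b*sqrt 5)/2 as the pair (a, b), a hypothetical implementation -- not Daniel's attached code -- might look like the following.]

    -- Multiplication in Z[(1+sqrt 5)/2]: (a,b) stands for (a + b*sqrt 5)/2,
    -- with a and b of equal parity, so the divisions by 2 are exact.
    mul :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
    mul (a, b) (c, d) = ((a*c + 5*b*d) `div` 2, (a*d + b*c) `div` 2)

    -- phi = (1 + sqrt 5)/2 is (1,1); its n-th power is (L_n, F_n).
    fib :: Integer -> Integer
    fib n = snd (power (1, 1) n)
      where
        power _ 0 = (2, 0)                 -- (2,0) represents 1
        power x k
          | even k    = let y = power x (k `div` 2) in mul y y
          | otherwise = mul x (power x (k - 1))

For example, fib 10 is 55 and fib 12 is 144.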

On Thursday, 27 January 2005, 06:08, William Lee Irwin III wrote:
Inspired by a discussion on freenode #haskell, I tried to write the fastest Fibonacci number function possible, i.e. given a natural number input n, to compute F_n. For the moment, mlton-generated binaries crash computing fib (10^8-1), and there is a 6:1 speed difference for fib (10^7-1) between the two, where mlton-generated binaries take just under 1 minute and ghc-generated binaries take just under 6 minutes.
On Thu, Jan 27, 2005 at 03:26:39PM +0100, Daniel Fischer wrote:
Obviously, your machine is significantly faster than mine. On mine, fib (10^6) takes a little under two minutes, fib (10^7-1) I ^C-ed
~/tmp/fib $(( 10**6 - 1 ))  211.91s user 1.13s system 83% cpu 4:14.96 total

In general the printed results and data structures are expected to be in a range where asymptotic behavior dominates. 10^5-1 seems to be the break-even point where more advanced algorithms in Haskell start overtaking naive implementations in lighter-weight runtime systems. This doesn't help much when usefully-advanced languages have sufficiently lightweight runtime systems. I'd almost expect the overhead of printing the result to dominate all this, but somehow it doesn't appear to be the deciding factor, perhaps because they all call gmp to do it. In the range where asymptotic behavior is expected to dominate, the numbers are huge data structures, and having more of them simultaneously live or using more arbitrary-precision temporaries is painful.

On Thu, Jan 27, 2005 at 03:26:39PM +0100, Daniel Fischer wrote:
after twenty. I think, more readable than

    unfoldl f x = case f x of
                    Nothing     -> []
                    Just (u, v) -> unfoldl f v ++ [u]

    divs 0 = Nothing
    divs k = Just (uncurry (flip (,)) (k `divMod` 2))

would be

    unfoldl f x = case f x of
                    Nothing     -> []
                    Just (q, r) -> unfoldl f q ++ [r]

    divs 0 = Nothing
    divs k = Just (k `quotRem` 2)

-- this has no influence on performance, since it's optimized anyway.
unfoldl was intended to match unfoldr's calling convention verbatim, but to produce the list in reverse order; it wasn't really meant to be easy to use in this specific problem.
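
[To make the relationship concrete, a small self-contained example -- an illustration assuming the definitions quoted above, not code from either attachment -- showing that unfoldl and unfoldr share the Just (element, seed) convention but yield the binary digits of a number in opposite orders.]

    import Data.List (unfoldr)

    -- Same calling convention as unfoldr, but the element is appended
    -- after the recursive call, so the result comes out reversed.
    unfoldl :: (b -> Maybe (a, b)) -> b -> [a]
    unfoldl f x = case f x of
                    Nothing     -> []
                    Just (u, v) -> unfoldl f v ++ [u]

    divs :: Integer -> Maybe (Integer, Integer)
    divs 0 = Nothing
    divs k = let (q, r) = k `divMod` 2 in Just (r, q)

    -- unfoldr divs 10 == [0,1,0,1]  (least significant bit first)
    -- unfoldl divs 10 == [1,0,1,0]  (most significant bit first)
    main :: IO ()
    main = do
      print (unfoldr divs 10)
      print (unfoldl divs 10)

Note that this divs keeps the (remainder, quotient) order of the original posting, which is what lets the same function drive both unfoldr and unfoldl.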
On Thursday, 27 January 2005, 06:08, William Lee Irwin III wrote:

Anyway, thoughts on how to improve all this from the programmer's point of view, or otherwise explaining what's going on or ameliorating whatever effect is at work here would be appreciated.
On Thu, Jan 27, 2005 at 03:26:39PM +0100, Daniel Fischer wrote:
I thought I'd do it in the ring of integers of Q(sqrt(5)); code attached. This was a whiff faster for n = 700000 on my machine, a whiff slower for n = 10^6 -- any idea why that might be?
I suspect the divide-and-conquer scheme for exponentiation doesn't do the bit reversal, which eliminates various redundant computations.

-- wli
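
[One reading of this remark: wli's code extracts the bits of n most-significant-first (via unfoldl) and then does left-to-right binary exponentiation, one squaring per bit plus one extra multiplication per set bit. A hypothetical sketch of that scheme -- not code from either attachment -- generic in the multiplication used:]

    import Data.List (unfoldr)

    -- Left-to-right binary exponentiation over an explicit bit list:
    -- square the accumulator for every bit, multiply in the base only
    -- where the bit is set.
    power :: (a -> a -> a) -> a -> a -> Integer -> a
    power mul one x n = foldl step one bits
      where
        bits = reverse (unfoldr (\k -> if k == 0 then Nothing
                                       else Just (k `mod` 2, k `div` 2)) n)
        step acc b = let sq = mul acc acc
                     in if b == 1 then mul sq x else sq

For instance, power (*) 1 2 10 evaluates to 1024; the same driver can exponentiate the Q(sqrt(5)) pair representation sketched earlier in the thread.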
participants (2)

- Daniel Fischer
- William Lee Irwin III