Mysterious factorial

Why is fact2 quicker than fact?!

fact2 :: Integer -> Integer
fact2 x = f x y
  where
    f n e | n < 2  = 1
          | e == 0 = n * (n - 1)
          | e > 0  = f n (e `div` 2) * f (n - e * 2) (e `div` 2)
    y = 2 ^ truncate (log (fromInteger x) / log 2)

fact :: Integer -> Integer
fact 1 = 1
fact n = n * fact (n - 1)

I tried to write a tail-recursive fact, and fact as "product [1..n]" - fact2 is quicker!
fact2 1000000 == fact 1000000 - I tested.
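To see what fact2 is doing, here is a hand expansion of fact2 10 (so y = 2 ^ truncate (log 10 / log 2) = 2^3 = 8), derived directly from the definition above:

fact2 10
  = f 10 8
  = f 10 4 * f (-6) 4                -- the second factor is 1, since -6 < 2
  = (f 10 2 * f 2 2) * 1
  = ((f 10 1 * f 6 1) * (f 2 1 * f (-2) 1)) * 1
  = (((f 10 0 * f 8 0) * (f 6 0 * f 4 0)) * ((f 2 0 * f 0 0) * 1)) * 1
  = ((((10*9) * (8*7)) * ((6*5) * (4*3))) * (((2*1) * 1) * 1)) * 1

Every factor from 10 down to 1 appears exactly once (plus a few spare 1s), but the multiplications are paired into a balanced tree instead of one long chain, so the intermediate products stay small until the very end.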

fact2 is O(log n) while fact is O(n).
fact2 is partitioning the problem.
-- Rafael Gustavo da Cunha Pereira Pinto

> fact2 is O(log n) while fact is O(n). fact2 is partitioning the problem.

But fact2 sparks off two recursive calls. If we assume that multiplication is a constant-time operation, both algorithms have the same asymptotic running time (try a version that operates on Ints). For Integers, this assumption is, of course, false ...

Cheers, Ralf
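Ralf's suggestion is easy to try. A minimal sketch (the names factI and fact2I are mine), which can be timed in GHCi with :set +s:

-- The same two algorithms on machine-sized Ints, where multiplication
-- is constant-time. The results overflow for n > 20, so only the
-- timing is meaningful here.
factI :: Int -> Int
factI 1 = 1
factI n = n * factI (n - 1)

fact2I :: Int -> Int
fact2I x = f x y
  where
    f n e | n < 2  = 1
          | e == 0 = n * (n - 1)
          | e > 0  = f n (e `div` 2) * f (n - e * 2) (e `div` 2)
    y = 2 ^ truncate (log (fromIntegral x :: Double) / log 2)

At Int, both perform roughly n single-word multiplications, so they should run in about the same time, as Ralf predicts.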

fact2 sparks 2*n multiplications for every (n^2) factors
fact sparks n multiplications for every n factors
-- Rafael Gustavo da Cunha Pereira Pinto

> fact2 sparks 2*n multiplications for every (n^2) factors
> fact sparks n multiplications for every n factors

Okay, let's count:

data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving (Show)

fact 1 = Leaf 1
fact n = Leaf n `Fork` fact (n - 1)

fact2 x = f x y
  where
    f n e | n < 2  = Leaf 1
          | e == 0 = Leaf n `Fork` Leaf (n - 1)
          | e > 0  = f n (e `div` 2) `Fork` f (n - e * 2) (e `div` 2)
    y = 2 ^ truncate (log (fromInteger x) / log 2)

size (Leaf a)   = 0
size (Fork l r) = 1 + size l + size r

The code now creates a binary tree (instead of performing a multiplication), so size counts the Forks, that is, the multiplications.

Main> size (fact 1000000)
999999
Main> size (fact2 1000000)
1000008

Convinced?

Cheers, Ralf
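The two trees contain almost the same number of multiplications, but their shapes differ drastically. A depth function in the same style (my addition, not part of Ralf's message) makes this visible; by my count it gives 999 for fact 1000 and 11 for fact2 1000:

depth :: Tree a -> Int
depth (Leaf a)   = 0
depth (Fork l r) = 1 + max (depth l) (depth r)

fact builds one long chain, while fact2 builds a balanced tree of logarithmic depth, which is why its intermediate products stay small for as long as possible.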

I can't offer much insight, but could the answer lie in the Integer type? I suspect that with a sufficiently large fixed Int type (2^14 bits?) the performance of the two functions would be almost equal. Could it be that the second function delays the multiplication of large numbers as long as possible? The divide-and-conquer approach?

On Wednesday, 30 December 2009 17:28:37, Roel van Dijk wrote:

> I can't offer much insight, but could the answer lie in the Integer
> type? I suspect that with a sufficiently large fixed Int type (2^14
> bits?) the performance of the two functions would be almost equal.

For fact (10^6), we'd need more like 2^24 bits for the performance to be comparable. Imagine such an ALU :D

> Could it be that the second function delays the multiplication of
> large numbers as long as possible? The divide-and-conquer approach?

Exactly.
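One can make the effect concrete by timing the two original definitions (a rough sketch; it assumes fact and fact2 from Artyom's message are in scope, and the numbers will vary with machine and GMP version):

import System.CPUTime (getCPUTime)

-- Time the full evaluation of a big Integer without printing
-- all of its digits: `rem 10` forces the whole number.
time :: String -> Integer -> IO ()
time label x = do
  t0 <- getCPUTime
  print (x `rem` 10)
  t1 <- getCPUTime
  putStrLn (label ++ ": " ++ show (fromIntegral (t1 - t0) * 1e-12 :: Double) ++ " s")

main :: IO ()
main = do
  time "fact 100000"  (fact 100000)
  time "fact2 100000" (fact2 100000)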

As an aside, in one of my libraries I have a combinator for folding a list in a binary-subdivision scheme.
foldm :: (a -> a -> a) -> a -> [a] -> a
foldm (*) e x
  | null x    = e
  | otherwise = fst (rec (length x) x)
  where
    rec 1 (a : as) = (a, as)
    rec n as       = (a1 * a2, as2)
      where
        m = n `div` 2
        (a1, as1) = rec (n - m) as
        (a2, as2) = rec m as1

Then factorial can be defined more succinctly:

factorial n = foldm (*) 1 [1 .. n]
Cheers, Ralf
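A quick sanity check in GHCi (10! = 3628800):

Main> foldm (*) 1 [1 .. 10]
3628800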

On Wed, Dec 30, 2009 at 10:35 AM, Ralf Hinze wrote:

> As an aside, in one of my libraries I have a combinator for folding
> a list in a binary-subdivision scheme.
>
> foldm :: (a -> a -> a) -> a -> [a] -> a

I would use:

foldm :: Monoid m => [m] -> m

Which is just a better implementation of mconcat / fold. The reason I prefer this interface is that foldm has a precondition in order to have a simple semantics: the operator you're giving it has to be associative. I like to use typeclasses to express laws. You don't need to prove anything to use a function, but you do often need to prove something to make an instance of a typeclass. Fortunately Monoid already has what we need.
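A standalone sketch of the interface Luke describes (my reconstruction, not code from his message), reusing Ralf's subdivision scheme; Data.Monoid's Product newtype then recovers the factorial:

import Data.Monoid (Product(..))

-- Fold a list of monoid values by binary subdivision. For an
-- associative <>, this equals mconcat but balances the work.
foldm :: Monoid m => [m] -> m
foldm [] = mempty
foldm xs = fst (rec (length xs) xs)
  where
    rec 1 (a : as) = (a, as)
    rec n as       = (a1 <> a2, as2)
      where
        h = n `div` 2
        (a1, as1) = rec (n - h) as
        (a2, as2) = rec h as1

factorial :: Integer -> Integer
factorial n = getProduct (foldm (map Product [1 .. n]))

With Sum instead of Product, the same foldm adds; such newtypes are how Data.Monoid works around having only one instance per type.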

> I would use:
>
> foldm :: Monoid m => [m] -> m
>
> Which is just a better implementation of mconcat / fold. The reason
> I prefer this interface is that foldm has a precondition in order to
> have a simple semantics: the operator you're giving it has to be
> associative. I like to use typeclasses to express laws.

That's a valid point. Unfortunately, Haskell is not Coq, so there is no guarantee that the monoid laws are actually satisfied. And using a type class has the downside that you can have only one instance per type. Maybe you want to use both `foldm (+) 0' and `foldm (*) 1'. So the bottom line is that you should have both versions.

Cheers, Ralf

On Wednesday, 30 December 2009 11:57:28, Artyom Kazak wrote:

> Why is fact2 quicker than fact?!
If you follow the evaluation of fact2, it is basically the same as

fact3 n = binaryMult [1 .. n]
  where
    binaryMult [p] = p
    binaryMult xs  = binaryMult (pairMult xs)
    pairMult (x:y:zs) = x * y : pairMult zs
    pairMult xs       = xs

just without the list construction, but with a few more ones.

[Aside: You should subtract one from the exponent of y. As it is, in the first call to f, the second factor is always 1 because x < 2*y. Doesn't make much of a difference regarding performance, but it seems cleaner.]

Perhaps fact3 is a little easier to follow. Looking at fact3 (2^k), we see that in the first iteration of binaryMult, 2^(k-1) products of small integers (<= k+1 bits) are carried out. These multiplications are fast. In the second iteration, we have 2^(k-2) products of still small integers (<= 2*k bits). These multiplications are a tad slower, but still fast. In the third iteration, we have 2^(k-3) products of integers (<= k*2^2 bits), and so on.

We see that the overwhelming majority of the 2^k - 1 multiplications carried out don't involve huge numbers and thus are relatively fast. (For k = 32, no multiplication in the first five iterations involves a number with more than 1000 bits, so no more than 3% of the multiplications involve large numbers; for k = 20, the first product with more than 100 bits is produced in the sixth iteration, so less than 20000 multiplications involve a number with more than 1000 bits, less than 1000 multiplications have a factor with more than 10000 bits, and less than 128 have a factor with more than 100000 bits.)

If the factorial is computed sequentially, like

fact0 n = foldl' (*) 1 [2 .. n]
-- or product [2 .. n], just don't build huge thunks like in fact above

you have many multiplications involving one huge number (and one small), since fact k has of the order of k*log k bits. 1000! has about 8500 bits, 10000! has about 120000 bits and 100000! has about 1.5 million bits. So that way, computing the factorial of 2^20 needs more than 990000 multiplications where one factor has more than 100000 bits and over 900000 multiplications where one factor has more than one million bits. Since multiplications where one factor is huge take long, even if the other factor is small, we see why a sequential computation of a factorial is so much slower than a tree-like computation.
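Daniel's bit counts are easy to verify with a deliberately naive bit-length function (a sketch; GHC's integer library has faster internal ways to get this). By my count it reports 8530 bits for 1000! and 118459 bits for 10000!:

-- Number of bits in a non-negative Integer, by repeated halving.
-- (Fine for 10000!; too slow for 100000!'s ~1.5 million bits.)
bits :: Integer -> Int
bits = length . takeWhile (> 0) . iterate (`div` 2)

Main> bits (product [1 .. 1000])
8530
Main> bits (product [1 .. 10000])
118459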
participants (6):

- Artyom Kazak
- Daniel Fischer
- Luke Palmer
- Rafael Gustavo da Cunha Pereira Pinto
- Ralf Hinze
- Roel van Dijk