Re: [Haskell-cafe] Re: Importing Data.Char speeds up ghc around 70%

22 Dec 2007

      Am Samstag, 22. Dezember 2007 22:57 schrieb Joost Behrends:
...
Daniel Fischer  writes:
...
Of course, one minute after I sent my previous mail, I receive this one
:( However, one point, it might be faster to factor out all factors p in
found and only then compute the intsqrt, like
found x = x{dividend = xstop, bound = intsqrt xstop, result = result x ++
replicate k p}
    where
      p = divisor x
      (xstop,k) = go (dividend x) 0
      go n m
| r == 0 	= go q (m+1)
      | otherwise = (n,m)
where
          (q,r) = n `divMod` p
True - but be aware, that this will slightly slow down the computation for
not multiple factors.
Why? You have to do one more division than the power of the prime dividing 
anyway. Or are you thinking about 'replicate 1 p' for simple factors? That 
will indeed be a bit slower than [p], but I dare say factorise a number with 
at least two prime squares or one prime cube dividing it, and you gain.
...
And - as you recently noted - the really expensive
part are all the tried factors, which do not divide the queried number.
Yes, for anything with sufficiently many small primes not dividing but one or 
more large, anything you do with the prime factors is peanuts.

Okay, make it (xstop,k) = go (dividend x `quot` p) 1, and replace divMod with 
quotRem. then you still have one superfluous division, but that's your 
making, not mine:

d2 x | dividend x `mod` divisor x > 0 =
     | otherwise = 

make it 

d2 x | r == 0	= x{divisor = divisor x + 2}
     | otherwise = x{dividend = xstop, bound = intsqrt xstop, result = result 
x ++ replicate k p}
	where
	  p = divisor x
	  (q,r) = dividend x `quotRem` p
	  (xstop,k) = go q 1
	  go n m
	    | r' == 0 = go q' (m+1)
	    | otherwise = (n,m)
	      where
		(q',r') = n `quotRem` p

and you have no superfluous division, exactly one intsqrt per prime dividing 
your number.
...
All this is just a first approach to the problem. When i talk of "naively
programmed", then i want to say, that number theorists might have much
better numerical orders marching through all primes plus some more odd
numbers. I didn't search for that on the net.
A simple fast method, although not guaranteed to succeed is Pollard's 
rho-method, preceded by trial division by small primes (up to 1,2,10 million 
or so, depending on what you have) and a good pseudo-primality test (google 
for Rabin-Miller, if you're interested).

I haven't but skimmed your code below, but I think the idea is something like

factor :: Integer -> [Integer]
factor 0 = error "0 has no decomposition"
factor 1 = []
factor n
    | n < 0     = (-1):factor (-n)
    | n < 4     = [n]
    | otherwise = go n srn tests
      where
        srn = isqrt n
        go m sr pps@(p:ps)
            | r == 0    = p:go q (isqrt q) pps
            | p > sr   = if m == 1 then [] else [m]
            | otherwise = go m sr ps
              where
                (q,r) = m `quotRem` p
        tests = 2:3:5:7:11:scanl (+) 13 (tail $ cycle dlist)

isqrt :: Integer -> Integer
isqrt n
    | n < 10^60 = floor (sqrt $ fromInteger n)
    | otherwise = 10^20*isqrt (n `quot` 10^40)

dlist :: [Integer]
dlist = zipWith (-) (tail wheel) wheel

smallPrimes :: [Integer]
smallPrimes = [2,3,5,7,11]

wheelMod :: Integer
wheelMod = product smallPrimes

wheel :: [Integer]
wheel = [r | r <- [1,3 .. wheelMod+1]
            , all ((/= 0) . (mod r)) (tail smallPrimes)]

Now, the construction of the wheel and the list of differences, dlist, is 
relatively time-consuming, but can be sped up by using a good sieving method 
(my choice would be using an array (STUArray s Int Bool)). Of course, when 
you go much beyond 13 with smallPrimes, you'll end up with a rather large 
dlist in your memory, but smallPrimes = [2,3,5,7,11,13,17] should still be 
okay, no indexing means it's probably faster than Seq.

Cheers,
Daniel
...
The last version was some kind of resign from tries like this:
firstPrimes = [3,5,7,11,13,17]
start = last firstPrimes
pac = product firstPrimes
slen = length lsumds
lsumds = drop 1 (fst$getSummands (singleton start, start)) where
    getSummands :: (Seq Int, Int) -> (Seq Int, Int)
    getSummands r |snd r < bnd    = getSummands ((fst r)|>k, snd r + k)
|otherwise      = r
where
            bnd = 2*pac + start
            k = getNext (snd r)
            getNext n |and [(n+2)`mod`x>0 | x<-firstPrimes] = 2
|otherwise                            = 2 + getNext
                      | (n+2)
smallmod :: Int -> Int -> Int
smallmod n m | n
divstep :: (DivIter,Int) -> (DivIter, Int)
divstep (x,n) | and [(fromInteger $ divisor x)0] =
                          (x {divisor = divisor x + 2}, n)
| (fromInteger$divisor x) < start =
(x {dividend = xidiv,
                                          bound = intsqrt(xidiv),
                                          result = result x ++ [divisor
x]}, n)
| ximod>0 =
(x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen)
| otherwise = (x {dividend = xidiv,
bound    = intsqrt(xidiv),
                                result   = result x ++ [divisor x]}, n)
    where
        (xidiv, ximod) = divMod (dividend x) (divisor x)
divisions :: (DivIter, Int) -> (DivIter, Int)
divisions (y,n) | divisor y <= bound y = divisions (divstep (y,n))
| otherwise            = (y,0)
Here the additions to divisor are taken from the sequence lsmnds (List of
SuMaNDS) - the type Seq from Data.Sequence is faster with the function
index than Data.List with !!. getSummands is a kind of reduced sieve of
Eratosthenes. The main improvement is the longest line:
|ximod>0 = (x {divisor = divisor x + toInteger (index lsumds n)},
smallmod (n+1) slen)
I even considered converting lsmnds to ByteString and storing them - the
build of lsmnds for firstPrimes = [3,5,7,11,13,17,19,23,29] (which already
has some MB footprint) takes several minutes.
But we have to track the number of iteration we are in. And that eats up
much more than the reduction of divisions for "failing" factors. The code
works (called slightly modificated by primefactors), but needs 5.41 minutes
for 2^61+1 :((. Also expensive might be the lookup in lsumds - the code
gets even slower with longer lists for firstPrimes.
divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) is derived from
lsmnds [3,5] = [4,2,4,2,4,6,2,6].
For me the whole matter is closed for now - the 1.34 minutes are no bad
result. Amd anyway the code might represent a not too bad lower bound for
efficiency of decomposing algorithms.
Auf Wiedersehen, Joost
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe