
Hi,

while still working on optimizing my (naively programmed) primefactors I noticed some very strange behavior of GHC. The last version below takes 2.34 minutes on my system for computing 2^61+1 = 3 * 768614336404564651. Importing Data.Char without using it anywhere reduces this time to 1.34 minutes - a remarkable speedup. The system is Windows XP on a 2.2 GHz Intel, 512 MB RAM.

I give the complete code here - hopefully all tabs are (4) blanks. Can this be reproduced? I compile just with --make -O2.

module Main where

import IO
import System.Exit
import Data.Char

main = do
    hSetBuffering stdin LineBuffering
    putStrLn "Number to decompose ?"
    s <- getLine
    if s == []
        then exitWith ExitSuccess
        else do
            putStrLn (show$primefactors$read s)
            main

data DivIter = DivIter { dividend :: Integer,
                         divisor  :: Integer,
                         bound    :: Integer,
                         result   :: [Integer] }

intsqrt m = floor (sqrt $ fromInteger m)

primefactors :: Integer -> [Integer]
primefactors n | n < 2     = []
               | even n    = o2 ++ (primefactors o1)
               | otherwise = if z /= 1 then result res ++ [z] else result res
    where res = divisions (DivIter {dividend = o1,
                                    divisor  = 3,
                                    bound    = intsqrt o1,
                                    result   = o2})
          z = dividend res        -- is 1 sometimes
          (o1,o2) = twosect (n,[])

twosect :: (Integer,[Integer]) -> (Integer,[Integer])
twosect m | odd  (fst m) = m
          | even (fst m) = twosect (div (fst m) 2, snd m ++ [2])

found :: DivIter -> DivIter
found x = x {dividend = xidiv,
             bound    = intsqrt xidiv,
             result   = result x ++ [divisor x]}
    where xidiv = dividend x `div` divisor x

d2 :: DivIter -> DivIter
d2 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 2}
     | otherwise                      = found x

d4 :: DivIter -> DivIter
d4 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 4}
     | otherwise                      = found x

d6 :: DivIter -> DivIter
d6 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 6}
     | otherwise                      = found x

divisions :: DivIter -> DivIter
divisions y | or [divisor y == 3, divisor y == 5] = divisions (d2 y)
            | divisor y <= bound y                = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
            | otherwise                           = y

On Sat, Dec 22, 2007 at 06:00:29PM +0000, Joost Behrends wrote:
Hi,
while still working on optimizing my (naively programmed) primefactors I noticed some very strange behavior of GHC. The last version below takes 2.34 minutes on my system for computing 2^61+1 = 3 * 768614336404564651. Importing Data.Char without using it anywhere reduces this time to 1.34 minutes - a remarkable speedup. The system is Windows XP on a 2.2 GHz Intel, 512 MB RAM.
I give the complete code here - hopefully all tabs are (4) blanks. Can this be reproduced? I compile just with --make -O2.
If you can reproduce it on your machine (rm <executable> *.o *.hi between tests for maximum reliability), it's definitely a bug. http://hackage.haskell.org/trac/ghc/wiki/ReportABug Stefan

Stefan O'Rear
If you can reproduce it on your machine (rm <executable> *.o *.hi between tests for maximum reliability), it's definitely a bug.
http://hackage.haskell.org/trac/ghc/wiki/ReportABug
Stefan
Yes, it was the same as before. I had rebooted in the meantime, because I was out for a meal. 1.34 vs. 2.34 minutes are still the times now, WITH deletion of the object files and the .exe between both compiles and runs. What is happening here? Does importing Data.Char shadow (in a hidden way) some timeworn types or methods in the Prelude? If so, GHC might be even faster than it looks now. Cheers, Joost

On Sat, Dec 22, 2007 at 07:26:09PM +0000, Joost Behrends wrote:
Stefan O'Rear writes:
If you can reproduce it on your machine (rm <executable> *.o *.hi between tests for maximum reliability), it's definitely a bug.
http://hackage.haskell.org/trac/ghc/wiki/ReportABug
Stefan
Yes, it was the same as before. I had rebooted in the meantime, because I was out for a meal. 1.34 vs. 2.34 minutes are still the times now, WITH deletion of the object files and the .exe between both compiles and runs.
What is happening here? Does importing Data.Char shadow (in a hidden way) some timeworn types or methods in the Prelude? If so, GHC might be even faster than it looks now.
There is absolutely no way this could be correct behavior. Stefan

Hi
What is happening here? Does importing Data.Char shadow (in a hidden way) some timeworn types or methods in the Prelude? If so, GHC might be even faster than it looks now.
There is absolutely no way this could be correct behavior.
Not true - importing an additional (but apparently unused) module can introduce new rules, pragmas and instances into scope, which can affect the optimisation. However, in this case it looks like a bug - in either GHC or Joost's testing. Thanks Neil
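For readers who haven't met the mechanism Neil mentions, here is a rough illustration (my own, not code from the thread) of how an import that brings nothing into the namespace can still matter. A RULES pragma travels with its module: compiling an importer with -O makes the rule available there, even if the import list is empty.

-- ExtraRules.hs (hypothetical module name, illustration only)
-- Exports nothing, but its rewrite rule targets Prelude functions that the
-- importing module may already use, so "import ExtraRules ()" alone can
-- change the optimised code.
module ExtraRules () where

{-# RULES
"length/++"  forall xs ys.  length (xs ++ ys) = length xs + length ys
  #-}

Whether such a rule helps or hurts depends entirely on the program; the point is only that "unused import" does not mean "no effect on optimisation".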

On Saturday, 22 December 2007 19:00, Joost Behrends wrote:
Hi,
while still working on optimizing my (naively programmed) primefactors I noticed some very strange behavior of GHC. The last version below takes 2.34 minutes on my system for computing 2^61+1 = 3 * 768614336404564651. Importing Data.Char without using it anywhere reduces this time to 1.34 minutes - a remarkable speedup. The system is Windows XP on a 2.2 GHz Intel, 512 MB RAM.
I give the complete code here - hopefully all tabs are (4) blanks. Can this be reproduced? I compile just with --make -O2.
I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron). However, it's running over 30 minutes now trying to factorise 2^88+1 without any sign of approaching success, which suggests your code has a bug (the factorization is [257,229153,119782433,43872038849], so even a naive approach shouldn't take much longer than a minute). Cheers, Daniel
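A rough way to see where Daniel's one-minute estimate comes from (my own note, not from the thread): the longest stretch of trial division is climbing from the second factor to the third, roughly 119782433/2, i.e. about 6*10^7 odd candidates; once the third factor is out, the search only has to reach the square root of what is left:

Prelude> floor (sqrt 43872038849) :: Integer
209456

A compiled program doing a few million Integer divisions per second gets through that in on the order of a minute, which matches the estimate.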
module Main where
import IO
import System.Exit
import Data.Char

main = do
    hSetBuffering stdin LineBuffering
    putStrLn "Number to decompose ?"
    s <- getLine
    if s == []
        then exitWith ExitSuccess
        else do
            putStrLn (show$primefactors$read s)
            main

data DivIter = DivIter { dividend :: Integer,
                         divisor  :: Integer,
                         bound    :: Integer,
                         result   :: [Integer] }

intsqrt m = floor (sqrt $ fromInteger m)

primefactors :: Integer -> [Integer]
primefactors n | n < 2     = []
               | even n    = o2 ++ (primefactors o1)
               | otherwise = if z /= 1 then result res ++ [z] else result res
    where res = divisions (DivIter {dividend = o1,
                                    divisor  = 3,
                                    bound    = intsqrt o1,
                                    result   = o2})
          z = dividend res        -- is 1 sometimes
          (o1,o2) = twosect (n,[])

twosect :: (Integer,[Integer]) -> (Integer,[Integer])
twosect m | odd  (fst m) = m
          | even (fst m) = twosect (div (fst m) 2, snd m ++ [2])

found :: DivIter -> DivIter
found x = x {dividend = xidiv,
             bound    = intsqrt xidiv,
             result   = result x ++ [divisor x]}
    where xidiv = dividend x `div` divisor x

d2 :: DivIter -> DivIter
d2 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 2}
     | otherwise                      = found x

d4 :: DivIter -> DivIter
d4 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 4}
     | otherwise                      = found x

d6 :: DivIter -> DivIter
d6 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 6}
     | otherwise                      = found x

divisions :: DivIter -> DivIter
divisions y | or [divisor y == 3, divisor y == 5] = divisions (d2 y)
            | divisor y <= bound y                = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
            | otherwise                           = y

Daniel Fischer
I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron). However, it's running over 30 minutes now trying to factorise 2^88+1 without any sign of approaching success, which suggests your code has a bug (the factorization is [257,229153,119782433,43872038849], so even a naive approach shouldn't take much longer than a minute).
Yes, Daniel, the code is not completely correct. There is a known bug. I don't know the connection to your number, but it doesn't decompose 29*29*31*31 correctly (gives [29,29,961]), nor 59*59*61*61 (gives [59,59,3721]). Difficult to find, but that has nothing to do with the strange differences in speed. Cheers, Joost

On Saturday, 22 December 2007 20:48, Joost Behrends wrote:
Daniel Fischer writes:
I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron). However, it's running over 30 minutes now trying to factorise 2^88+1 without any sign of approaching success, which suggests your code has a bug (the factorization is [257,229153,119782433,43872038849], so even a naive approach shouldn't take much longer than a minute).
Yes, Daniel
the code is not completely correct. There is a known bug. I don't know the connection to your number, but it doesn't decompose 29*29*31*31 correctly (gives [29,29,961]), nor 59*59*61*61 (gives [59,59,3721]).
Perhaps

Prelude Main> primefactors $ 7*17*29
[7,493]
Prelude Main> primefactors $ 7*23*29
[7,667]
Prelude Main> primefactors $ 7*11*23
[7,253]

help you find the bug? If not, ask and ye shall be answered.
Difficult to find, but that has nothing to do with the strange differences in speed.
True, and I have no idea how that could come about.
Cheers, Joost
Cheers, Daniel

Daniel Fischer
I can't reproduce it, both run in 130s here (SuSE 8.2, 1200MHz Duron). However, it's running over 30 minutes now trying to factorise 2^88+1 without any sign of approaching success, which suggests your code has a bug (the factorization is [257,229153,119782433,43872038849], so even a naive approach shouldn't take much longer than a minute).
I have found the problem: we may have to work recursively on a found factor. This was done in former versions, but got lost when isolating the function "found". Here is a corrected version - complete again, for easily reproducing the strange behavior with Data.Char. It decomposes 2^88+1 in 13 seconds.

module Main where

import IO
import System.Exit
--import Data.Char

main = do
    hSetBuffering stdin LineBuffering
    putStrLn "Number to decompose ?"
    s <- getLine
    if s == []
        then exitWith ExitSuccess
        else do
            putStrLn (show$primefactors$read s)
            main

data DivIter = DivIter { dividend :: Integer,
                         divisor  :: Integer,
                         bound    :: Integer,
                         result   :: [Integer] }

intsqrt m = floor (sqrt $ fromInteger m)

primefactors :: Integer -> [Integer]
primefactors n | n < 2     = []
               | even n    = o2 ++ (primefactors o1)
               | otherwise = if z /= 1 then result res ++ [z] else result res
    where res = divisions (DivIter {dividend = o1,
                                    divisor  = 3,
                                    bound    = intsqrt o1,
                                    result   = o2})
          z = dividend res        -- is 1 sometimes
          (o1,o2) = twosect (n,[])

twosect :: (Integer,[Integer]) -> (Integer,[Integer])
twosect m | odd  (fst m) = m
          | even (fst m) = twosect (div (fst m) 2, snd m ++ [2])

found :: DivIter -> DivIter
found x = x {dividend = xidiv,
             bound    = intsqrt xidiv,
             result   = result x ++ [divisor x]}
    where xidiv = dividend x `div` divisor x

d2 :: DivIter -> DivIter
d2 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 2}
     | otherwise                      = d2$found x

d4 :: DivIter -> DivIter
d4 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 4}
     | otherwise                      = d4$found x

d6 :: DivIter -> DivIter
d6 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 6}
     | otherwise                      = d6$found x

divisions :: DivIter -> DivIter
divisions y | or [divisor y == 3, divisor y == 5] = divisions (d2 y)
            | divisor y <= bound y                = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
            | otherwise                           = y

And now it also takes 1.34 minutes for 2^61+1 without importing Data.Char. Hmmm ... Cheers, Joost
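A quick check of the fix (my addition, not part of the original mail): the case that failed before should now decompose fully, since each dX call keeps dividing by a found factor before moving on. For example, in GHCi:

Prelude Main> primefactors (29*29*31*31)
[29,29,31,31]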

On Saturday, 22 December 2007 21:28, Joost Behrends wrote:
Of course, one minute after I sent my previous mail, I receive this one :( However, one point: it might be faster to factor out all factors p in found and only then compute the intsqrt, like

found x = x {dividend = xstop,
             bound    = intsqrt xstop,
             result   = result x ++ replicate k p}
    where p = divisor x
          (xstop,k) = go (dividend x) 0
          go n m | r == 0    = go q (m+1)
                 | otherwise = (n,m)
              where (q,r) = n `divMod` p

and then leaving out the recursive call in d2 etc. For a measurable difference, you'd need a number with some high prime powers as factors, but still, it saves some work even for squares.
I have found the problem: we may have to work recursively on a found factor. This was done in former versions, but got lost when isolating the function "found". Here is a corrected version - complete again, for easily reproducing the strange behavior with Data.Char. It decomposes 2^88+1 in 13 seconds.
module Main where
import IO
import System.Exit
--import Data.Char

main = do
    hSetBuffering stdin LineBuffering
    putStrLn "Number to decompose ?"
    s <- getLine
    if s == []
        then exitWith ExitSuccess
        else do
            putStrLn (show$primefactors$read s)
            main

data DivIter = DivIter { dividend :: Integer,
                         divisor  :: Integer,
                         bound    :: Integer,
                         result   :: [Integer] }

intsqrt m = floor (sqrt $ fromInteger m)

primefactors :: Integer -> [Integer]
primefactors n | n < 2     = []
               | even n    = o2 ++ (primefactors o1)
               | otherwise = if z /= 1 then result res ++ [z] else result res
    where res = divisions (DivIter {dividend = o1,
                                    divisor  = 3,
                                    bound    = intsqrt o1,
                                    result   = o2})
          z = dividend res        -- is 1 sometimes
          (o1,o2) = twosect (n,[])

twosect :: (Integer,[Integer]) -> (Integer,[Integer])
twosect m | odd  (fst m) = m
          | even (fst m) = twosect (div (fst m) 2, snd m ++ [2])

found :: DivIter -> DivIter
found x = x {dividend = xidiv,
             bound    = intsqrt xidiv,
             result   = result x ++ [divisor x]}
    where xidiv = dividend x `div` divisor x

d2 :: DivIter -> DivIter
d2 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 2}
     | otherwise                      = d2$found x

d4 :: DivIter -> DivIter
d4 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 4}
     | otherwise                      = d4$found x

d6 :: DivIter -> DivIter
d6 x | dividend x `mod` divisor x > 0 = x {divisor = divisor x + 6}
     | otherwise                      = d6$found x

divisions :: DivIter -> DivIter
divisions y | or [divisor y == 3, divisor y == 5] = divisions (d2 y)
            | divisor y <= bound y                = divisions (d6$d2$d6$d4$d2$d4$d2$d4 y)
            | otherwise                           = y
And now it also takes 1.34 minutes for 2^61+1 without importing Data.Char. Hmm ...
Cheers, Joost

I'm curious if you get the same performance difference importing GHC.List instead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rule pragmas that would probably optimize your (++) usage. If this is the case, I'm not sure if it's a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? --S
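Whether or not fusion is the explanation, the repeated "result x ++ [divisor x]" does copy the accumulated factor list each time a divisor is found. A common way around that - my own sketch reusing Joost's DivIter type, not something proposed in the thread, and of limited practical value here because the factor lists are short - is to cons onto the front and reverse once at the end:

data DivIter = DivIter { dividend :: Integer,
                         divisor  :: Integer,
                         bound    :: Integer,
                         result   :: [Integer] }

intsqrt :: Integer -> Integer
intsqrt m = floor (sqrt (fromInteger m :: Double))

-- (:) is O(1), whereas (result x ++ [divisor x]) rebuilds the whole list
found :: DivIter -> DivIter
found x = x {dividend = xidiv,
             bound    = intsqrt xidiv,
             result   = divisor x : result x}
    where xidiv = dividend x `div` divisor x

-- callers then restore the original order with a single reverse at the
-- very end, e.g. using  reverse (result res)  where the old code used
-- result res.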

On Sat, Dec 22, 2007 at 04:40:00PM -0500, Sterling Clover wrote:
I'm curious if you get the same performance difference importing GHC.List instead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rule pragmas that would probably optimize your (++) usage. If this is the case, I'm not sure if it's a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? --S
The Prelude imports GHC.List, iirc. Stefan

Here are the Prelude imports I see at the moment. I didn't chase down the dependencies in all the code initially, and now I see that GHC.Show does import GHC.List. Still, I suspect this has something to do with fusion nonetheless.
#ifdef __GLASGOW_HASKELL__
import GHC.Base
import GHC.IOBase
import GHC.Exception
import GHC.Read
import GHC.Enum
import GHC.Num
import GHC.Real
import GHC.Float
import GHC.Show
import GHC.Err ( error, undefined )
#endif
(from
http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Prelude.html)
--s
On Dec 22, 2007 4:44 PM, Stefan O'Rear wrote:
On Sat, Dec 22, 2007 at 04:40:00PM -0500, Sterling Clover wrote:
I'm curious if you get the same performance difference importing GHC.List instead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rule pragmas that would probably optimize your (++) usage. If this is the case, I'm not sure if it's a bug or not, but all this will have to be thought through as more stream fusion is rolled out anyway, I suspect? --S
The Prelude imports GHC.List, iirc.
Stefan

I'm curious if you get the same performance difference importing GHC.List instead of Data.Char? I chased some dependencies, and Data.Char imports GHC.Arr, which in turn imports GHC.List, which provides a bunch of fusion rules pragmas
Sterling Clover
Yes - the same difference: 1.33 minutes vs. 2.30 now. I was close to reporting this as a bug, but rejected the idea. What does "bug" mean here? I am really a rookie at Haskell - this work on primefactors is nothing but an exercise (however, I casually try number-theoretic problems and have often missed a program like this). But - as it appears to me - the dramatic advance in speed GHC has made recently is to a great extent due to improved types and their methods. What I see at haskell.org/ghc/docs/latest/html/libraries looks like huge road works to me. And then it is not a "bug" if older libraries such as the Prelude are perhaps not completely up to date. Another problem is that my program was not completely correct. But it didn't crash and decomposed most numbers correctly. And the strange behavior disappeared with a correct version. Perhaps the Prelude will get better now. Cheers, Joost

Hi
Yes - the same difference: 1.33 minutes vs. 2.30 now.
I was close to reporting this as a bug, but rejected the idea. What does "bug" mean here?
If it can be reproduced on anyone's machine, it is a bug. If you can bundle up two programs which don't read from stdin (i.e. no getLine calls) or the standard arguments (i.e. getArgs), which differ only by the Data.Char import, and have a massive speed difference, then report a bug. You should probably also give your GHC versions and platforms etc. Thanks Neil

Neil Mitchell
If it can be reproduced on anyone's machine, it is a bug. If you can bundle up two programs which don't read from stdin (i.e. no getLine calls) or the standard arguments (i.e. getArgs), which differ only by the Data.Char import, and have a massive speed difference, then report a bug.
You should probably also give your GHC versions and platforms etc.
Thanks for your attention too! Now I tried a version without input (just computing the prime factors of the constant 2^61+1, which it did correctly in spite of its bugginess). And it didn't have the Data.Char bug (or the Data.List bug) either. Nor, it seems, does my original code on Linux. Thus it happens only in an exotic configuration. Windows will stay exotic in the Haskell community. Unless someone reproduces it at least on Windows (XP Pro SP2 is my version), I will do nothing more. The hardware is an Intel Celeron 2.2 GHz, 512 MB RAM. ghc 6.8.1 lives on D:\Programme (not on the system drive C:, which poses problems for Cabal, told aside). I just did the standard installation (I don't remember whether by unzipping alone or whether there was an MSI), not touching anything, of course. I was happy about the absence of autoconf (which I see in the category of disasters like EMM386 for MS-DOS). But something is strange: ghci doesn't accept qualified imports. It produces a parse error. That seems a bug to me, because my ghci accepts "import (any module)" nevertheless. But I see it as a minor bug - the compiler is much more important. Happy days, Joost

Joost Behrends wrote:
Neil Mitchell writes:
If it can be reproduced on anyone's machine, it is a bug. If you can bundle up two programs which don't read from stdin (i.e. no getLine calls) or the standard arguments (i.e. getArgs), which differ only by the Data.Char import, and have a massive speed difference, then report a bug.
You should probably also give your GHC versions and platforms etc.
Thanks for your attention too!
Now I tried a version without input (just computing the prime factors of the constant 2^61+1, which it did correctly in spite of its bugginess). And it didn't have the Data.Char bug (or the Data.List bug) either.
Nor, it seems, does my original code on Linux.
Thus it happens only in an exotic configuration. Windows will stay exotic in the Haskell community. Unless someone reproduces it at least on Windows (XP Pro SP2 is my version), I will do nothing more.
The hardware is an Intel Celeron 2.2 GHz, 512 MB RAM. ghc 6.8.1 lives on D:\Programme (not on the system drive C:, which poses problems for Cabal, told aside).
Even if it only happens on Windows, if it isn't specific to your hardware then it could still be a bug. I have seen strange artifacts like this before that turned out to be caused by one of two things:

- bad cache interactions, e.g. we just happen to have laid out the code in such a way that frequently accessed code sequences push each other out of the cache, or the relative position of the heap and stack have a bad interaction. This happens less frequently these days with 4-way and 8-way associative caches on most CPUs.

- alignment issues, such as storing or loading a lot of misaligned Doubles

In the second case, I've seen the same program run +/- 50% in performance from run to run, just based on random alignment of the stack. But it's not likely to be the issue here, I'm guessing. If it is code misalignment, that's something we can easily fix (but I don't *think* we're doing anything wrong in that respect).

I have an Opteron box here that regularly gives +/- 20% from run to run of the same program with no other load on the machine. I have no idea why...

Cheers, Simon

Simon Marlow wrote: ...
I have seen strange artifacts like this before that turned out to be caused by one of two things:
- bad cache interactions, e.g. we just happen to have laid out the code in such a way that frequently accessed code sequences push each other out of the cache, or the relative position of the heap and stack have a bad interaction. This happens less frequently these days with 4-way and 8-way associative caches on most CPUs.
- alignment issues, such as storing or loading a lot of misaligned Doubles
in the second case, I've seen the same program run +/- 50% in performance from run to run, just based on random alignment of the stack. But it's not likely to be the issue here, I'm guessing. If it is code misalignment, that's something we can easily fix (but I don't *think* we're doing anything wrong in that respect).
I have an Opteron box here that regularly gives +/- 20% from run to run of the same program with no other load on the machine. I have no idea why...
... This got me wondering how I could test for code misalignment problems. I expect there's a cleverer way, but how about a single executable containing several copies of the same code to be tested and a loop that runs and times the different copies. A consistently higher or lower runtime from one copy would indicate a misalignment problem. (I'm assuming the different copies of the code would probably load at fairly random alignments, random padding could be added.) It might have to run the copies in a different order each time round the loop to avoid the possibility of external periodic events affecting a particular copy. Richard.
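A rough sketch of the harness Richard describes (my own code, not from the thread): several textually identical copies of one worker, timed in a rotating order. A copy that is consistently slower across passes would point at code layout or alignment rather than measurement noise.

module Main where

import System.CPUTime (getCPUTime)

-- three textually identical copies; NOINLINE keeps them as separate code
work1, work2, work3 :: Int -> Int
work1 n = sum [ x * x | x <- [1 .. n] ]
work2 n = sum [ x * x | x <- [1 .. n] ]
work3 n = sum [ x * x | x <- [1 .. n] ]
{-# NOINLINE work1 #-}
{-# NOINLINE work2 #-}
{-# NOINLINE work3 #-}

-- time one copy and report elapsed CPU time in milliseconds
time :: String -> (Int -> Int) -> IO ()
time name f = do
    t0 <- getCPUTime
    print (f 2000000)                  -- forces the result
    t1 <- getCPUTime
    putStrLn (name ++ ": " ++ show ((t1 - t0) `div` 1000000000) ++ " ms")

main :: IO ()
main = mapM_ runPass [0 .. 2 :: Int]
  where
    copies = [("work1", work1), ("work2", work2), ("work3", work3)]
    -- rotate the order each pass, so a periodic external disturbance
    -- does not always hit the same copy
    runPass p = do
        putStrLn ("pass " ++ show p)
        mapM_ (uncurry time) (take 3 (drop p (cycle copies)))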

Daniel Fischer
Of course, one minute after I sent my previous mail, I receive this one :( However, one point, it might be faster to factor out all factors p in found and only then compute the intsqrt, like
found x = x {dividend = xstop,
             bound    = intsqrt xstop,
             result   = result x ++ replicate k p}
    where p = divisor x
          (xstop,k) = go (dividend x) 0
          go n m | r == 0    = go q (m+1)
                 | otherwise = (n,m)
              where (q,r) = n `divMod` p
True - but be aware that this will slightly slow down the computation for non-repeated factors. And - as you recently noted - the really expensive part is all the tried divisors which do not divide the queried number.
All this is just a first approach to the problem. When I talk of "naively programmed", I mean that number theorists might have much better numerical orderings marching through all primes plus some more odd numbers. I didn't search for that on the net.
The last version was a kind of retreat from attempts like this:

firstPrimes = [3,5,7,11,13,17]
start = last firstPrimes
pac = product firstPrimes
slen = length lsumds

lsumds = drop 1 (fst$getSummands (singleton start, start)) where
    getSummands :: (Seq Int, Int) -> (Seq Int, Int)
    getSummands r | snd r < bnd = getSummands ((fst r)|>k, snd r + k)
                  | otherwise   = r
        where
            bnd = 2*pac + start
            k = getNext (snd r)
            getNext n | and [(n+2)`mod`x>0 | x<-firstPrimes] = 2
                      | otherwise = 2 + getNext (n+2)

smallmod :: Int -> Int -> Int
smallmod n m | n

On Saturday, 22 December 2007 22:57, Joost Behrends wrote:
Daniel Fischer writes:
Of course, one minute after I sent my previous mail, I receive this one :( However, one point: it might be faster to factor out all factors p in found and only then compute the intsqrt, like
found x = x {dividend = xstop,
             bound    = intsqrt xstop,
             result   = result x ++ replicate k p}
    where p = divisor x
          (xstop,k) = go (dividend x) 0
          go n m | r == 0    = go q (m+1)
                 | otherwise = (n,m)
              where (q,r) = n `divMod` p
True - but be aware that this will slightly slow down the computation for non-repeated factors.
Why? You have to do one more division than the power of the prime dividing anyway. Or are you thinking about 'replicate 1 p' for simple factors? That will indeed be a bit slower than [p], but I dare say factorise a number with at least two prime squares or one prime cube dividing it, and you gain.
And - as you recently noted - the really expensive part is all the tried divisors which do not divide the queried number.
Yes, for anything with sufficiently many small primes not dividing it but one or more large ones that do, anything you do with the prime factors is peanuts. Okay, make it (xstop,k) = go (dividend x `quot` p) 1, and replace divMod with quotRem. Then you still have one superfluous division, but that's your making, not mine:

d2 x | dividend x `mod` divisor x > 0 = ...
     | otherwise = ...

make it

d2 x | r /= 0    = x {divisor = divisor x + 2}
     | otherwise = x {dividend = xstop,
                      bound    = intsqrt xstop,
                      result   = result x ++ replicate k p}
    where p = divisor x
          (q,r) = dividend x `quotRem` p
          (xstop,k) = go q 1
          go n m | r' == 0   = go q' (m+1)
                 | otherwise = (n,m)
              where (q',r') = n `quotRem` p

and you have no superfluous division, exactly one intsqrt per prime dividing your number.
All this is just a first approach to the problem. When I talk of "naively programmed", I mean that number theorists might have much better numerical orderings marching through all primes plus some more odd numbers. I didn't search for that on the net.
A simple fast method, although not guaranteed to succeed, is Pollard's rho method, preceded by trial division by small primes (up to 1, 2, 10 million or so, depending on what you have) and a good pseudo-primality test (google for Rabin-Miller, if you're interested). I haven't but skimmed your code below, but I think the idea is something like

factor :: Integer -> [Integer]
factor 0 = error "0 has no decomposition"
factor 1 = []
factor n
    | n < 0     = (-1):factor (-n)
    | n < 4     = [n]
    | otherwise = go n srn tests
  where
    srn = isqrt n
    go m sr pps@(p:ps)
        | r == 0    = p:go q (isqrt q) pps
        | p > sr    = if m == 1 then [] else [m]
        | otherwise = go m sr ps
      where
        (q,r) = m `quotRem` p
    tests = 2:3:5:7:11:scanl (+) 13 (tail $ cycle dlist)

isqrt :: Integer -> Integer
isqrt n | n < 10^60 = floor (sqrt $ fromInteger n)
        | otherwise = 10^20*isqrt (n `quot` 10^40)

dlist :: [Integer]
dlist = zipWith (-) (tail wheel) wheel

smallPrimes :: [Integer]
smallPrimes = [2,3,5,7,11]

wheelMod :: Integer
wheelMod = product smallPrimes

wheel :: [Integer]
wheel = [r | r <- [1,3 .. wheelMod+1], all ((/= 0) . (mod r)) (tail smallPrimes)]

Now, the construction of the wheel and the list of differences, dlist, is relatively time-consuming, but can be sped up by using a good sieving method (my choice would be using an array (STUArray s Int Bool)). Of course, when you go much beyond 13 with smallPrimes, you'll end up with a rather large dlist in your memory, but smallPrimes = [2,3,5,7,11,13,17] should still be okay; no indexing means it's probably faster than Seq.

Cheers, Daniel
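As a quick check of the sketch above (my addition, assuming it compiles as posted), it should reproduce the factorisation Daniel quoted earlier in the thread, although the trial-division climb to the third factor still takes a while:

*Main> factor (2^88 + 1)
[257,229153,119782433,43872038849]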
The last version was a kind of retreat from attempts like this:

firstPrimes = [3,5,7,11,13,17]
start = last firstPrimes
pac = product firstPrimes
slen = length lsumds

lsumds = drop 1 (fst$getSummands (singleton start, start)) where
    getSummands :: (Seq Int, Int) -> (Seq Int, Int)
    getSummands r | snd r < bnd = getSummands ((fst r)|>k, snd r + k)
                  | otherwise   = r
        where
            bnd = 2*pac + start
            k = getNext (snd r)
            getNext n | and [(n+2)`mod`x>0 | x<-firstPrimes] = 2
                      | otherwise = 2 + getNext (n+2)

smallmod :: Int -> Int -> Int
smallmod n m | n
divstep :: (DivIter,Int) -> (DivIter, Int)
divstep (x,n)
    | and [(fromInteger $ divisor x) < start, ximod > 0] = (x {divisor = divisor x + 2}, n)
    | (fromInteger$divisor x) < start = (x {dividend = xidiv,
                                            bound    = intsqrt(xidiv),
                                            result   = result x ++ [divisor x]}, n)
    | ximod > 0 = (x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen)
    | otherwise = (x {dividend = xidiv,
                      bound    = intsqrt(xidiv),
                      result   = result x ++ [divisor x]}, n)
    where (xidiv, ximod) = divMod (dividend x) (divisor x)
divisions :: (DivIter, Int) -> (DivIter, Int)
divisions (y,n) | divisor y <= bound y = divisions (divstep (y,n))
                | otherwise            = (y,0)
Here the additions to divisor are taken from the sequence lsumds (list of summands) - the type Seq from Data.Sequence is faster with the function index than Data.List is with !!. getSummands is a kind of reduced sieve of Eratosthenes. The main improvement is the longest line:

| ximod>0 = (x {divisor = divisor x + toInteger (index lsumds n)}, smallmod (n+1) slen)

I even considered converting lsumds to a ByteString and storing it - the build of lsumds for firstPrimes = [3,5,7,11,13,17,19,23,29] (which already has a footprint of some MB) takes several minutes.

But we have to track the number of the iteration we are in. And that eats up much more than the reduction of divisions for "failing" factors. The code works (called, slightly modified, by primefactors), but needs 5.41 minutes for 2^61+1 :((. The lookup in lsumds might also be expensive - the code gets even slower with longer lists for firstPrimes.

divisions (d6$d2$d6$d4$d2$d4$d2$d4 y) is derived from the summand list for [3,5], which is [4,2,4,2,4,6,2,6].

For me the whole matter is closed for now - the 1.34 minutes are no bad result. And anyway the code might represent a not too bad lower bound for the efficiency of decomposition algorithms.
Auf Wiedersehen, Joost

Hi again, Daniel,

I cannot sleep tonight - perhaps from the feeling of losing too much time with these things. Yes - it's the wheel. And a dlist made from [3,5,7,11,13,17] was optimal in some of my experiments too. You will probably know it - but perhaps there are third-party readers: a last try to improve the "final" version was to replace Integer by Int64 (importing the Prelude qualified). There was no difference! That is good news and bad - the good news (and that's much more important) being that Integer is absolutely cleanly implemented. However, if this is so, I have little sympathy for being forced to use fromInteger and toInteger - for this special case I would prefer automatic coercion. And - sorry - I cannot help seeing anything else as ill-advised dogmatism. But perhaps I am spoiled by Python :). I did this because, for a secure decision about divisibility, the program isn't useful beyond that range anyway. Thanks for that entry point: I put "google Rabin-Miller" in a comment of this last version. Happy days, Joost
participants (7):
- Daniel Fischer
- Joost Behrends
- Neil Mitchell
- Richard Kelsall
- Simon Marlow
- Stefan O'Rear
- Sterling Clover