
Hello, I am solving a problem of finding the largest prime factor of 600851475143 (Project Euler is great for learning new language!), and came with the following solution (it uses the most ineffective approach to finding prime numbers, however is able to factor the above number in fraction of second): factors :: Integer -> [Integer] factors n = doFactors n (eratosthenesFilter [1..n]) doFactors n primes | null newPrimes = [] | otherwise = let topPrime = head newPrimes in if topPrime == n then [topPrime] else topPrime : (doFactors (n `quot` topPrime) primes) where newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes eratosthenesFilter :: [Integer] -> [Integer] eratosthenesFilter [] = [] eratosthenesFilter (num : nums) | num == 1 = eratosthenesFilter nums | otherwise = num : (eratosthenesFilter $ doFilter num nums) where doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums What would you do different (including stylistic changes)? What are the general comments about efficiency (not of the algorithm, but of the implementation: for example, is it fine to use break at every invocation of doFactors?) and elegance of the solution? Thanks and regards, Sergey

On http://www.haskell.org/haskellwiki/Prime_numbers "primeFactors" should do what you want (although I don't like the the pattern matching on "1") Cheers Christian Sergey V. Mikhanov wrote:
Hello,
I am solving a problem of finding the largest prime factor of 600851475143 (Project Euler is great for learning new language!), and came with the following solution (it uses the most ineffective approach to finding prime numbers, however is able to factor the above number in fraction of second):
factors :: Integer -> [Integer]
factors n = doFactors n (eratosthenesFilter [1..n])
doFactors n primes | null newPrimes = [] | otherwise = let topPrime = head newPrimes in if topPrime == n then [topPrime] else topPrime : (doFactors (n `quot` topPrime) primes) where newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes
eratosthenesFilter :: [Integer] -> [Integer]
eratosthenesFilter [] = [] eratosthenesFilter (num : nums) | num == 1 = eratosthenesFilter nums | otherwise = num : (eratosthenesFilter $ doFilter num nums) where doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums
What would you do different (including stylistic changes)? What are the general comments about efficiency (not of the algorithm, but of the implementation: for example, is it fine to use break at every invocation of doFactors?) and elegance of the solution?
Thanks and regards, Sergey

Sergey V. Mikhanov wrote:
factors :: Integer -> [Integer] factors n = doFactors n (eratosthenesFilter [1..n])
doFactors n primes | null newPrimes = [] | otherwise = let topPrime = head newPrimes in if topPrime == n then [topPrime] else topPrime : (doFactors (n `quot` topPrime) primes) where newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes
eratosthenesFilter :: [Integer] -> [Integer] eratosthenesFilter [] = [] eratosthenesFilter (num : nums) | num == 1 = eratosthenesFilter nums | otherwise = num : (eratosthenesFilter $ doFilter num nums) where doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums
What would you do different (including stylistic changes)? What are the general comments about efficiency (not of the algorithm, but of the implementation: for example, is it fine to use break at every invocation of doFactors?) and elegance of the solution?
Stylistically, one usually uses shorter variable names in Haskell. Also, the guards in doFactors are better expressed as pattern matching and the if can be turned into guards. factors :: Integer -> [Integer] factors n = go n $ eratosthenes [2..n] where go n [] = [] go n (p:ps) | n `mod` p == 0 = p : go (n `div` p) ps | otherwise = go n ps eratosthenes :: [Integer] -> [Integer] eratosthenes [] = [] eratosthenes (p:ps) = p : erathostenes ps' where ps' = filter (\x -> x > p && (x `mod` p) /= 0) ps Other than that, efficiency is best understood as algorithmic efficiency; there are not straightforward "tweaks" that give you the warm fuzzy feeling of "writing efficient code". Regards, apfelmus -- http://apfelmus.nfshost.com

2009/3/10 Heinrich Apfelmus
Sergey V. Mikhanov wrote:
... some code ...
What would you do different (including stylistic changes)? What are the general comments about efficiency (not of the algorithm, but of the implementation: for example, is it fine to use break at every invocation of doFactors?) and elegance of the solution?
Stylistically, one usually uses shorter variable names in Haskell.
<beginner rant> Sometime too short peraphs? At least, this is one of the things that slows down my understanding of code posted on this list on or on various haskell tutorial. In any other language I know, programmers learn to give meaningful names to variable and functions, so when one reads a program, one can use the name to remember what the function does. Then one cames to haskell ... I guess the short names comes from mathematic background, but still ... haskell is already very succint - even more so when you use pointfree programming - and if one also uses names like a,e,i ( look at Array function definitions ), ... Rant apart, I notice that in my own excercises I tend to shorten names, so maybe there is a reason for that. Nevertheless readability tends to be a big issue in languages used in IT industry, and my feeling is that haskell tends to err on the laconic side of the balance. Out of curiosity, there is any reason why you called the auxiliary function 'go' ? Ciao ------- FB

"Francesco" == Francesco Bochicchio
writes:
Francesco> Nevertheless readability tends to be a big issue in Francesco> languages used in IT industry, and my feeling is that Francesco> haskell tends to err on the laconic side of the Francesco> balance. Strongly seconded. -- Colin Adams Preston Lancashire

Francesco Bochicchio wrote:
Heinrich Apfelmus wrote:
Stylistically, one usually uses shorter variable names in Haskell.
<beginner rant> Sometime too short peraphs? At least, this is one of the things that slows down my understanding of code posted on this list on or on various haskell tutorial. In any other language I know, programmers learn to give meaningful names to variable and functions, so when one reads a program, one can use the name to remember what the function does. Then one cames to haskell ... I guess the short names comes from mathematic background, but still ... haskell is already very succint - even more so when you use pointfree programming - and if one also uses names like a,e,i ( look at Array function definitions ), ...
Rant apart, I notice that in my own excercises I tend to shorten names, so maybe there is a reason for that. Nevertheless readability tends to be a big issue in languages used in IT industry, and my feeling is that haskell tends to err on the laconic side of the balance.
The goal is of course to make code readable, that's why I recommend short names. :D Abstraction is the one driving force for very short names. For example, take the definition of foldr foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs) Since this function is polymorphic, so f , z and the xs can be anything, using more "descriptive" variable names is simply not possible; the key point of fold is its generality. A second, and my main reason for short names, or rather against long names, is that names should be to the point. None of the names newPrimes topPrime doFactors doFilter accurately describe the object they represent. The primes are not "new", the prime is not "on top". The "do" is a prefix does not carry a meaning either, it just conveys that doFactors has something to do with factors . This is best expressed by making doFactors a local definition in the where-block of factors . The name eratosthenesFilter is ok, but since there is no other eratosthenes around, no meaning is lost by shortening it to simply eratosthenes . Not to mention that the conventional term is "sieve", not "filter". The documentation has to elaborate on it anyway. The generality of the name num hints that a single letter name is preferable. The names that I think are great because they are to the point are factors primes
Out of curiosity, there is any reason why you called the auxiliary function 'go' ?
Convention. Often, an auxiliary function that does basically the same thing as the main function factors but with an extra parameter will be named factors' . The apostrophe has the drawback that it's easy to forget, so some people now name such auxiliary functions go instead. Regards, apfelmus -- http://apfelmus.nfshost.com

Greetings!
A second, and my main reason for short names, or rather against long names, is that names should be to the point. None of the names
newPrimes topPrime doFactors doFilter
accurately describe the object they represent. The primes are not "new", the prime is not "on top". The "do" is a prefix does not carry a meaning either, it just conveys that doFactors has something to do with factors . This is best expressed by making doFactors a local definition in the where-block of factors.
Those remarks are fine with me! I asked about the stylistic changes because I came from the, hm, Java world and would like to avoid "writing familiar things in unfamiliar language". In Java, factors() and doFactors() would be a perfectly named methods: factors() is public, auxiliary doFactors() is private and essentially _does_ the factoring.
Out of curiosity, there is any reason why you called the auxiliary function 'go' ?
Convention. Often, an auxiliary function that does basically the same thing as the main function factors but with an extra parameter will be named factors' . The apostrophe has the drawback that it's easy to forget, so some people now name such auxiliary functions go instead.
I think having a local function 'go' in 'factors' is aboslutely plausible: it is local, there's no ambiguity. Regards, Sergey

"Heinrich" == Heinrich Apfelmus
writes:
Heinrich> Abstraction is the one driving force for very short Heinrich> names. For example, take the definition of foldr Heinrich> foldr f z [] = z foldr f z (x:xs) = f x (foldr f z Heinrich> xs) Heinrich> Since this function is polymorphic, so f , z and the xs Heinrich> can be anything, using more "descriptive" variable names Heinrich> is simply not possible; the key point of fold is its Heinrich> generality. Wouldn't unit be a better descriptive name than z? -- Colin Adams Preston Lancashire

Colin Paul Adams wrote:
"Heinrich" == Heinrich Apfelmus
writes: Heinrich> Abstraction is the one driving force for very short Heinrich> names. For example, take the definition of foldr
Heinrich> foldr f z [] = z foldr f z (x:xs) = f x (foldr f z Heinrich> xs)
Heinrich> Since this function is polymorphic, so f , z and the xs Heinrich> can be anything, using more "descriptive" variable names Heinrich> is simply not possible; the key point of fold is its Heinrich> generality.
Wouldn't unit be a better descriptive name than z?
I have never heard of a unit in relation to fold , I'm afraid. Monoids and groups have units, as do physicists and a few other mathematical structures. While z is indeed quite often the unit of a monoid, for instance in sum = foldr (+) 0 product = foldr (*) 1 concat = foldr (++) [] it doesn't have to be the unit of a monoid. Regards, apfelmus -- http://apfelmus.nfshost.com

2009/3/13 Heinrich Apfelmus
Francesco Bochicchio wrote:
Heinrich Apfelmus wrote:
Stylistically, one usually uses shorter variable names in Haskell.
<beginner rant> ...
Rant apart, I notice that in my own excercises I tend to shorten names, so maybe there is a reason for that. Nevertheless readability tends to be a big issue in languages used in IT industry, and my feeling is that haskell tends to err on the laconic side of the balance.
The goal is of course to make code readable, that's why I recommend short names. :D
Abstraction is the one driving force for very short names. For example, take the definition of foldr
foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs)
Since this function is polymorphic, so f , z and the xs can be anything, using more "descriptive" variable names is simply not possible; the key point of fold is its generality.
Ok but one could still hint at their structure or purpose: foldr function value (x:xs) = function x ( foldr function value xs ) I believe this would give a little more information to the casual reader.
A second, and my main reason for short names, or rather against long names, is that names should be to the point. None of the names
newPrimes topPrime doFactors doFilter
accurately describe the object they represent. The primes are not "new", the prime is not "on top". The "do" is a prefix does not carry a meaning either, it just conveys that doFactors has something to do with factors . This is best expressed by making doFactors a local definition in the where-block of factors .
I agree that well-documented shared name conventions are better than roll-your-own. (x:xs) is one example of such convention, although I tend to adopt slight variations like (n:nums) for list of numbers and (ch:chars) for list of characters. But roll-your-own is still better than cryptic.
The name eratosthenesFilter is ok, but since there is no other eratosthenes around, no meaning is lost by shortening it to simply eratosthenes . Not to mention that the conventional term is "sieve", not "filter". The documentation has to elaborate on it anyway.
The generality of the name num hints that a single letter name is preferable.
The names that I think are great because they are to the point are
factors primes
I have some resistance to use nouns for functions. In the imperative world,
nouns are for variables, verbs are for functions. I know that in pure functional programming there is not such a thing as variables, but still I would reserve nouns for function parameters and bound expressions. Hence if I have a function that find factors, I would call it findFactors rather than just factors. One such example of misnaming - from a beginner point of view - is the length function in prelude: if it was called count, I believe more beginners would have realized that works by actually counting the elements of a list and not by accessing to some already available 'property' of the list.
Out of curiosity, there is any reason why you called the auxiliary function 'go' ?
Convention. Often, an auxiliary function that does basically the same thing as the main function factors but with an extra parameter will be named factors' . The apostrophe has the drawback that it's easy to forget, so some people now name such auxiliary functions go instead.
I tend to use _ instead of '. Is more visible and keep conveying the idea that the auxiliary function is just a slight variation of the main one.
Regards, apfelmus
Ciao ------ FB

Francesco Bochicchio wrote:
Heinrich Apfelmus wrote:
Abstraction is the one driving force for very short names. For example, take the definition of foldr
foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs)
Since this function is polymorphic, so f , z and the xs can be anything, using more "descriptive" variable names is simply not possible; the key point of fold is its generality.
Ok but one could still hint at their structure or purpose:
foldr function value (x:xs) = function x ( foldr function value xs )
I believe this would give a little more information to the casual reader.
Sure, though there is already the convention from mathematics that functions are named f, g and values are denoted by x, y or z .
I have some resistance to use nouns for functions. In the imperative world, nouns are for variables, verbs are for functions. I know that in pure functional programming there is not such a thing as variables, but still I would reserve nouns for function parameters and bound expressions. Hence if I have a function that find factors, I would call it findFactors rather than just factors.
One such example of misnaming - from a beginner point of view - is the length function in prelude: if it was called count, I believe more beginners would have realized that works by actually counting the elements of a list and not by accessing to some already available 'property' of the list.
IMHO, I think that using nouns like length or factors is a good idea, for they sounds nice when used in compound expressions. Compare for instance take (length xs `div` 2) xs take (count xs `div` 2) xs The first can be put into pseudo-english as "take (the length of xs divided by 2) elements of xs" while the verb in the second makes it difficult to use the expression in parenthesis as object to another verb. In contrast, the runtime (O(n) versus O(1) for say a record field) is rather secondary. Implicitly adding the preposition "of" makes the nouns flow, i.e. "factors of n", "length of xs".
Convention. Often, an auxiliary function that does basically the same thing as the main function factors but with an extra parameter will be named factors' . The apostrophe has the drawback that it's easy to forget, so some people now name such auxiliary functions go instead.
I tend to use _ instead of '. Is more visible and keep conveying the idea that the auxiliary function is just a slight variation of the main one.
The convention of using an apostrophe for variations comes from mathematics and it seems that it was put into the Haskell syntax specifically for this purpose. In any case, the rationales and style that I presented are how I interpret the naming conventions from classic papers like John Hughes. Why functional programming matters. http://www.cs.chalmers.se/~rjmh/Papers/whyfp.pdf John Hughes. The Design of a Pretty-Printing Library http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777 Most of these conventions root in mathematics, of course. I like them for they encourage thinking purely functionally, but these are of course not the only possibilities. Regards, apfelmus -- http://apfelmus.nfshost.com

On 3/10/2009, "Sergey V. Mikhanov"
Hello,
I am solving a problem of finding the largest prime factor of 600851475143 (Project Euler is great for learning new language!), and came with the following solution (it uses the most ineffective approach to finding prime numbers, however is able to factor the above number in fraction of second):
factors :: Integer -> [Integer]
factors n = doFactors n (eratosthenesFilter [1..n])
doFactors n primes | null newPrimes = [] | otherwise = let topPrime = head newPrimes in if topPrime == n then [topPrime] else topPrime : (doFactors (n `quot` topPrime) primes) where newPrimes = snd $ break (\x -> (n `rem` x) == 0) primes
eratosthenesFilter :: [Integer] -> [Integer]
eratosthenesFilter [] = [] eratosthenesFilter (num : nums) | num == 1 = eratosthenesFilter nums | otherwise = num : (eratosthenesFilter $ doFilter num nums) where doFilter num nums = filter (\x -> x > num && (x `rem` num) /= 0) nums
What would you do different (including stylistic changes)? What are the general comments about efficiency (not of the algorithm, but of the implementation: for example, is it fine to use break at every invocation of doFactors?) and elegance of the solution?
Thanks and regards, Sergey
This is my solution to the same problem. I'm just a beginner with Haskell as well, so just consider this as an alternate solution, not an ideal solution. The bottom 2 functions were pulled out of support code that I use for all my Project Euler solutions, so that's why they seem unnecessarily generic. I think the only real advantages of my solution over yours is that I take advantage of the fact that primes are always odd (except for the number 2) and that the largest prime factor of a number will always be <= half its value. main = putStrLn output output = show result result = largestPrimeFactor 600851475143 largestPrimeFactor n = last $ primeFactors n {- - Gets the prime factors of an integer. -} primeFactors :: (Integral a) => a -> [a] primeFactors n = primeFactorsUsingPrimesList (2:[3, 5 .. n `div` 2]) n {- - Gets the prime factors of a number. The primes list passed as the first - argument is not required to be a list of primes. It is simply required to be - a list of values to try to divide the input from. If this list contains - non-prime values, they should be ordered. If the list does not contain all - of the primes that are divisors of the input value, then the result will be - incorrect. -} primeFactorsUsingPrimesList :: (Integral a) => [a] -> a -> [a] primeFactorsUsingPrimesList _ 1 = [] primeFactorsUsingPrimesList (x:xs) n = if n `rem` x == 0 then x : primeFactorsUsingPrimesList (x:xs) (n `div` x) else primeFactorsUsingPrimesList xs n primeFactorsUsingPrimesList [] n = [n]

"Sergey V. Mikhanov"
I am solving a problem of finding the largest prime factor of 600851475143 (Project Euler is great for learning new language!), [...]
You may find my little auxilliary library [1] for ProjectEuler.net problems handy. For the specific problem of factoring an integer, it uses the wheel factoring method, which uses the idea of Eratosthenes sieving to filter out most non-primes with almost no performance loss compared to the more naive trial division method. [1] http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2388 Greets, Ertugrul. -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://blog.ertes.de/
participants (7)
-
Christian Maeder
-
Colin Paul Adams
-
David Frey
-
Ertugrul Soeylemez
-
Francesco Bochicchio
-
Heinrich Apfelmus
-
Sergey V. Mikhanov