Re: [Haskell-cafe] A tale of Project Euler

On Nov 28, 2007 12:12 PM, Kalman Noel
Sebastian Sylvan:
primes :: [Integer] primes = 2 : filter (null . primeFactors) [3,5..]
primeFactors :: Integer-> [Integer] primeFactors n = factor n primes where factor m (p:ps) | p*p > m = [] | m `mod` p == 0 = p : factor (m `div` p) (p:ps) | otherwise = factor m ps
Your definition gives a strange meaning to primeFactors. I'd want that for all n, product (primeFactors n) == n. I think this law holds for the code posted by Olivier.
Yes you're right. That is property should clearly hold. -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

Sebastian Sylvan wrote:
On Nov 28, 2007 12:12 PM, Kalman Noel
wrote: Sebastian Sylvan:
primes :: [Integer] primes = 2 : filter (null . primeFactors) [3,5..]
primeFactors :: Integer-> [Integer] primeFactors n = factor n primes where factor m (p:ps) | p*p > m = [] | m `mod` p == 0 = p : factor (m `div` p) (p:ps) | otherwise = factor m ps
Your definition gives a strange meaning to primeFactors. I'd want that for all n, product (primeFactors n) == n. I think this law holds for the code posted by Olivier.
Yes you're right. That is property should clearly hold.
There are problems for which it's important to know how many times a given prime factor occurs. And there are other problems where it is merely necessary to know which primes are factors. I would say it's useful to have *both* functions available... By the way, I'm now using a new and much uglier prime seive: size = 1000000 :: Word64 calc_primes :: IO [Word64] calc_primes = do grid <- newArray (2,size) True seive 2 grid where seive :: Word64 -> IOUArray Word64 Bool -> IO [Word64] seive p g = do mapM_ (\n -> writeArray g n False) [p, 2*p .. size] mp' <- next (p+1) g case mp' of Nothing -> return [p] Just p' -> do ps <- seive p' g return (p:ps) next :: Word64 -> IOUArray Word64 Bool -> IO (Maybe Word64) next p g = do if p == size then return Nothing else do t <- readArray g p if t then return (Just p) else next (p+1) g Seems moderately fast. At least, I can find the 10,001st prime in under 1 second... The obvious downside of course is that you must know how big to make the array at the start of your program, and you must compute the entire array before you can use any of it. Both limitations not precent in the lazy recursive list implementation...

Am Mittwoch, 28. November 2007 22:31 schrieb Andrew Coppin:
There are problems for which it's important to know how many times a given prime factor occurs. And there are other problems where it is merely necessary to know which primes are factors. I would say it's useful to have *both* functions available...
Yup
By the way, I'm now using a new and much uglier prime seive:
size = 1000000 :: Word64
calc_primes :: IO [Word64] calc_primes = do grid <- newArray (2,size) True seive 2 grid where seive :: Word64 -> IOUArray Word64 Bool -> IO [Word64] seive p g = do mapM_ (\n -> writeArray g n False) [p, 2*p .. size] mp' <- next (p+1) g case mp' of Nothing -> return [p] Just p' -> do ps <- seive p' g return (p:ps)
next :: Word64 -> IOUArray Word64 Bool -> IO (Maybe Word64) next p g = do if p == size then return Nothing else do t <- readArray g p if t then return (Just p) else next (p+1) g
Seems moderately fast. At least, I can find the 10,001st prime in under 1 second...
One thing: since You check the array bounds, the system needn't check them again, use unsafeWrite and unsafeRead. And use Int for the index, that would be MUCH faster. Another thing: you can stop sieving when p*p > size, another speedup Third thing: In my experience mapM_ is relatively slow, hand coded loops are faster (though much uglier), but YMMV Fourth thing: sieve only odd numbers Fifth thing: better use an STUArray, don't drag IO in if it's not necessary.
The obvious downside of course is that you must know how big to make the array at the start of your program, and you must compute the entire array before you can use any of it. Both limitations not precent in the lazy recursive list implementation...
The size of the array can easily be estimated by the prime number theorem, pi(x) ~ x/log x, where pi(x) is the number of primes not exceeding x, so for 10,001 primes, find x with x/log x about 10,000, add a bit for safety, a bound of 120,000 will do. If you want the n-th prime, you don't want to use the smaller primes anyway, if you need all primes, chances are that prime generation will take negligible time in comparison to your main algo anyway. Keep enjoying Project Euler, Daniel

Daniel Fischer wrote:
One thing: since You check the array bounds, the system needn't check them again, use unsafeWrite and unsafeRead. And use Int for the index, that would be MUCH faster.
I can't find the functions you're talking about. I have however changed the index type. (Make little or no noticable speed difference. But then, it's already pretty damn fast in the first place...)
Another thing: you can stop sieving when p*p > size, another speedup
Saves a few hundred milliseconds.
Fifth thing: better use an STUArray, don't drag IO in if it's not necessary.
I don't understand the ST monad.
The obvious downside of course is that you must know how big to make the array at the start of your program, and you must compute the entire array before you can use any of it. Both limitations not precent in the lazy recursive list implementation...
The size of the array can easily be estimated by the prime number theorem,
Probably. But it doesn't compare to the elegance and simplicity of being able to generate an arbitrary number of primes on an as-needed basis.
Keep enjoying Project Euler, Daniel
Well, I don't know about "enjoy" - generally I just find it quite frustrating. It's kind of addictive trying to increase your score though... (Anybody here reached 100% yet?) I find that a lot of the problems don't seem to involve much math. (E.g., "here is some data, process it into a list of integers, find the highest one". Where is the deep mathematics in that?) A few of them do, but a lot are simply to do with prime numbers, and as far as I'm away, it is impossible to know anything about prime numbers. In other words, you must compute them all the hard way.

On Nov 29, 2007 6:43 PM, Andrew Coppin
Daniel Fischer wrote:
One thing: since You check the array bounds, the system needn't check them again, use unsafeWrite and unsafeRead. And use Int for the index, that would be MUCH faster.
I can't find the functions you're talking about. I have however changed the index type. (Make little or no noticable speed difference. But then, it's already pretty damn fast in the first place...)
Another thing: you can stop sieving when p*p > size, another speedup
Saves a few hundred milliseconds.
Fifth thing: better use an STUArray, don't drag IO in if it's not necessary.
I don't understand the ST monad.
There's not a whole lot to understand if you just want to use it (though it's all very cool from a theoretical standpoint too). Here are my minor changes to your program. import Data.Array.ST import Control.Monad.ST calc_primes :: [Word64] calc_primes = runST ( do grid <- newArray (2,size) True seive 2 grid ) where seive :: Word64 -> STUArray s Word64 Bool -> ST s [Word64] seive p g = do mapM_ (\n -> writeArray g n False) [p, 2*p .. size] mp' <- next (p+1) g case mp' of Nothing -> return [p] Just p' -> do ps <- seive p' g return (p:ps) next :: Word64 -> STUArray s Word64 Bool -> ST s (Maybe Word64) next p g = do if p == size then return Nothing else do t <- readArray g p if t then return (Just p) else next (p+1) g The benefit should be obvious: No pesky IO type, so you can use it in your pure code. You just need to give a type signature somewhere to show the type system that you're using STUArray, but the rest just uses the same type class as you already used for the IOUArrays. And here are the modifications to use the unsafe reads and writes (42% speedup for me): import Data.Array.Base size = 1000001 :: Word64 calc_primes :: [Word64] calc_primes = runST ( do grid <- newArray (2,size) True seive 2 grid ) where seive :: Word64 -> STUArray s Word64 Bool -> ST s [Word64] seive p g = do mapM_ (\n -> unsafeWrite g (fromIntegral n) False) [p, 2*p .. size] mp' <- next (p+1) g case mp' of Nothing -> return [p] Just p' -> do ps <- seive p' g return (p:ps) next :: Word64 -> STUArray s Word64 Bool -> ST s (Maybe Word64) next p g = do if p == size then return Nothing else do t <- unsafeRead g (fromIntegral p) if t then return (Just p) else next (p+1) g -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

Sebastian Sylvan wrote:
On Nov 29, 2007 6:43 PM, Andrew Coppin
wrote: I don't understand the ST monad.
There's not a whole lot to understand if you just want to use it (though it's all very cool from a theoretical standpoint too).
From what I can tell, it's not definable without using strange language extensions. (I don't really like using things where it's unclear why it works.)
Here are my minor changes to your program.
import Data.Array.ST import Control.Monad.ST
calc_primes :: [Word64] calc_primes = runST ( do grid <- newArray (2,size) True seive 2 grid ) where seive :: Word64 -> STUArray s Word64 Bool -> ST s [Word64] seive p g = do mapM_ (\n -> writeArray g n False) [p, 2*p .. size] mp' <- next (p+1) g case mp' of Nothing -> return [p] Just p' -> do ps <- seive p' g return (p:ps)
next :: Word64 -> STUArray s Word64 Bool -> ST s (Maybe Word64) next p g = do if p == size then return Nothing else do t <- readArray g p if t then return (Just p) else next (p+1) g
The benefit should be obvious: No pesky IO type, so you can use it in your pure code. You just need to give a type signature somewhere to show the type system that you're using STUArray, but the rest just uses the same type class as you already used for the IOUArrays.
How do you avoid accidentally recomputing the list multiple times?
And here are the modifications to use the unsafe reads and writes (42% speedup for me):
import Data.Array.Base
size = 1000001 :: Word64
calc_primes :: [Word64] calc_primes = runST ( do grid <- newArray (2,size) True seive 2 grid ) where seive :: Word64 -> STUArray s Word64 Bool -> ST s [Word64] seive p g = do mapM_ (\n -> unsafeWrite g (fromIntegral n) False) [p, 2*p .. size] mp' <- next (p+1) g case mp' of Nothing -> return [p] Just p' -> do ps <- seive p' g return (p:ps)
next :: Word64 -> STUArray s Word64 Bool -> ST s (Maybe Word64) next p g = do if p == size then return Nothing else do t <- unsafeRead g (fromIntegral p) if t then return (Just p) else next (p+1) g
I don't see Data.Array.Base documented anywhere. (How did you know it exists?)

On Thu, Nov 29, 2007 at 09:10:16PM +0000, Andrew Coppin wrote:
Sebastian Sylvan wrote:
On Nov 29, 2007 6:43 PM, Andrew Coppin
wrote: I don't understand the ST monad.
There's not a whole lot to understand if you just want to use it (though it's all very cool from a theoretical standpoint too).
From what I can tell, it's not definable without using strange language extensions. (I don't really like using things where it's unclear why it works.)
You can't implement IO in H98 either. You just have to accept it as a given. (As far as ST goes, runST is unsafePerformIO renamed. The only tricky bit is proving safety.) Stefan

On Nov 29, 2007, at 6:19 PM, Stefan O'Rear wrote:
On Thu, Nov 29, 2007 at 09:10:16PM +0000, Andrew Coppin wrote:
Sebastian Sylvan wrote:
On Nov 29, 2007 6:43 PM, Andrew Coppin
wrote: I don't understand the ST monad.
...[and ST uses language extensions Andrew doesn't understand.]
(As far as ST goes, runST is unsafePerformIO renamed. The only tricky bit is proving safety.)
To put it another way, runST is unsafePerformIO where somebody has already done the safety proof for you (so you know it's 100% safe). The "strange" extensions are simply a device to make the safety proof work. Indeed, if you drop the extensions it can all be made to work (just say runST :: ST () a -> a) but you lose the safety proof and it's equivalent to unsafePerformIO. -Jan [The trick used in runST is one of my all-time favorite bits of type theory, and is what convinced me we wanted second-order types back before the first Haskell workshop.]

Hello Andrew, Friday, November 30, 2007, 12:10:16 AM, you wrote:
I don't understand the ST monad.
From what I can tell, it's not definable without using strange language extensions. (I don't really like using things where it's unclear why it works.)
this extension used only to guarantee referential transparency, so you don't need to understand it to use ST if you still want, it's not that hard - rank-2 forall type used here to guarantee that code executed by runST is compatible with *any* return type which makes it impossible to return "innards" of this computation and therefore forces your code to be referential transparent runST may be defined without rank-2 type. it doesn't change anything in its low-level working (this rank-2 type is just empty value, like RealWorld), but allows you to break referential transparency: main = do a <- runST (do a <- newArray return a) ... x <- runST (do x <- readArray a return x) ... low-level implementation of all ST-bound operations is just direct call to equivalent IO operation: newSTArray = unsafeIOtoST newIOArray readSTArray = unsafeIOtoST readIOArray where unsafeIOtoST - just code-less cast operation -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Andrew,
Friday, November 30, 2007, 12:10:16 AM, you wrote:
I don't understand the ST monad.
From what I can tell, it's not definable without using strange language extensions. (I don't really like using things where it's unclear why it works.)
this extension used only to guarantee referential transparency, so you don't need to understand it to use ST
if you still want, it's not that hard - rank-2 forall type used here to guarantee that code executed by runST is compatible with *any* return type which makes it impossible to return "innards" of this computation and therefore forces your code to be referential transparent
runST may be defined without rank-2 type. it doesn't change anything in its low-level working (this rank-2 type is just empty value, like RealWorld), but allows you to break referential transparency:
main = do a <- runST (do a <- newArray return a) ... x <- runST (do x <- readArray a return x) ...
low-level implementation of all ST-bound operations is just direct call to equivalent IO operation:
newSTArray = unsafeIOtoST newIOArray readSTArray = unsafeIOtoST readIOArray
where unsafeIOtoST - just code-less cast operation
Mmm, so basically it's rank-2 types I don't understand. ;-) Well anyway, given the multiple replies I've had, I now have some idea how this stuff works...

On Nov 29, 2007 9:10 PM, Andrew Coppin
Sebastian Sylvan wrote:
On Nov 29, 2007 6:43 PM, Andrew Coppin
wrote: I don't understand the ST monad.
There's not a whole lot to understand if you just want to use it (though it's all very cool from a theoretical standpoint too).
From what I can tell, it's not definable without using strange language extensions. (I don't really like using things where it's unclear why it works.)
Yeah the magic happens in the type for runST: runST :: (forall s. ST s a) -> a That "s" variable there is created "fresh" for the first argument (and is *not* in scope for the second argument to the function). Notice that all the ST stuff (ST actions, STUArrays etc.) also carry along that "s" type variable? That's intentional. Because all the mutable stuff carries the "s", which is not in scope in the return type, it's impossible to accidentally leak side effects out of runST (try running runST on an ST action thar returns an STRef or STUArray or something, and you'll get a type error). That's pretty much all there is to it. A cool type for runST that guarantees that side effects won't leak.
How do you avoid accidentally recomputing the list multiple times?
What do you mean? It's exactly the same as your original program but with ST instead of IO? Why would it get accidentally recomputed in this scenario and not before?
I don't see Data.Array.Base documented anywhere. (How did you know it exists?)
Hmm.. I don't remember. But now you know too! :-) I think it may be a secret GHC library that you're not supposed to know about... -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

Hello Sebastian, Friday, November 30, 2007, 11:31:22 AM, you wrote:
I don't see Data.Array.Base documented anywhere. (How did you know it exists?)
i use library sources as reference. it allows to study implementation and learn good programming style -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Sebastian Sylvan wrote:
On Nov 29, 2007 9:10 PM, Andrew Coppin
wrote: How do you avoid accidentally recomputing the list multiple times?
What do you mean? It's exactly the same as your original program but with ST instead of IO? Why would it get accidentally recomputed in this scenario and not before?
Because before the moment when it gets executed relative to the main I/O thread is explicitly defined, and now it isn't.
I don't see Data.Array.Base documented anywhere. (How did you know it exists?)
Hmm.. I don't remember. But now you know too! :-) I think it may be a secret GHC library that you're not supposed to know about...
Hmm. Secret... library... How do you guys find out about all this stuff? (And if it's secret, should we be playing with it?)

On Nov 30, 2007 2:39 PM, Andrew Coppin
Sebastian Sylvan wrote:
On Nov 29, 2007 9:10 PM, Andrew Coppin
wrote: How do you avoid accidentally recomputing the list multiple times?
What do you mean? It's exactly the same as your original program but with ST instead of IO? Why would it get accidentally recomputed in this scenario and not before?
Because before the moment when it gets executed relative to the main I/O thread is explicitly defined, and now it isn't.
Yes it is. There's nothing magical about how the I/O thread guarantees when/in what order things get executed; it all has to do with data dependencies. Since the ST monad represents a strict state thread, it will have exactly the same behavior in terms of when/in what order things get executed as in the IO monad. -Brent

On Nov 30, 2007 5:39 PM, Andrew Coppin
Hmm. Secret... library... How do you guys find out about all this stuff?
There's http://www.haskell.org/haskellwiki/Arrays#Unsafe_indexing.2C_freezing.2Fthaw... . Cheers, -- Felipe.

On Thu, 29 Nov 2007, Andrew Coppin wrote:
Sebastian Sylvan wrote:
On Nov 29, 2007 6:43 PM, Andrew Coppin
wrote: I don't understand the ST monad.
There's not a whole lot to understand if you just want to use it (though it's all very cool from a theoretical standpoint too).
From what I can tell, it's not definable without using strange language extensions. (I don't really like using things where it's unclear why it works.)
Is this thread still about the prime sieve? As I mentioned, I think one can avoid the mutable array, because if there is only a small number of array updates with much changes per update, it should be efficient enough to copy the array per update.

Am Freitag, 30. November 2007 14:39 schrieb Henning Thielemann:
Is this thread still about the prime sieve? As I mentioned, I think one can avoid the mutable array, because if there is only a small number of array updates with much changes per update, it should be efficient enough to copy the array per update.
I think in this case it's far less efficient than in-place update. Consider sieving primes up to 10^7, there are 446 primes with p^2 < 10^7, so you would have over 400 array updates. Even if you leave out the even numbers, an array going up to 10^7 would require some 800 KB memory (each odd number one bit), so overall, you'd allocate well over 300 MB (not at once, of course). Using an STUArray runs easily within 1MB allocated once. And why avoid mutable arrays in the first place? What's bad about thing = runSTUArray (do work)? Granted, "work" tends to be far less elegant code than list comprehensions &c, but also much faster. Cheers, Daniel

On Fri, 30 Nov 2007, Daniel Fischer wrote:
Am Freitag, 30. November 2007 14:39 schrieb Henning Thielemann:
Is this thread still about the prime sieve? As I mentioned, I think one can avoid the mutable array, because if there is only a small number of array updates with much changes per update, it should be efficient enough to copy the array per update.
I think in this case it's far less efficient than in-place update. Consider sieving primes up to 10^7, there are 446 primes with p^2 < 10^7, so you would have over 400 array updates. Even if you leave out the even numbers, an array going up to 10^7 would require some 800 KB memory (each odd number one bit), so overall, you'd allocate well over 300 MB (not at once, of course).
"not at once" - that's the point. With some luck there are only two pieces of memory where the data is copied back and forth. It's obviously not the most efficient solution, but a non-monadic solution.

Am Donnerstag, 29. November 2007 19:43 schrieb Andrew Coppin:
Daniel Fischer wrote:
One thing: since You check the array bounds, the system needn't check them again, use unsafeWrite and unsafeRead. And use Int for the index, that would be MUCH faster.
I can't find the functions you're talking about.
As Sebastian already said, they're in Data.Array.Base, sorry, should've mentioned that.
I have however changed the index type. (Make little or no noticable speed difference. But then, it's already pretty damn fast in the first place...)
But it can be considerably faster. On my 5 yo 1200MHz thing, I can sieve up to 10 million in less than half a second: dafis@linux:~/Documents/haskell/move> time ./Sieve2 10000000 664579 real 0m0.419s user 0m0.420s sys 0m0.000s And that's even faster than any sieve in C that I managed to write. (Okay, I don't know how to use bitsets in C, that might beat Haskell)
Another thing: you can stop sieving when p*p > size, another speedup
Saves a few hundred milliseconds.
Which is pretty much for a programme running below 1 second.
Fifth thing: better use an STUArray, don't drag IO in if it's not necessary.
I don't understand the ST monad.
Nor do I. Just use it (and be prepared to use scoped type variables).
The obvious downside of course is that you must know how big to make the array at the start of your program, and you must compute the entire array before you can use any of it. Both limitations not precent in the lazy recursive list implementation...
The size of the array can easily be estimated by the prime number theorem,
Probably. But it doesn't compare to the elegance and simplicity of being able to generate an arbitrary number of primes on an as-needed basis.
True. But when speed matters, you have to bite the bullet. And if you don't know up to which bound you need the primes, but know it's a not too low bound, you can sieve the first part in an array and then go on using a list, that's what I do.
Keep enjoying Project Euler, Daniel
Well, I don't know about "enjoy" - generally I just find it quite frustrating. It's kind of addictive trying to increase your score though... (Anybody here reached 100% yet?)
int-e (Bertram Felgenhauer, I believe), well he has not yet solved last week's, so currently he's at 99%, and there were a couple of others but they didn't persevere.
I find that a lot of the problems don't seem to involve much math. (E.g., "here is some data, process it into a list of integers, find the highest one". Where is the deep mathematics in that?) A few of them do, but a lot are simply to do with prime numbers, and as far as I'm away, it is impossible to know anything about prime numbers. In other words, you must compute them all the hard way.
Some of the problems require little math, but more programming techniques. Others require a good grip of maths to find an efficient algorithm. Really deep maths can't be required, the site is not meant for professional mathematicians but for laypeople with mathematical interest. The aim is to make brute force unattractive (brute forcing times from 5 hours upwards), while a mathematical insight will often give the solution in a few nanoseconds. Many problems require both, some maths to find a good algorithm, and some programming techniques to make it run really fast. Cheers, Daniel

Hello Andrew, Thursday, November 29, 2007, 9:43:48 PM, you wrote:
Fifth thing: better use an STUArray, don't drag IO in if it's not necessary.
I don't understand the ST monad.
it's just a subset of IO monad, with renamed operations the subset is selected in the way that guarantees reference transparency, i.e. compiler is able to do more checks that your program is correct. so, instead of writing main = do a <- newArray writeArray a 1 3.14 x <- readArray a 1 print x you can write main = do x <- runST (do a <- newArray writeArray a 1 3.14 x <- readArray a 1 return x) print x -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Fri, 30 Nov 2007, Bulat Ziganshin wrote:
Hello Andrew,
Thursday, November 29, 2007, 9:43:48 PM, you wrote:
Fifth thing: better use an STUArray, don't drag IO in if it's not necessary.
I don't understand the ST monad.
it's just a subset of IO monad, with renamed operations
participants (9)
-
Andrew Coppin
-
Brent Yorgey
-
Bulat Ziganshin
-
Daniel Fischer
-
Felipe Lessa
-
Henning Thielemann
-
Jan-Willem Maessen
-
Sebastian Sylvan
-
Stefan O'Rear