As far as I can tell there's nothing wrong with your code. My
hypothesis is that Haskell optimizes call to sumEuler 5000 by
calling it only once in one thread. Here's why I think so:
The program I used for debugging this is:
import Control.Concurrent.Async (async, wait)
import System.IO.Unsafe (unsafePerformIO)
sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList
where
mkList n = seq (unsafePerformIO (putStrLn
"Calculating")) [1..n-1]
euler n = length (filter (relprime n) [1..n-1])
where
relprime x y = gcd x y == 1
p :: Int -> IO Int
p b = return $! sumEuler b
p5000 :: IO Int
p5000 = return $ sumEuler 5000
main :: IO ()
main = do
a <- async $ p5000
b <- async $ p5000
av <- wait a
bv <- wait b
print (av,bv)
The two main modification are adding a debugging message
"Calculating" when a list [1..(n-1)] is evaluated. The second one is
making p into a function. Notice that it uses strict application ($!
- check how it works with simple $). I will use that function in
further examples.
Running this program with "time ./Test 2 +RTS -ls -N2" gives me:
Calculating
(7598457,7598457)
real 0m3.752s
user 0m3.833s
sys 0m0.211s
Just to be sure I have almost the same time when doing only one
computation with:
main :: IO ()
main = do
a <- async $ p5000
av <- wait a
print av
So it seems like the value returned by p5000 is computed only once.
GHC might be noticing that p5000 will always return the same value
and might try cache it or memoize it. If this hypothesis is right
then calling sumEuler with two different values should run in two
different threads. And indeed it is so:
main :: IO ()
main = do
a <- async $ p 5000
b <- async $ p 4999
av <- wait a
bv <- wait b
print (av,bv)
Gives:
Calculating
Calculating
(7598457,7593459)
real 0m3.758s
user 0m7.414s
sys 0m0.064s
So it runs in two threads just as expected. The strict application
($!) here is important. Otherwise it seems that the async thread
returns a thunk and the evaluation happens in print (av, bv) which
is evaluated in a single thread. Also the fact that p5000 is a top
level binding is important. When I do:
main :: IO ()
main = do
a <- async $ p 5000
b <- async $ p 5000
av <- wait a
bv <- wait b
print (av,bv)
I get no optimization (GHC 7.6.3).
Best,
Greg