
Hello everybody, I am following "A Tutorial on Parallel and Concurrent Programming in Haskell" and I have a problem with making Haskell to use my multi-cores (Core 2 Quad CPU). The Haskel version I used is GHC 6.10.1, for Haskell 98. I compile my below program with command: ghc --make -threaded -debug thread0.hs, and run with: thread0 +RTS -N4 while watching the cpu usage on another terminal (by: mpstat -P ALL 1 100), but the program uses only one core of my Ubuntu Linux. Do any of you know why or has any suggestions? Below is my program: import Control.Concurrent import Control.Concurrent.MVar fib :: Int -> Int fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2) dowork = putStrLn ("fib 35 = " ++ (show (fib 35))) threadA :: MVar Int -> MVar Int -> MVar Int -> IO () threadA valueToSendMVar valueToReadMVar valueToQuit = do -- some work dowork -- perform rendezvous putMVar valueToSendMVar 30 -- send value v <- takeMVar valueToReadMVar putStrLn ("result, fib 30 = " ++ (show v)) dowork -- notify done putMVar valueToQuit 0 -- send value threadB :: MVar Int -> MVar Int -> MVar Int -> IO () threadB valueToReceiveMVar valueToSendMVar valueToQuit = do -- some work dowork -- perform rendezvous by waiting z <- takeMVar valueToReceiveMVar putMVar valueToSendMVar (fib z) -- continue with other work dowork -- notify done putMVar valueToQuit 0 -- send value main :: IO () main = do aQuitA <- newEmptyMVar aQuitB <- newEmptyMVar aMVar <- newEmptyMVar bMVar <- newEmptyMVar forkOS (threadA aMVar bMVar aQuitA ) forkOS (threadB aMVar bMVar aQuitB ) -- wait for threadA and threadB takeMVar aQuitA takeMVar aQuitB return () Thanks a lot, Hoang

Hoang Truong wrote:
Hello everybody,
I am following "A Tutorial on Parallel and Concurrent Programming in Haskell" and I have a problem with making Haskell to use my multi-cores (Core 2 Quad CPU).
The Haskel version I used is GHC 6.10.1, for Haskell 98. I compile my below program with command: ghc --make -threaded -debug thread0.hs, and run with: thread0 +RTS -N4 while watching the cpu usage on another terminal (by: mpstat -P ALL 1 100), but the program uses only one core of my Ubuntu Linux.
Do any of you know why or has any suggestions? Below is my program:
Why do people still insist on using forkOS? You don't need forkOS unless you need to call C libraries that use thread-local state. Otherwise, it will just reduce your performance compared to forkIO. Admittedly the documentation for forkOS has been misleading in the past, but I think the current version is pretty clear: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurren...
import Control.Concurrent import Control.Concurrent.MVar
fib :: Int -> Int fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2)
dowork = putStrLn ("fib 35 = " ++ (show (fib 35)))
Perhaps you were expecting "fib 35" to be repeatedly executed each time you call dowork? Laziness means it only gets evaluated once. Cheers, Simon

Hi Simon,
I tried with forkIO and added another dowork functions but the result is the
same: only one core is used, three other cores are idle. Do you have any
other suggestions? Is there anything I should take care when installing GHC?
I also did try the Wombat.hs from the tutorial, but only one core is used
and the times are almost the same.
seq sum: 119201850
seq time: 20.959932 seconds.
par sum: 119201850
par time: 20.959547 seconds.
--------
import System.Time
import Control.Parallel
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2) =
fromInteger (psecs2 - psecs1) / 1e12 + fromInteger (secs2 - secs1)
mkList :: Int -> [Int]
mkList n = [1..n-1]
relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1
euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))
sumEuler :: Int -> Int
sumEuler = sum . (map euler) . mkList
sumFibEuler:: Int -> Int -> Int
sumFibEuler a b = fib a + sumEuler b
parSumFibEuler a b = f `par` (e `pseq` (e+ f))
where
f = fib a
e = sumEuler b
r1 :: Int
r1 = sumFibEuler 40 7450
r2 :: Int
r2 = parSumFibEuler 40 7450
main :: IO ()
main =
do
t0 <- getClockTime
pseq r1 (return())
t1 <- getClockTime
putStrLn ("seq sum: " ++ show r1)
putStrLn ("seq time: " ++ show (secDiff t0 t1) ++ " seconds.")
t0 <- getClockTime
pseq r2 (return())
t1 <- getClockTime
putStrLn ("par sum: " ++ show r2)
putStrLn ("par time: " ++ show (secDiff t0 t1) ++ " seconds.")
-----
Many thanks,
Hoang
On Tue, Dec 9, 2008 at 7:26 PM, Simon Marlow
Hoang Truong wrote:
Hello everybody,
I am following "A Tutorial on Parallel and Concurrent Programming in Haskell" and I have a problem with making Haskell to use my multi-cores (Core 2 Quad CPU). The Haskel version I used is GHC 6.10.1, for Haskell 98. I compile my below program with command: ghc --make -threaded -debug thread0.hs, and run with: thread0 +RTS -N4 while watching the cpu usage on another terminal (by: mpstat -P ALL 1 100), but the program uses only one core of my Ubuntu Linux.
Do any of you know why or has any suggestions? Below is my program:
Why do people still insist on using forkOS? You don't need forkOS unless you need to call C libraries that use thread-local state. Otherwise, it will just reduce your performance compared to forkIO. Admittedly the documentation for forkOS has been misleading in the past, but I think the current version is pretty clear:
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurren...
import Control.Concurrent
import Control.Concurrent.MVar
fib :: Int -> Int fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2)
dowork = putStrLn ("fib 35 = " ++ (show (fib 35)))
Perhaps you were expecting "fib 35" to be repeatedly executed each time you call dowork? Laziness means it only gets evaluated once.
Cheers, Simon

Hoang Truong wrote:
Hi Simon,
I tried with forkIO and added another dowork functions but the result is the same: only one core is used, three other cores are idle. Do you have any other suggestions? Is there anything I should take care when installing GHC?
I also did try the Wombat.hs from the tutorial, but only one core is used and the times are almost the same.
seq sum: 119201850 seq time: 20.959932 seconds. par sum: 119201850 par time: 20.959547 seconds.
Your program is suffering from microbenchmarkitis, I'm afraid. There's only one spark, which tickles a bug in the scheduler in 6.10.1 and earlier (but sometimes doesn't happen due to random scheduling behaviour). Even with that fixed, the program uses fib which tickles another bug: when optimised, fib doesn't do any allocation, and GHC's scheduler relies on allocation happening at regular enough intervals. In 6.10.1 we never get to do load-balancing in this example, because fib doesn't ever yield control to the scheduler. In HEAD, where we have work-stealing and don't rely on the scheduler for load-balancing, the load-balancing problem goes away but reveals another problem: the second thread wants to GC, but in order to GC it has to synchronise with the other running threads, but the other thread is running fib and never yields. We can fix this by allowing CPUs to GC independently (which we plan to do), but even then you could still run into the same problem because eventually a global GC will be required. If you really want to see the program running in parallel, turn off -O. Cheers, Simon

Hello Simon, Wednesday, December 10, 2008, 5:24:28 PM, you wrote: good explanation of various shortanges on the way to multi-threading. may be it worth a link from GHC Concurrency pages?
Hoang Truong wrote:
Hi Simon,
I tried with forkIO and added another dowork functions but the result is the same: only one core is used, three other cores are idle. Do you have any other suggestions? Is there anything I should take care when installing GHC?
I also did try the Wombat.hs from the tutorial, but only one core is used and the times are almost the same.
seq sum: 119201850 seq time: 20.959932 seconds. par sum: 119201850 par time: 20.959547 seconds.
Your program is suffering from microbenchmarkitis, I'm afraid. There's only one spark, which tickles a bug in the scheduler in 6.10.1 and earlier (but sometimes doesn't happen due to random scheduling behaviour). Even with that fixed, the program uses fib which tickles another bug: when optimised, fib doesn't do any allocation, and GHC's scheduler relies on allocation happening at regular enough intervals.
In 6.10.1 we never get to do load-balancing in this example, because fib doesn't ever yield control to the scheduler. In HEAD, where we have work-stealing and don't rely on the scheduler for load-balancing, the load-balancing problem goes away but reveals another problem: the second thread wants to GC, but in order to GC it has to synchronise with the other running threads, but the other thread is running fib and never yields. We can fix this by allowing CPUs to GC independently (which we plan to do), but even then you could still run into the same problem because eventually a global GC will be required. If you really want to see the program running in parallel, turn off -O.
Cheers, Simon
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Thanks Simon for your detail explanation. It does help me as I am new to
Haskell. Btw, I did not use -O option.
Hoang
On Wed, Dec 10, 2008 at 9:24 PM, Simon Marlow
Hoang Truong wrote:
Hi Simon,
I tried with forkIO and added another dowork functions but the result is the same: only one core is used, three other cores are idle. Do you have any other suggestions? Is there anything I should take care when installing GHC?
I also did try the Wombat.hs from the tutorial, but only one core is used and the times are almost the same.
seq sum: 119201850 seq time: 20.959932 seconds. par sum: 119201850 par time: 20.959547 seconds.
Your program is suffering from microbenchmarkitis, I'm afraid. There's only one spark, which tickles a bug in the scheduler in 6.10.1 and earlier (but sometimes doesn't happen due to random scheduling behaviour). Even with that fixed, the program uses fib which tickles another bug: when optimised, fib doesn't do any allocation, and GHC's scheduler relies on allocation happening at regular enough intervals.
In 6.10.1 we never get to do load-balancing in this example, because fib doesn't ever yield control to the scheduler. In HEAD, where we have work-stealing and don't rely on the scheduler for load-balancing, the load-balancing problem goes away but reveals another problem: the second thread wants to GC, but in order to GC it has to synchronise with the other running threads, but the other thread is running fib and never yields. We can fix this by allowing CPUs to GC independently (which we plan to do), but even then you could still run into the same problem because eventually a global GC will be required. If you really want to see the program running in parallel, turn off -O.
Cheers, Simon

hoangta:
Hello everybody, I am following "A Tutorial on Parallel and Concurrent Programming in Haskell" and I have a problem with making Haskell to use my multi-cores (Core 2 Quad CPU). The Haskel version I used is GHC 6.10.1, for Haskell 98. I compile my below program with command: ghc --make -threaded -debug thread0.hs, and run with: thread0 +RTS -N4 while watching the cpu usage on another terminal (by: mpstat -P ALL 1 100), but the program uses only one core of my Ubuntu Linux. Do any of you know why or has any suggestions? Below is my program:
import Control.Concurrent import Control.Concurrent.MVar fib :: Int -> Int fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2) dowork = putStrLn ("fib 35 = " ++ (show (fib 35))) threadA :: MVar Int -> MVar Int -> MVar Int -> IO () threadA valueToSendMVar valueToReadMVar valueToQuit = do -- some work dowork -- perform rendezvous putMVar valueToSendMVar 30 -- send value v <- takeMVar valueToReadMVar putStrLn ("result, fib 30 = " ++ (show v)) dowork -- notify done putMVar valueToQuit 0 -- send value threadB :: MVar Int -> MVar Int -> MVar Int -> IO () threadB valueToReceiveMVar valueToSendMVar valueToQuit = do -- some work dowork -- perform rendezvous by waiting z <- takeMVar valueToReceiveMVar putMVar valueToSendMVar (fib z) -- continue with other work dowork -- notify done putMVar valueToQuit 0 -- send value main :: IO () main = do aQuitA <- newEmptyMVar aQuitB <- newEmptyMVar aMVar <- newEmptyMVar bMVar <- newEmptyMVar forkOS (threadA aMVar bMVar aQuitA ) forkOS (threadB aMVar bMVar aQuitB ) -- wait for threadA and threadB takeMVar aQuitA takeMVar aQuitB return ()
How about, import Control.Parallel import Control.Monad import Text.Printf cutoff = 35 fib' :: Int -> Integer fib' 0 = 0 fib' 1 = 1 fib' n = fib' (n-1) + fib' (n-2) fib :: Int -> Integer fib n | n < cutoff = fib' n | otherwise = r `par` (l `pseq` l + r) where l = fib (n-1) r = fib (n-2) main = forM_ [0..45] $ \i -> printf "n=%d => %d\n" i (fib i) Where: $ ghc -O2 -threaded fib.hs --make Linking fib ... $ time ./fib +RTS -N2 n=0 => 0 n=1 => 1 n=2 => 1 n=3 => 2 n=4 => 3 ... n=43 => 433494437 n=44 => 701408733 n=45 => 1134903170 ./fib 30 +RTS -N2 107.56s user 0.54s system 184% cpu 58.703 tota

Thanks Don. Your fib program works well. It uses all four cores of my
computer with +RTS -N4. But the Wombat.hs still does not work. It seems
tricky to me.
Hoang
On Wed, Dec 10, 2008 at 4:47 AM, Don Stewart
hoangta:
Hello everybody, I am following "A Tutorial on Parallel and Concurrent Programming in Haskell" and I have a problem with making Haskell to use my multi-cores (Core 2 Quad CPU). The Haskel version I used is GHC 6.10.1, for Haskell 98. I compile my below program with command: ghc --make -threaded -debug thread0.hs, and run with: thread0 +RTS -N4 while watching the cpu usage on another terminal (by: mpstat -P ALL 1 100), but the program uses only one core of my Ubuntu Linux. Do any of you know why or has any suggestions? Below is my program:
import Control.Concurrent import Control.Concurrent.MVar fib :: Int -> Int fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2) dowork = putStrLn ("fib 35 = " ++ (show (fib 35))) threadA :: MVar Int -> MVar Int -> MVar Int -> IO () threadA valueToSendMVar valueToReadMVar valueToQuit = do -- some work dowork -- perform rendezvous putMVar valueToSendMVar 30 -- send value v <- takeMVar valueToReadMVar putStrLn ("result, fib 30 = " ++ (show v)) dowork -- notify done putMVar valueToQuit 0 -- send value threadB :: MVar Int -> MVar Int -> MVar Int -> IO () threadB valueToReceiveMVar valueToSendMVar valueToQuit = do -- some work dowork -- perform rendezvous by waiting z <- takeMVar valueToReceiveMVar putMVar valueToSendMVar (fib z) -- continue with other work dowork -- notify done putMVar valueToQuit 0 -- send value main :: IO () main = do aQuitA <- newEmptyMVar aQuitB <- newEmptyMVar aMVar <- newEmptyMVar bMVar <- newEmptyMVar forkOS (threadA aMVar bMVar aQuitA ) forkOS (threadB aMVar bMVar aQuitB ) -- wait for threadA and threadB takeMVar aQuitA takeMVar aQuitB return ()
How about,
import Control.Parallel import Control.Monad import Text.Printf
cutoff = 35
fib' :: Int -> Integer fib' 0 = 0 fib' 1 = 1 fib' n = fib' (n-1) + fib' (n-2)
fib :: Int -> Integer fib n | n < cutoff = fib' n | otherwise = r `par` (l `pseq` l + r) where l = fib (n-1) r = fib (n-2)
main = forM_ [0..45] $ \i -> printf "n=%d => %d\n" i (fib i)
Where:
$ ghc -O2 -threaded fib.hs --make Linking fib ...
$ time ./fib +RTS -N2 n=0 => 0 n=1 => 1 n=2 => 1 n=3 => 2 n=4 => 3 ... n=43 => 433494437 n=44 => 701408733 n=45 => 1134903170 ./fib 30 +RTS -N2 107.56s user 0.54s system 184% cpu 58.703 tota
participants (4)
-
Bulat Ziganshin
-
Don Stewart
-
Hoang Truong
-
Simon Marlow