
Il giorno Apr 14, 2007, alle ore 12:33 AM, Stefan O'Rear ha scritto:
On Sat, Apr 14, 2007 at 12:27:10AM +0200, Fawzi Mohamed wrote:
I was trying to speed up a program that I wrote and so I thought about using multiple threads. I have a quite easy parallel program and I did the following
do subRes <- MVar.newMVar [] putStrLn "starting threads" subV <- flip mapM [0 .. (nThreads - 1)] $ ( \i -> do subR <- MVar.newEmptyMVar let writeRes r = do { MVar.putMVar subR r } forkOS (do let r=eval (startData !! i) writeRes $! r putStr $ "writtenRes") return subR ) putStrLn "started threads" toFold <- mapM MVar.takeMVar subV putStrLn "about to fold" return $ foldl1 mergeRes toFold
I know that the threads really calculate what I want, and as soon as they are finished I get the result. The problem is that I have no speed up, the time is basically the sum of the time for the two threads. I thought that ghc now would take advantage of the two cpus if I compiled with -threaded. Am I wrong, do I need some special flag, a newer version of the compiler (I have 6.6.20070129), or it is just normal?
./MyProgram +RTS -N2
where N is your CPU count.
thanks, that was it.
(that said, DO NOT USE THREADS IF AT ALL POSSIBLE, they are ugly and cause heisenbugs, if you want paralelism `par` from Control.Parallel is to be preferred if at all possible since it is deterministic)
in theory yes, but I am quite used to programm with threads and even mpi. I have looked at Control.Parallel (and the nice article on it), but there is no easy way tell (at least that I saw) to leave the whole calculation to a sub thread. Actually my code should be equivalent to parMap rwhnf, but I get the following results: parMap 3.63user 0.02system 0:01.97elapsed 185%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1039minor)pagefaults 0swaps threads 3.14user 0.02system 0:01.68elapsed 187%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1041minor)pagefaults 0swaps I suppose that it is because I have a thread for each element in the list plus a main thread vs just one thread per element in the list, but I am not sure, if someone has some ideas... With threads (now the I managed to have some speedup) I can use a workers/queue approach and have a better load balancing. I had looked at strategies, but I need first a breath first traversal of a graph generated on the fly (to be parallel) and then a depth first traversal (to avoid space leaks), and I found no easy way to do it with strategies, so I did it by hand. by the way is there a way to know how many processors are available to the program (to make the strategy or thread control depend on it)? Fawzi