A better test program
import Control.Parallel
main = a `par` b `pseq` print (a + b )
where
a = ack 3 11
b = fib 39
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
running it , these are the results
paolino@paolino-casa:~$ ghc --make prova -threaded
[1 of 1] Compiling Main ( prova.hs, prova.o )
Linking prova ...
paolino@paolino-casa:~$ time ./prova +RTS -N1
63262367
real 1m17.485s
user 1m16.473s
sys 0m0.392s
paolino@paolino-casa:~$ time ./prova +RTS -N2
63262367
real 1m20.186s
user 1m31.554s
sys 0m0.600s
paolino@paolino-casa:~$ touch prova.hs
paolino@paolino-casa:~$ ghc --make prova -O2 -threaded
[1 of 1] Compiling Main ( prova.hs, prova.o )
Linking prova ...
paolino@paolino-casa:~$ time ./prova +RTS -N1
63262367
real 0m17.652s
user 0m15.277s
sys 0m0.108s
paolino@paolino-casa:~$ time ./prova +RTS -N2
63262367
real 0m13.650s
user 0m15.121s
sys 0m0.188s
From the resource graph and the timings it is clear that the program is not able to use all the 2 cores powers, considering computing 'a' alone is about 7 seconds and 'b' alone 9.
What is retaining the cpu's to run full power ?
paolino
paolo.veronelli:
Firstly, a and b are identical, so GHC commons them up. The compiler> Hi, reading a previous thread I got interested.
> I simplified the example pointed by dons in
>
> import Control.Parallel
>
> main = a `par` b `pseq` print (a + b )
> where
> a = ack 3 11
> b = ack 3 11
>
> ack 0 n = n+1
> ack m 0 = ack (m-1) 1
> ack m n = ack (m-1) (ack m (n-1))
>
> compiled with
> ghc --make prova -O2 -threaded
>
> timings
> paolino@paolino-casa:~$ time ./prova +RTS -N1
> 32762
>
> real 0m7.031s
> user 0m6.304s
> sys 0m0.004s
> paolino@paolino-casa:~$ time ./prova +RTS -N2
> 32762
>
> real 0m6.997s
> user 0m6.728s
> sys 0m0.020s
> paolino@paolino-casa:~$
>
> without optimizations it gets worse
>
> paolino@paolino-casa:~$ time ./prova +RTS -N1
> 32762
>
> real 1m20.706s
> user 1m18.197s
> sys 0m0.104s
> paolino@paolino-casa:~$ time ./prova +RTS -N2
> 32762
>
> real 1m38.927s
> user 1m45.039s
> sys 0m0.536s
> paolino@paolino-casa:~$
>
> staring at the resource usage graph I can see it does use 2 cores when told to
> do it, but with -N1 the used cpu goes 100% and with -N2 they both run just over
> 50%
>
> thanks for comments
transforms it into:
a `par` a `seq` print (a + a)
So you essentially fork a spark to evaluate 'a', and then have the main
thread also evaluate 'a' again. One of them wins, then you add the
result to itself. The runtime may choose not to convert your first spark
into a thread.
Running with a 2009 GHC head snapshot, we can see with +RTS -sstderr
SPARKS: 1 (0 converted, 0 pruned)
That indeed, it doesn't convert your `par` into a real thread.
While, for example, the helloworld on the wiki:
http://haskell.org/haskellwiki/Haskell_in_5_steps
Converts 2 sparks to 2 theads:
SPARKS: 2 (2 converted, 0 pruned)
./B +RTS -threaded -N2 -sstderr 2.13s user 0.04s system 137% cpu 1.570 total
-- Don