A better test program

import Control.Parallel
 
main = a `par` b  `pseq` print (a + b )
    where
        a = ack 3 11
        b = fib 39
   
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

running it , these are the results

paolino@paolino-casa:~$ ghc --make prova   -threaded
[1 of 1] Compiling Main             ( prova.hs, prova.o )
Linking prova ...
paolino@paolino-casa:~$ time ./prova +RTS -N1
63262367

real    1m17.485s
user    1m16.473s
sys    0m0.392s
paolino@paolino-casa:~$ time ./prova +RTS -N2
63262367

real    1m20.186s
user    1m31.554s
sys    0m0.600s
paolino@paolino-casa:~$ touch prova.hs
paolino@paolino-casa:~$ ghc --make prova -O2  -threaded
[1 of 1] Compiling Main             ( prova.hs, prova.o )
Linking prova ...
paolino@paolino-casa:~$ time ./prova +RTS -N1
63262367

real    0m17.652s
user    0m15.277s
sys    0m0.108s
paolino@paolino-casa:~$ time ./prova +RTS -N2
63262367

real    0m13.650s
user    0m15.121s
sys    0m0.188s

From the resource graph and the timings it is clear that the program is not able to use all the 2 cores powers, considering computing 'a' alone is about 7 seconds and 'b' alone 9.
What is retaining the cpu's to run full power ?


paolino

2009/8/21 Don Stewart <dons@galois.com>
paolo.veronelli:
> Hi, reading a previous thread I got interested.
> I simplified the example pointed by dons in
>
> import Control.Parallel
>
> main = a `par` b  `pseq` print (a + b )
>     where
>         a = ack 3 11
>         b = ack 3 11
>
> ack 0 n = n+1
> ack m 0 = ack (m-1) 1
> ack m n = ack (m-1) (ack m (n-1))
>
> compiled with
> ghc --make prova  -O2 -threaded
>
> timings
> paolino@paolino-casa:~$ time ./prova +RTS -N1
> 32762
>
> real    0m7.031s
> user    0m6.304s
> sys    0m0.004s
> paolino@paolino-casa:~$ time ./prova +RTS -N2
> 32762
>
> real    0m6.997s
> user    0m6.728s
> sys    0m0.020s
> paolino@paolino-casa:~$
>
> without optimizations it gets worse
>
> paolino@paolino-casa:~$ time ./prova +RTS -N1
> 32762
>
> real    1m20.706s
> user    1m18.197s
> sys    0m0.104s
> paolino@paolino-casa:~$ time ./prova +RTS -N2
> 32762
>
> real    1m38.927s
> user    1m45.039s
> sys    0m0.536s
> paolino@paolino-casa:~$
>
> staring at the resource usage graph I can see it does use 2 cores when told to
> do it, but with -N1 the used cpu goes 100% and with -N2 they both run just over
> 50%
>
> thanks for comments


Firstly, a and b are identical, so GHC commons them up. The compiler
transforms it into:

   a `par` a `seq` print (a + a)

So you essentially fork a spark to evaluate 'a', and then have the main
thread also evaluate 'a' again. One of them wins, then you add the
result to itself. The runtime may choose not to convert your first spark
into a thread.

Running with a 2009 GHC head snapshot, we can see with +RTS -sstderr

     SPARKS: 1 (0 converted, 0 pruned)

That indeed, it doesn't convert your `par` into a real thread.

While, for example, the helloworld on the wiki:

   http://haskell.org/haskellwiki/Haskell_in_5_steps

Converts 2 sparks to 2 theads:

   SPARKS: 2 (2 converted, 0 pruned)
   ./B +RTS -threaded -N2 -sstderr  2.13s user 0.04s system 137% cpu 1.570 total

-- Don