Hi, I got interested while reading a previous thread.
I simplified the example pointed out by dons down to:

import Control.Parallel
 
main = a `par` b `pseq` print (a + b)
    where
        a = ack 3 11
        b = ack 3 11
   
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

compiled with
ghc --make prova  -O2 -threaded

timings
paolino@paolino-casa:~$ time ./prova +RTS -N1
32762

real    0m7.031s
user    0m6.304s
sys    0m0.004s
paolino@paolino-casa:~$ time ./prova +RTS -N2
32762

real    0m6.997s
user    0m6.728s
sys    0m0.020s
paolino@paolino-casa:~$

without optimizations it gets worse

paolino@paolino-casa:~$ time ./prova +RTS -N1
32762

real    1m20.706s
user    1m18.197s
sys    0m0.104s
paolino@paolino-casa:~$ time ./prova +RTS -N2
32762

real    1m38.927s
user    1m45.039s
sys    0m0.536s
paolino@paolino-casa:~$
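
As a sanity check on the printed value: assuming the usual closed form for this row of the Ackermann function, ack 3 n = 2^(n+3) - 3 (a known identity, not something taken from the thread), each of a and b should be 16381, so 32762 is the expected sum. A minimal sketch that compares the recursion with the closed form on small inputs; the name ack3 is made up for this check, and the Integer signature is what the untyped program would default to:

```haskell
-- Same definition as in the program above, with the defaulted type written out.
ack :: Integer -> Integer -> Integer
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

-- Closed form for the m = 3 row (hypothetical helper, assumed identity).
ack3 :: Integer -> Integer
ack3 n = 2 ^ (n + 3) - 3

main :: IO ()
main = do
    -- Check the recursion against the closed form on small, cheap inputs.
    print (all (\n -> ack 3 n == ack3 n) [0 .. 6])
    -- Value the parallel program should print: ack 3 11 + ack 3 11.
    print (ack3 11 + ack3 11)
```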

Looking at the resource usage graph, I can see it does use 2 cores when told to, but with -N1 the one core in use runs at 100%, while with -N2 both cores run at just over 50%.
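
A possible follow-up experiment, sketched under an assumption that is only a guess: if the compiler ends up sharing the two identical ack 3 11 bindings, there would be a single computation and nothing left to run in parallel. The variant below gives the two sides different arguments so they cannot be the same value. parSum and its parameters are names made up for this sketch, and the Int signatures are added here and are not in the original program. It has not been timed, and it would not by itself explain the slowdown seen without optimizations.

```haskell
import Control.Parallel (par, pseq)

-- Same recursion as above; the Int signature is an addition for this sketch.
ack :: Int -> Int -> Int
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

-- Hypothetical helper: spark `a` for possible parallel evaluation,
-- evaluate `b` on the current thread, then add the two results.
parSum :: Int -> Int -> Int
parSum n1 n2 = a `par` (b `pseq` (a + b))
  where
    a = ack 3 n1
    b = ack 3 n2

main :: IO ()
main = print (parSum 11 10)  -- two different arguments, so two distinct computations
```

It would be built and run the same way as above (ghc --make with -O2 -threaded, then +RTS -N1 against +RTS -N2) to see whether the wall-clock time changes.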

Thanks for any comments.

paolino