Threading and Multicore Computation

Hi, I tried to get into concurrent Haskell using multiple cores. The program below creates 2 tasks in different threads, executes them, synchronizes the threads using MVar () and calculates the time needed.

import System.CPUTime
import Control.Concurrent
import Control.Concurrent.MVar

myTask1 = do
  return $! fac 60000
  print "Task1 done!"
  where
    fac 0 = 1
    fac n = n * fac (n-1)

myTask2 = do
  return $! fac' 60000 1 1
  print "Task2 done!"
  where
    fac' n m p = if m > n then p else fac' n (m+1) (m*p)

main = do
  mvar <- newEmptyMVar
  pico1 <- getCPUTime
  forkIO (myTask1 >> putMVar mvar ())
  myTask2
  takeMVar mvar
  pico2 <- getCPUTime
  print (pico2 - pico1)

I compiled the code using

$ ghc FirstFork.hs -threaded

and executed it by

$ main +RTS -N1

resp.

$ main +RTS -N2

I use GHC 6.8.3 on Vista with an Intel Dual Core processor. Instead of getting a speed-up when using 2 cores I get a significant slow-down, even though there is no sharing in my code above (at least none I am aware of; BTW, that was the reason I used 2 different local factorial functions). On my computer the 1-core version takes about 8.3 sec and the 2-core version 12.8 sec. When I increase the numbers from 60000 to 100000 the time difference gets even worse (30 sec vs 51 sec). Can anybody give me an idea what I am doing wrong? Thanks, Michael
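One subtlety in the benchmark above: getCPUTime reports accumulated CPU time summed over all cores, not elapsed time, so a parallel run can report a larger number even when it actually finishes sooner. A minimal sketch of the same benchmark timed with wall-clock time instead is below. It assumes the `time` package is available; `runBoth` and the smaller default argument are illustrative names and choices, not code from the original post.

```haskell
-- Sketch: the benchmark from the post, timed with wall-clock time.
-- getCPUTime sums CPU time over all cores, so a parallel run can report
-- a *larger* number even when it actually finishes sooner.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Naive, non-tail-recursive factorial (Task 1 in the post).
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)

-- Accumulator-style factorial (Task 2 in the post).
fac' :: Integer -> Integer -> Integer -> Integer
fac' n m p = if m > n then p else fac' n (m + 1) (m * p)

-- Fork one task, run the other in the main thread, synchronize on an
-- MVar, and return the elapsed wall-clock time.
runBoth :: Integer -> IO NominalDiffTime
runBoth n = do
  mvar <- newEmptyMVar
  t1 <- getCurrentTime
  _ <- forkIO ((return $! fac n) >> putMVar mvar ())
  _ <- return $! fac' n 1 1
  takeMVar mvar
  t2 <- getCurrentTime
  return (diffUTCTime t2 t1)

main :: IO ()
main = runBoth 10000 >>= print  -- smaller than the post's 60000, for a quick run
```

Comparing the printed wall-clock time for +RTS -N1 versus -N2 avoids the CPU-time-versus-elapsed-time confusion that comes up later in this thread.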

Hello mwinter, Tuesday, March 3, 2009, 8:09:21 PM, you wrote:
anybody give me an idea what I am doing wrong?
1. add -O2 to compile command
2. add +RTS -s to run commands

your program execution time may be dominated by GCs

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication). On 3 Mar 2009 at 20:21, Bulat Ziganshin wrote:
Hello mwinter,
Tuesday, March 3, 2009, 8:09:21 PM, you wrote:
anybody give me an idea what I am doing wrong?
1. add -O2 to compile command 2. add +RTS -s to run commands
your program execution time may be dominated by GCs
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Tue, Mar 3, 2009 at 5:31 PM,
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).
Might there not be contention in the allocator/GC that's worsened by having two threads? What happens with -O2? -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

It gets a bit faster in general but the problem remains. I have two threads in both runs, once using 1 core and once using 2 cores. The second run is much slower. On 3 Mar 2009 at 17:32, Sebastian Sylvan wrote:
On Tue, Mar 3, 2009 at 5:31 PM,
wrote:
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).

Might there not be contention in the allocator/GC that's worsened by having two threads? What happens with -O2? -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

On 2009 Mar 3, at 12:31, mwinter@brocku.ca wrote:
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).
The same GCs are done, but GC has to be done on a single core (currently; parallel GC is in development) so you will see a lot more lock contention when the GC kicks in. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

allbery:
On 2009 Mar 3, at 12:31, mwinter@brocku.ca wrote:
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).
The same GCs are done, but GC has to be done on a single core (currently; parallel GC is in development) so you will see a lot more lock contention when the GC kicks in.
Assuming he is using GHC 6.10, the parallel GC is enabled by default when you use -Nn where n > 1. That is, -N4 will use -g4 (4 cores to collect). So GC should be the same or a little faster. -- Don

On Tue, Mar 3, 2009 at 6:41 PM, Don Stewart
allbery:
On 2009 Mar 3, at 12:31, mwinter@brocku.ca wrote:
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).
The same GCs are done, but GC has to be done on a single core (currently; parallel GC is in development) so you will see a lot more lock contention when the GC kicks in.
Assuming he is using GHC 6.10, the parallel GC is enabled by default when you use -Nn where n > 1. That is, -N4 will use -g4 (4 cores to collect). So GC should be the same or a little faster.
Further, GHC (6.10 at least) uses one allocation area per thread, meaning there's no contention on allocation. I'd echo the request to try it with -O2, though.

I am using GHC 6.8.3. The -O2 option made both runs faster but the 2-core run is still much slower than the 1-core version. Will switching to 6.10 make the difference? On 3 Mar 2009 at 18:46, Svein Ove Aas wrote:
On Tue, Mar 3, 2009 at 6:41 PM, Don Stewart
wrote: allbery:
On 2009 Mar 3, at 12:31, mwinter@brocku.ca wrote:
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).
The same GCs are done, but GC has to be done on a single core (currently; parallel GC is in development) so you will see a lot more lock contention when the GC kicks in.
Assuming he is using GHC 6.10, the parallel GC is enabled by default when you use -Nn where n > 1. That is, -N4 will use -g4 (4 cores to collect). So GC should be the same or a little faster.
Further, GHC (6.10 at least) uses one allocation area per thread, meaning there's no contention on allocation.
I'd echo the request to try it with -O2, though.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On 2009 Mar 3, at 12:54, mwinter@brocku.ca wrote:
I am using GHC 6.8.3. The -O2 option made both runs faster but the 2-core run is still much slower than the 1-core version. Will switching to 6.10 make the difference?
If GC contention is the issue, it should. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Brandon S. Allbery KF8NH wrote:
On 2009 Mar 3, at 12:54, mwinter@brocku.ca wrote:
I am using GHC 6.8.3. The -O2 option made both runs faster but the 2-core run is still much slower than the 1-core version. Will switching to 6.10 make the difference?
If GC contention is the issue, it should.
I just tried it with GHC 6.10.1. Two capabilities is still slower. (See attachments. Compiled with -O2 -threaded.) In both cases, GC time is minuscule.

[("GHC RTS", "Yes")
,("GHC version", "6.10.1")
,("RTS way", "rts_thr")
,("Host platform", "i386-unknown-mingw32")
,("Build platform", "i386-unknown-mingw32")
,("Target platform", "i386-unknown-mingw32")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
]

Cores1 +RTS -N1 -s

  16,918,324 bytes allocated in the heap
  1,055,836 bytes copied during GC
  1,005,356 bytes maximum residency (1 sample(s))
  29,760 bytes maximum slop
  1260 MB total memory in use (112 MB lost due to fragmentation)

  Generation 0:  32 collections,  0 parallel,  0.03s,  0.03s elapsed
  Generation 1:   1 collections,  0 parallel,  0.00s,  0.00s elapsed

  Task  0 (worker) :  MUT time: 2.53s  (5.11s elapsed)   GC time: 0.02s  (0.02s elapsed)
  Task  1 (worker) :  MUT time: 0.00s  (5.11s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  2 (worker) :  MUT time: 2.30s  (5.11s elapsed)   GC time: 0.02s  (0.02s elapsed)

  INIT  time  0.02s  (0.00s elapsed)
  MUT   time  4.83s  (5.11s elapsed)
  GC    time  0.03s  (0.03s elapsed)
  EXIT  time  0.00s  (0.00s elapsed)
  Total time  4.88s  (5.14s elapsed)

  %GC time  0.6%  (0.6% elapsed)
  Alloc rate  3,492,815 bytes per MUT second
  Productivity  99.0% of total user, 93.9% of total elapsed

  recordMutableGen_sync: 0
  gc_alloc_block_sync: 0
  whitehole_spin: 0
  gen[0].steps[0].sync_todo: 0
  gen[0].steps[0].sync_large_objects: 0
  gen[0].steps[1].sync_todo: 0
  gen[0].steps[1].sync_large_objects: 0
  gen[1].steps[0].sync_todo: 0
  gen[1].steps[0].sync_large_objects: 0

Cores1 +RTS -N2 -s

  16,926,532 bytes allocated in the heap
  1,243,560 bytes copied during GC
  794,980 bytes maximum residency (2 sample(s))
  12,012 bytes maximum slop
  1927 MB total memory in use (160 MB lost due to fragmentation)

  Generation 0:  23 collections,  8 parallel,  0.00s,  0.00s elapsed
  Generation 1:   2 collections,  0 parallel,  0.02s,  0.02s elapsed

  Parallel GC work balance: 1.00 (1267 / 1267, ideal 2)

  Task  0 (worker) :  MUT time: 0.00s  (0.00s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  1 (worker) :  MUT time: 3.63s  (4.67s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  2 (worker) :  MUT time: 0.00s  (4.67s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  3 (worker) :  MUT time: 3.42s  (4.67s elapsed)   GC time: 0.02s  (0.02s elapsed)
  Task  4 (worker) :  MUT time: 0.00s  (4.67s elapsed)   GC time: 0.00s  (0.00s elapsed)

  INIT  time  0.02s  (0.00s elapsed)
  MUT   time  7.05s  (4.67s elapsed)
  GC    time  0.02s  (0.02s elapsed)
  EXIT  time  0.00s  (0.00s elapsed)
  Total time  7.08s  (4.69s elapsed)

  %GC time  0.2%  (0.3% elapsed)
  Alloc rate  2,396,677 bytes per MUT second
  Productivity  99.6% of total user, 150.3% of total elapsed

  recordMutableGen_sync: 0
  gc_alloc_block_sync: 0
  whitehole_spin: 0
  gen[0].steps[0].sync_todo: 0
  gen[0].steps[0].sync_large_objects: 0
  gen[0].steps[1].sync_todo: 0
  gen[0].steps[1].sync_large_objects: 0
  gen[1].steps[0].sync_todo: 0
  gen[1].steps[0].sync_large_objects: 0

I feel the need to point something out here. Both for me and Andrew, the program tops out at allocating ~22MB of memory - in total, over its whole run. Why, then, is max heap size over a gigabyte?

Hello Andrew, Tuesday, March 3, 2009, 9:21:42 PM, you wrote:
I just tried it with GHC 6.10.1. Two capabilities is still slower. (See attachments. Compiled with -O2 -threaded.)
i don't think so:

Total time  4.88s  (5.14s elapsed)
Total time  7.08s  (4.69s elapsed)

so with 1 thread wall-clock time is 5 seconds, with 2 threads wall time is 4.7 seconds. cpu time spent increased with 2 threads - this indicates that you either use a hyperthreaded/SMT-capable cpu or speed is limited by memory access operations. so, my conclusion - this benchmark is limited by memory latencies, so it cannot be efficiently multithreaded

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

So, apparently the 6.10.1 code runs amiss of this bug: http://hackage.haskell.org/trac/ghc/ticket/2747 I'll be upgrading to HEAD now. If no-one else gets around to it first, I'll probably post some more benchmarks afterwards.

Bulat Ziganshin wrote:
Hello Andrew,
Tuesday, March 3, 2009, 9:21:42 PM, you wrote:
I just tried it with GHC 6.10.1. Two capabilities is still slower. (See attachments. Compiled with -O2 -threaded.)
i don't think so:
Total time 4.88s ( 5.14s elapsed)
Total time 7.08s ( 4.69s elapsed)
Damnit. Foiled again! It turns out Process Explorer is reporting CPU time, not wall time. Sorry about that... (This is the second time I've tripped over that one. There doesn't seem to be a way to get it to report wall time either, unfortunately.)
so with 1 thread wall clock time is 5 seconds, with 2 thread wall time is 4.7 seconds
So a small speedup then.
so, my conclusion - this benchmark limited by memory latencies so it cannot be efficiently multithreaded
Probably.

Hello, IMO, the conclusion about cache misses due to several threads sharing memory and/or allocating heavily is highly probable, especially on Intel CPUs with a shared L2 cache. I have several examples where threading means a significant increase in time consumption (<new time> = <number of threads> * <old time>). My personal conclusion: use linear recursive functions only (so that they can be optimized), Int instead of Integer if possible, and no data-structure traversal (unless the structure is very small; L2 caches are only a few MB). That way cache misses are minimized for both/all threads. Moreover, the OS needs some time now and then (=> cache refills/misses), so I've devoted one core to the OS and the others to computation (quad core), which brings a certain improvement and more accurate measurements. Regards, Dusan

Bulat Ziganshin wrote:
Hello Andrew,
Tuesday, March 3, 2009, 9:21:42 PM, you wrote:
I just tried it with GHC 6.10.1. Two capabilities is still slower. (See attachments. Compiled with -O2 -threaded.)
i don't think so:
Total time 4.88s ( 5.14s elapsed)
Total time 7.08s ( 4.69s elapsed)
so with 1 thread wall clock time is 5 seconds, with 2 thread wall time is 4.7 seconds
cpu time spent increased with 2 threads - this indicates that you either use hyperthreaded/SMT-capable cpu or speed is limited by memory access operations
so, my conclusion - this benchmark limited by memory latencies so it cannot be efficiently multithreaded
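Dusan's "Int instead of Integer" point above can be made concrete with a small sketch. The functions below are hypothetical examples, not code from the thread: a strict Int loop does essentially no per-iteration heap allocation, while the same loop over lazy Integer allocates a heap node on every step.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Strict Int loop: the accumulator is forced on every step, so the loop
-- runs in machine words with no per-iteration heap allocation.
sumToInt :: Int -> Int
sumToInt n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- The same loop over lazy Integer: each step allocates, and without
-- forcing the accumulator it builds a chain of (+) thunks that is only
-- collapsed when the result is demanded.
sumToInteger :: Integer -> Integer
sumToInteger n = go 0 1
  where
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumToInt 1000000, sumToInteger 1000000)
```

Running both under +RTS -s and comparing "bytes allocated in the heap" makes the allocation (and hence cache) difference visible directly.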

On Tue, Mar 3, 2009 at 6:54 PM,
I am using GHC 6.8.3. The -O2 option made both runs faster but the 2-core run is still much slower than the 1-core version. Will switching to 6.10 make the difference?
There are a lot of improvements; it's certainly worth a try. For what it's worth, I tried it myself on 6.10... details follow, but the overall impression is that while you lose some time to overhead, it's still 50% faster than unthreaded.

While trying to optimize it, I ran "./test +RTS -N2 -H64m -M64m"; the program promptly ate all my memory, invoking the OOM killer and messing up my system something fierce. This has to be a bug.

GC time only accounts for 10% of the time used, but as I read these, the parallel GC didn't do any good. ...I'm stumped.

==== time ./test +RTS -N1 -s ====

"Task1 done!"
"Task2 done!"
5750000000000

  22,712,520 bytes allocated in the heap
  2,982,440 bytes copied during GC
  1,983,288 bytes maximum residency (2 sample(s))
  30,208 bytes maximum slop
  636 MB total memory in use (58 MB lost due to fragmentation)

  Generation 0:  42 collections,  0 parallel,  0.12s,  0.13s elapsed
  Generation 1:   2 collections,  0 parallel,  0.00s,  0.01s elapsed

  Task  0 (worker) :  MUT time: 2.85s  (6.09s elapsed)   GC time: 0.07s  (0.08s elapsed)
  Task  1 (worker) :  MUT time: 0.00s  (6.09s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  2 (worker) :  MUT time: 2.66s  (6.09s elapsed)   GC time: 0.05s  (0.06s elapsed)

  INIT  time  0.00s  (0.00s elapsed)
  MUT   time  4.78s  (6.09s elapsed)
  GC    time  0.12s  (0.14s elapsed)
  EXIT  time  0.00s  (0.00s elapsed)
  Total time  4.81s  (6.23s elapsed)

  %GC time  2.5%  (2.3% elapsed)
  Alloc rate  4,842,754 bytes per MUT second
  Productivity  97.5% of total user, 75.3% of total elapsed

  recordMutableGen_sync: 0
  gc_alloc_block_sync: 0
  whitehole_spin: 0
  gen[0].steps[0].sync_todo: 0
  gen[0].steps[0].sync_large_objects: 0
  gen[0].steps[1].sync_todo: 0
  gen[0].steps[1].sync_large_objects: 0
  gen[1].steps[0].sync_todo: 0
  gen[1].steps[0].sync_large_objects: 0

real 0m6.319s
user 0m4.810s
sys  0m0.920s

==== time ./test +RTS -N2 -s ====

"Task2 done!"
"Task1 done!"
6860000000000

  22,734,040 bytes allocated in the heap
  2,926,160 bytes copied during GC
  1,976,240 bytes maximum residency (2 sample(s))
  117,584 bytes maximum slop
  1234 MB total memory in use (107 MB lost due to fragmentation)

  Generation 0:  32 collections,  13 parallel,  0.47s,  0.43s elapsed
  Generation 1:   2 collections,   0 parallel,  0.01s,  0.01s elapsed

  Parallel GC work balance: 1.00 (4188 / 4188, ideal 2)

  Task  0 (worker) :  MUT time: 0.00s  (0.00s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  1 (worker) :  MUT time: 0.00s  (0.00s elapsed)   GC time: 0.00s  (0.00s elapsed)
  Task  2 (worker) :  MUT time: 3.10s  (3.82s elapsed)   GC time: 0.09s  (0.05s elapsed)
  Task  3 (worker) :  MUT time: 2.96s  (3.82s elapsed)   GC time: 0.39s  (0.39s elapsed)
  Task  4 (worker) :  MUT time: 0.00s  (3.82s elapsed)   GC time: 0.00s  (0.00s elapsed)

  INIT  time  0.00s  (0.00s elapsed)
  MUT   time  5.23s  (3.82s elapsed)
  GC    time  0.48s  (0.44s elapsed)
  EXIT  time  0.01s  (0.00s elapsed)
  Total time  5.72s  (4.26s elapsed)

  %GC time  8.4%  (10.4% elapsed)
  Alloc rate  4,338,557 bytes per MUT second
  Productivity  91.6% of total user, 123.0% of total elapsed

  recordMutableGen_sync: 0
  gc_alloc_block_sync: 0
  whitehole_spin: 0
  gen[0].steps[0].sync_todo: 0
  gen[0].steps[0].sync_large_objects: 0
  gen[0].steps[1].sync_todo: 0
  gen[0].steps[1].sync_large_objects: 0
  gen[1].steps[0].sync_todo: 0
  gen[1].steps[0].sync_large_objects: 0

real 0m4.345s
user 0m5.680s
sys  0m1.250s

Svein Ove Aas wrote:
For what it's worth, I tried it myself on 6.10.. details follow, but overall impression is that while you lose some time to overhead, it's still 50% faster than unthreaded.
Damn. Somebody beat me to it. :-)
While trying to optimize it, I ran "./test +RTS -N2 -H64m -M64m"; the program promptly ate all my memory, invoking the OOM killer and messing up my system something fierce. This has to be a bug.
I should point out that approximately 50% of the time, the -N2 version exits with "Cores1: out of memory" rather than running to completion. The -N1 version never does this. I hadn't looked at RAM usage, but it does appear that both programs use... rather a lot of this. (Measurable in gigabytes.) Space leak, anyone? (Presumably in fac or fac'.)
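A note on the leak Andrew suspects: fac' in the original program is tail-recursive but lazy in its accumulator p, so the loop builds a deep chain of unevaluated (*) thunks that only collapses when return $! finally demands the result. A minimal sketch of that fix is below; facLazy/facStrict are our names for illustration, not code from the thread, and this is one plausible leak rather than a confirmed diagnosis of the gigabytes observed.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The post's accumulator version, for comparison: p is never forced
-- inside the loop, so it accumulates a chain of (*) thunks as deep as n.
facLazy :: Integer -> Integer
facLazy n = go 1 1
  where
    go m p = if m > n then p else go (m + 1) (m * p)

-- Strict accumulator: each partial product is evaluated before the next
-- recursive call, so no thunk chain is built up.
facStrict :: Integer -> Integer
facStrict n = go 1 1
  where
    go !m !p
      | m > n     = p
      | otherwise = go (m + 1) (m * p)

main :: IO ()
main = print (facStrict 1000 == facLazy 1000)
```

The same effect can be had without BangPatterns by writing `let p' = m * p in p' `seq` go (m + 1) p'`.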

andrewcoppin:
Svein Ove Aas wrote:
For what it's worth, I tried it myself on 6.10.. details follow, but overall impression is that while you lose some time to overhead, it's still 50% faster than unthreaded.
On a quad core, ghc 6.10 snapshot from today:

Single threaded:

whirlpool$ ghc-6.10.1.20090302 -O2 A.hs --make -fforce-recomp
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...
whirlpool$ time ./A
"Task2 done!"
"Task1 done!"
4000249000000
./A 3.99s user 0.01s system 99% cpu 4.001 total

-threaded, with various N:

whirlpool$ ghc-6.10.1.20090302 -O2 A.hs -threaded --make
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...

N=1:

whirlpool$ time ./A +RTS -N1 -sstderr
"Task2 done!"
"Task1 done!"
5908369000000

  6,468,629,288 bytes allocated in the heap
  128,647,752 bytes copied during GC
  1,996,320 bytes maximum residency (563 sample(s))
  426,512 bytes maximum slop
  7 MB total memory in use (1 MB lost due to fragmentation)

  %GC time  61.0%  (62.1% elapsed)
  Alloc rate  2,699,611,953 bytes per MUT second
  Productivity  39.0% of total user, 39.8% of total elapsed

./A +RTS -N1 -sstderr 6.14s user 0.06s system 102% cpu 6.016 total

So 61% of time spent in GC.

N=2:

whirlpool$ time ./A +RTS -N2 -sstderr
"Task2 done!"
"Task1 done!"
6360397000000

  6,511,269,512 bytes allocated in the heap
  3,684,592 bytes copied during GC
  1,566,800 bytes maximum residency (3 sample(s))
  34,496 bytes maximum slop
  5 MB total memory in use (1 MB lost due to fragmentation)

  %GC time  43.1%  (63.5% elapsed)
  Alloc rate  1,384,112,532 bytes per MUT second
  Productivity  56.9% of total user, 82.8% of total elapsed

./A +RTS -N2 -sstderr 8.26s user 0.09s system 146% cpu 5.681 total

Getting rid of the space leaky version of fac:

whirlpool$ time ./A +RTS -N2 -H50M -sstderr
"Task1 done!"
"Task2 done!"
5700355000000

  6,512,828,504 bytes allocated in the heap
  1,224,488 bytes copied during GC
  6,656 bytes maximum residency (1 sample(s))
  116,136 bytes maximum slop
  50 MB total memory in use (1 MB lost due to fragmentation)

  %GC time  60.6%  (76.4% elapsed)
  Alloc rate  2,778,330,289 bytes per MUT second
  Productivity  39.4% of total user, 49.5% of total elapsed

./A +RTS -N2 -H50M -sstderr 6.30s user 0.42s system 141% cpu 4.737 total

I'm not sure there's anything weird going on here, other than just naive implementations of factorial making my cores hot.

Hello mwinter, Tuesday, March 3, 2009, 8:31:12 PM, you wrote:

not the same :) when you perform two computations at the same time, you have 2x more memory allocated, and that means that each GC will need more time. and don't forget that GC is single-threaded
In both runs the same computations are done (sequentially resp. in parallel), so the GC should be the same. But still, using 2 cores is much slower than using 1 core (same program, no communication).
On 3 Mar 2009 at 20:21, Bulat Ziganshin wrote:
Hello mwinter,
Tuesday, March 3, 2009, 8:09:21 PM, you wrote:
anybody give me an idea what I am doing wrong?
1. add -O2 to compile command 2. add +RTS -s to run commands
your program execution time may be dominated by GCs
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

mwinter:
Hi,
I tried to get into concurrent Haskell using multiple cores. The program below creates 2 tasks in different threads, executes them, synchronizes the threads using MVar () and calculates the time needed.

import System.CPUTime
import Control.Concurrent
import Control.Concurrent.MVar

myTask1 = do
  return $! fac 60000
  print "Task1 done!"
  where
    fac 0 = 1
    fac n = n * fac (n-1)

myTask2 = do
  return $! fac' 60000 1 1
  print "Task2 done!"
  where
    fac' n m p = if m > n then p else fac' n (m+1) (m*p)

main = do
  mvar <- newEmptyMVar
  pico1 <- getCPUTime
  forkIO (myTask1 >> putMVar mvar ())
  myTask2
  takeMVar mvar
  pico2 <- getCPUTime
  print (pico2 - pico1)

I compiled the code using

$ ghc FirstFork.hs -threaded

and executed it by

$ main +RTS -N1

resp.

$ main +RTS -N2

I use GHC 6.8.3 on Vista with an Intel Dual Core processor. Instead of getting a speed-up when using 2 cores I get a significant slow-down, even though there is no sharing in my code above (at least none I am aware of; BTW, that was the reason I used 2 different local factorial functions). On my computer the 1-core version takes about 8.3 sec and the 2-core version 12.8 sec. When I increase the numbers from 60000 to 100000 the time difference gets even worse (30 sec vs 51 sec). Can anybody give me an idea what I am doing wrong?
If you just want to check that your machine can do multicore, here's the "hello world" I've been using:

import Control.Parallel

main = a `par` b `par` c `pseq` print (a + b + c)
  where
    a = ack 3 10
    b = fac 42
    c = fib 34

fac 0 = 1
fac n = n * fac (n-1)

ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

To be run as:

$ ghc -O2 -threaded --make hello.hs
[1 of 1] Compiling Main ( hello.hs, hello.o )
Linking hello ...
$ time ./hello +RTS -N2
1405006117752879898543142606244511569936384005711076
./hello +RTS -N2 2.29s user 0.01s system 152% cpu 1.505 total

-- Don
participants (8)
-
Andrew Coppin
-
Brandon S. Allbery KF8NH
-
Bulat Ziganshin
-
Don Stewart
-
Dušan Kolář
-
mwinter@brocku.ca
-
Sebastian Sylvan
-
Svein Ove Aas