
GHC seems to have a few bottlenecks once you start to really stress-test its I/O performance. Using a recent GHC HEAD actually gives less awful performance:

    scpmw@cslin209 ~/test $ ghc --make -O2 -threaded -rtsopts get.hs
    [1 of 1] Compiling Main             ( get.hs, get.o )
    Linking get ...
    scpmw@cslin209 ~/test $ ./get 1 10000
    1 6.25696s
    scpmw@cslin209 ~/test $ ./get 1 10000
    1 5.409605s
    scpmw@cslin209 ~/test $ ./get 2 10000
    2 3.827393s
    scpmw@cslin209 ~/test $ ./get 2 10000 +RTS -N2
    2 4.274985s
    scpmw@cslin209 ~/test $ ./get 3 10000
    3 3.692725s
    scpmw@cslin209 ~/test $ ./get 3 10000 +RTS -N3
    3 4.303649s
    scpmw@cslin209 ~/test $ ./get 3 10000 +RTS -N2
    3 4.186283s

Going from one thread to two helps, but a third thread gains almost nothing, and giving the RTS more cores with -N2 or -N3 makes the runs slightly slower rather than faster.

That it still does not speed up might be the result of Haskell's internal implementation of non-blocking I/O. If I understand the situation correctly, all I/O events currently pass through a single OS thread (the I/O manager). That would explain nicely why you can't get more than single-core performance out of your program.

In case you are interested in the details, search for the new "Scalable Event Handling for GHC" paper by Bryan O'Sullivan and Johan Tibell.

Greetings,
  Peter Wortmann
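
P.S. For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the get.hs from the transcript above, whose source is not shown here. The names splitWork and fakeRequest are hypothetical, and fakeRequest is a no-op placeholder where a real HTTP GET would go. The sketch assumes at least one worker, takes the worker count and total request count as arguments, and prints the number of RTS capabilities so you can check whether +RTS -N actually took effect.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Environment (getArgs)

-- Split `total` units of work as evenly as possible over `n` workers.
-- Assumes n >= 1; the first (total `mod` n) workers get one extra unit.
splitWork :: Int -> Int -> [Int]
splitWork n total = [ q + (if i < r then 1 else 0) | i <- [0 .. n - 1] ]
  where
    (q, r) = total `divMod` n

-- Hypothetical placeholder for a single request. Replace this with the
-- real I/O action you want to benchmark.
fakeRequest :: IO ()
fakeRequest = return ()

main :: IO ()
main = do
  args <- getArgs
  -- Usage: ./bench WORKERS TOTAL [+RTS -N<k>]; falls back to 1 and 10000.
  let (workers, total) = case args of
        [w, t] -> (read w, read t)
        _      -> (1, 10000)
  caps  <- getNumCapabilities
  start <- getCurrentTime
  -- Fork one lightweight thread per worker; each signals an MVar when done.
  dones <- forM (splitWork workers total) $ \k -> do
    done <- newEmptyMVar
    _ <- forkIO (replicateM_ k fakeRequest >> putMVar done ())
    return done
  -- Wait for every worker before stopping the clock.
  mapM_ takeMVar dones
  end <- getCurrentTime
  putStrLn (show workers ++ " workers, " ++ show caps ++ " capabilities, "
            ++ show (diffUTCTime end start))
```

Build it the same way as above (ghc --make -O2 -threaded -rtsopts) and compare runs with and without +RTS -N2. If the wall-clock time does not drop as the capability count goes up, the bottleneck is probably not in your worker threads.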