
So I dug a bit deeper (trying N different ways to write the code,
ThreadScope, etc.) and I have some good news some bad news.
The good news is that I figured out how to fix it, and it's an "easy" fix.
The bad news is that the fix is to compile it with GHC HEAD (~7.8, I built
ec4af3f), which has the new Mio high-performance multicore IO manager [1].
Apparently the old IO manager wasn't well suited to this use case.
With the latest GHC, the -threaded version outperforms the single threaded
version, and the numbers for both are better than with GHC 7.6.3.
I've updated the gist with a README that has some more details:
https://gist.github.com/etrepum/8312165
[1]
http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf
-bob
On Wed, Jan 8, 2014 at 7:50 AM, Bob Ippolito
I haven't yet been able to sort out the performance difference, and probably won't have time to dig in deeper with profiling tools today. I've put together a gist with all of the files and a cabal file so it's easy to build with the right options.
https://gist.github.com/etrepum/8312165
On Tue, Jan 7, 2014 at 7:47 PM, Bob Ippolito
wrote: On Tuesday, January 7, 2014, Branimir Maksimovic wrote:
On 01/08/2014 04:15 AM, Bob Ippolito wrote:
On Tuesday, January 7, 2014, Branimir Maksimovic wrote:
On 01/07/2014 09:39 PM, Bob Ippolito wrote:
Here's a much simpler implementation for that sort of pattern, using channels to fan out work to threads. I added a dependency on Criterion because getCPUTime is basically useless for this kind of measurement on Mac OS X since it doesn't include the time that the process spent waiting on IO:
Great, thank you very much. You gave me material for learning ;) However, my version is significantly faster when compiling without -threaded. With -threaded option, your version is much faster than mine, but both are significantly slower then compile without -threaded.
Happy to look into it, I didn't have time today to do benchmarks (and Mac OS X is the worst platform to do this kind of testing on regardless of language, its network stack is inconsistent at best). I need to know more:
How exactly are you compiling?
ghc-7.6.3 --make -O2 client.hs
Which OS?
Ubuntu 13.10 with 3.13-rc7 kernel.
What version of GHC and Haskell Platform?
bmaxa@maxa:~$ apt-cache policy haskell-platform haskell-platform: Installed: 2013.2.0.0 Candidate: 2013.2.0.0 Version table: *** 2013.2.0.0 0 500 http://archive.ubuntu.com/ubuntu/ saucy/universe amd64 Packages 100 /var/lib/dpkg/status
What is the exact command line you execute it with?
time ./client maxa 5055 1000 100000
What happens if you add +RTS -N to the end of that command line?
What timings do you get?
with your version: real 0m4.235s user 0m1.589s sys 0m2.642s
with my version real 0m3.010s user 0m0.590s sys 0m2.417s that is, of course, without -threaded
What's the code for the server are you connecting to?
import Network (listenOn,PortID(..)) import Network.Socket (accept,close) import Network.Socket.ByteString import System.Environment import Control.Concurrent (forkIO)
main = do n <- getArgs let nn = (read.head) n :: Int putStrLn $ "Listening on " ++ show nn sock <- listenOn $ PortNumber $ fromIntegral nn serve sock
serve sock = do (s,_) <- accept sock forkIO $ process s serve sock
process sock = do buf <- recv sock 1024 sendAll sock buf close sock
Loopback, local network, or internet?
loopback. Testing is on same computer.
Thanks!
Great, I'll look closer later tonight or tomorrow morning.