warp vs scotty + some benchmark numbers

Hi,
I'm trying to put together a small benchmark for warp and scotty (later
with a json/no-json text performance test). My client is a Qt C++
application. I wrote minimal code in both Haskell and C++. The problem
is the numbers I'm getting:
(Benchmark::pong-warp) - total requests: 10000, send/received in msec: 3848
(Benchmark::pong-scotty) - total requests: 10000, send/received in msec: 3814
Just sending the 10k requests from my C++ client takes around 1 sec.
What I don't understand is:
a) How is it possible that scotty is so close to warp (in this run it
even wins)?
b) The 10k requests take approximately 4 seconds. Here
http://www.yesodweb.com/blog/2011/03/preliminary-warp-cross-language-benchma...
is a two-year-old benchmark with 80k requests/second. Why is there such
a big difference?
I'm running on a MacBook Pro, 2.3 GHz, 16 GB RAM.
Scotty and warp are compiled with: -O2 -threaded
--scotty
import Web.Scotty

main :: IO ()
main = scotty 3005 $ do
  get "/" $ do
    text "pong"
--warp
import Network.Wai (responseLBS, Application)
import Network.HTTP.Types (status200)
import Network.Wai.Handler.Warp (run)

app :: Application
app _ = return $ responseLBS
    status200
    [("Content-Type", "text/plain")]
    "pong"

main :: IO ()
main = do
  let port = 3000
  putStrLn $ "Listening on port " ++ show port
  run port app
--c++ client sample (Qt)
simStart = QDateTime().currentMSecsSinceEpoch();
for (int i = 0; i < requests; ++i) {
    manager->get(QNetworkRequest(url));
    QEventLoop eventLoop;
    QObject::connect(manager, SIGNAL(finished(QNetworkReply *)),
                     &eventLoop, SLOT(quit()));
    eventLoop.exec();
    QObject::disconnect(manager, SIGNAL(finished(QNetworkReply *)),
                        &eventLoop, SLOT(quit()));
}
qDebug() << "(Benchmark::pong) - total requests: " << requests
         << ", send in msec: "
         << (QDateTime().currentMSecsSinceEpoch() - simStart);

cheers, miro

Several things...

1. The difference here isn't very big, so the ~30 ms gap over 10k
requests is most likely just noise. Scotty's overhead on top of Warp
should really be pretty small.

2. With GHC <= 7.6, -threaded is actually going to hurt you on Warp
benchmarks that don't do much computational work. 7.8's new IO manager
seems like a different story from my benchmarks, but -threaded is
basically a loss if you're not going to be doing any actual parallelism
(and a pong benchmark really isn't).

3. It's almost certainly the case that the bottleneck is in your client
code. Writing benchmark code is not trivial, and a fair amount of effort
has gone into building such tools (httperf and wrk seem to be the best
right now). Are you using Keep-Alive to recycle connections? Are you
using events/threads on the client? Is your client's HTTP parsing code
fast? Are you spending a lot of time copying buffers around? The trick
is to keep the server constantly busy, and if it's waiting to get the
next request from the client, that's not going to be the case.

For comparison, just yesterday I happened to run such a benchmark on
Warp and got ~70k req/sec with a load of 2-3 client threads; I've gotten
higher before. This was on my laptop with similar specs - Arch Linux,
3.4 GHz Intel i7, 16 GB, all over localhost.

I'd recommend testing out the benchmark with `wrk` and/or `httperf`
(wrk is just easier to get going, but I've gotten good results with
both).
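For illustration, here is a minimal sketch of such a client in Haskell,
assuming the http-client and async packages (the URL matches the warp
example above; the worker counts are arbitrary). A single shared Manager
gives you keep-alive connection reuse, and a few worker threads keep the
server busy:

import Control.Concurrent.Async (forConcurrently_)
import Control.Monad (replicateM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import Network.HTTP.Client

main :: IO ()
main = do
  let workers   = 4 :: Int  -- a few concurrent client threads
      perWorker = 2500      -- 4 * 2500 = 10k requests in total
  -- one shared Manager, so connections are kept alive and reused
  mgr <- newManager defaultManagerSettings
  req <- parseRequest "http://localhost:3000/"
  start <- getCurrentTime
  forConcurrently_ [1 .. workers] $ \_ ->
    replicateM_ perWorker (httpLbs req mgr)
  end <- getCurrentTime
  putStrLn $ "10000 requests in " ++ show (diffUTCTime end start)

The same idea applies to the Qt client: reuse connections and keep
several requests in flight, rather than spinning up a fresh event loop
per request.

Cheers!
Amit
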

On Sat, Apr 12, 2014 at 11:22 PM, Miro Karpis wrote:
> Hi, I'm trying to put together a small benchmark for warp and scotty
> (later with a json/no-json text performance test). My client is a Qt
> C++ application. I wrote minimal code in both Haskell and C++. The
> problem is the numbers I'm getting.
If you're not running your Haskell program with "+RTS -A4M" (or even
larger for a newer chip; the "4M" should correspond to the size of your
L3 cache), please do so. The default allocation area of 512k is really
too small for most processors in use and will force the runtime into
garbage collection before the L3 cache is even filled. In my benchmarks
this flag alone can give you a remarkable improvement.
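Concretely, that means building the server with -rtsopts (otherwise the
binary won't accept RTS flags) and passing the flag at startup; "pong.hs"
here just stands in for whatever your source file is called:

ghc -O2 -threaded -rtsopts pong.hs
./pong +RTS -A4M -RTS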
Also, a more fundamental issue: those other tests you mentioned are
measuring something different than you are. Those tests use a large number
of simultaneous client connections to simulate a busy server, i.e.
measuring throughput. Your test makes 10,000 connections serially: you're
measuring the server's latency.
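As a quick back-of-the-envelope on the numbers above:

3848 ms / 10000 requests  ~  0.4 ms per serial round trip
10000 requests / 3.85 s   ~  2600 req/s with one request in flight

The ~80k req/s figure comes from keeping many requests in flight at
once, so the two numbers aren't directly comparable.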
G
--
Gregory Collins

Hi Miro,

As Gregory pointed out, you should use a web benchmarking tool rather
than rolling your own (e.g., weighttp). If you intend to run benchmarks
and play with many parameters, I'd recommend using a framework to handle
the experiments (I'm selling my magic potion here :P). I've wrapped the
weighttp client to benchmark the mighty web server in these Laborantin
experiments:
- https://github.com/lucasdicioccio/laborantin-bench-web

From the results I got on my server, mighty handles from ~8K req/s to
~50K req/s depending on the input parameters of the server and of the
measuring client. I'm not bragging that my server is beefy; I report
these results to show that results vary a lot with the methodology.
Hence, take care and explore many operating points =).
Feel free to contribute a Scotty / Warp wrapper (or wait until I find time
to make these myself).
Gregory, thanks for the -A4M tip, I wasn't aware of it. I'll patch my
experiments with an extra parameter too =).
Best,
--Lucas
participants (4)
- Amit Aryeh Levy
- Gregory Collins
- lucas di cioccio
- Miro Karpis