code review? store server, 220loc.

Anyone interested in critiquing some code? I'm looking for ideas for making it faster and/or simpler: http://www.thenewsh.com/%7Enewsham/store/Server5.hs

This is an exercise to see how well a server in Haskell would perform. My goals are roughly:
- retargetability to other server types (i.e. easy to replace request and response structures and business logic).
- readability.
- performance.

My measurements show that a simple dummy server (accept, forkIO, recv a byte) handles roughly 7500 requests/connects per second; the server/client that exchange real messages do about 4500 requests and connections per second. If all requests are sent on the same connection, one after another, it does about 13500 requests/second. For comparison, a C ping-pong server does about 3600/second if it has to fork for each new connection/request, and about 35000/sec if it's all on the same connection. So it seems at least competitive with a forking C server. I haven't tested threaded C servers.

Tim Newsham http://www.thenewsh.com/~newsham/
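For concreteness, a minimal sketch of the kind of "dummy" benchmark server described above (accept, forkIO, read one byte, close) might look roughly like this. The port number and the era's Network.Socket names (bindSocket, sClose, iNADDR_ANY) are assumptions on my part, error handling is omitted, and this is not the actual Server5.hs:

    import Control.Concurrent (forkIO)
    import Control.Monad (forever)
    import Network.Socket

    main :: IO ()
    main = withSocketsDo $ do
        sock <- socket AF_INET Stream defaultProtocol
        setSocketOption sock ReuseAddr 1
        bindSocket sock (SockAddrInet 9000 iNADDR_ANY)
        listen sock 128
        forever $ do
            (conn, _addr) <- accept sock
            forkIO $ do
                _ <- recv conn 1      -- read a single byte and discard it
                sClose conn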

newsham:
Anyone interested in critiquing some code? I'm looking for ideas for making it faster and/or simpler:
http://www.thenewsh.com/%7Enewsham/store/Server5.hs
This is an exercise to see how well a server in Haskell would perform. My goals are roughly: - retargetability to other server types (i.e. easy to replace request and response structures and business logic). - readability. - performance.
My measurements show that a simple dummy server (accept, forkIO, recv a byte) handles roughly 7500 requests/connects per second; the server/client that exchange real messages do about 4500 requests and connections per second. If all requests are sent on the same connection, one after another, it does about 13500 requests/second. For comparison, a C ping-pong server does about 3600/second if it has to fork for each new connection/request, and about 35000/sec if it's all on the same connection. So it seems at least competitive with a forking C server. I haven't tested threaded C servers.
packBS :: String -> B.ByteString
packBS = B.pack . map (toEnum . fromEnum)

-- | Convert a bytestring to a string.
unpackBS :: B.ByteString -> String
unpackBS = map (toEnum . fromEnum) . B.unpack

These are Data.ByteString.Char8.pack/unpack. What optimisation and runtime flags did you use (-threaded or not?)
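In other words (assuming the B in Server5.hs is the Word8-based Data.ByteString), the two hand-rolled conversions can simply be replaced by the Char8 module; a minimal illustration:

    import qualified Data.ByteString.Char8 as C

    packBS :: String -> C.ByteString
    packBS = C.pack      -- plays the role of B.pack . map (toEnum . fromEnum)

    unpackBS :: C.ByteString -> String
    unpackBS = C.unpack  -- plays the role of map (toEnum . fromEnum) . B.unpack

For characters in the 0-255 range, which is all that byte-oriented wire data involves, the two formulations behave identically.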

newsham:
Anyone interested in critiquing some code? I'm looking for ideas for making it faster and/or simpler:
What optimisation and runtime flags did you use (-threaded or not?)
currently "ghc -O --make $< -o $@". For some measurements I tried -threaded which seemed to have a slight slowdown and +RTS -N2 at runtime which didnt seem to help. I also ran some tests with multiple clients running concurrently, and it seemed to handle them at approximately the same rate with and without the threaded and RTS flags (on an amd64x2). Those flags should have allowed it to handle two client connections (via forkIO) concurrently, right? [ps: the same web directory has test clients and other versions of the server.] Tim Newsham http://www.thenewsh.com/~newsham/

On Sat, 2008-08-02 at 19:13 -1000, Tim Newsham wrote:
My measurements show that a simple dummy server (accept, forkIO, recv a byte) handles roughly 7500 requests/connects per second; the server/client that exchange real messages do about 4500 requests and connections per second. If all requests are sent on the same connection, one after another, it does about 13500 requests/second. For comparison, a C ping-pong server does about 3600/second if it has to fork for each new connection/request, and about 35000/sec if it's all on the same connection. So it seems at least competitive with a forking C server. I haven't tested threaded C servers.
What kind of performance do you actually need? Can your network connection actually sustain the bandwidth of your synthetic benchmarks?

For reference, I've got a demo HAppS-based server which handles reasonably high load pretty well, I think (tested using apache-bench, loopback interface, amd64 @ 2.2GHz).

With cached content (450k):
- 1 concurrent client: 760 requests per second, ~1ms latency, 34Mb/s
- 100 concurrent clients: 1040 requests per second, ~90ms latency, 46Mb/s

On a non-cached, generated-on-demand 3k page:
- 1 concurrent client: 280 requests per second, ~4ms latency, 900Kb/s bandwidth
- 100 concurrent clients: 240 requests per second, ~400ms latency, 750Kb/s bandwidth

Using HTTP keep-alive boosts requests per second by ~20%.

Obviously this is testing with a loopback network. My point is, it's serving at a rather higher rate than most real network connections I could buy (except local ethernet networks).

Duncan
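For anyone wanting to reproduce this style of measurement, apache-bench invocations along these lines are typical; the URL, port, and request counts here are invented, not taken from Duncan's test:

    # 1 vs. 100 concurrent clients, 10000 requests total
    ab -n 10000 -c 1   http://localhost:8000/
    ab -n 10000 -c 100 http://localhost:8000/

    # the same again with HTTP keep-alive
    ab -n 10000 -c 100 -k http://localhost:8000/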

What kind of performance do you actually need? Can your network connection actually sustain the bandwidth of your synthetic benchmarks?
This is just an exercise at the moment, so no particular performance goal beyond "how fast can it go".
(tested using apache-bench, loopback interface, amd64 @ 2.2GHz) With cached content 450k: with 1 concurrent client: 760 requests per second ~1ms latency 34Mb/s [...] Obviously this is testing with a loopback network. My point is, it's serving at a rather higher rate than most real network connections I could buy (except local ethernet networks).
My requests and responses are fairly small: a typical request is 29 bytes and a typical response is 58 bytes. If you just count these payloads, that's about 390 kB/sec at 4.5k req/sec and 1.2 MB/sec at 13.5k req/sec ((29 + 58) bytes per exchange x 4500/sec is roughly 390 kB/sec); of course it's more like double these, as the TCP overhead will be about the same size. Anyway, with such small sizes, the performance shouldn't be limited by the bandwidth (I don't think). If this were a back-end storage server, the network probably wouldn't be the limiting factor.
Duncan
Tim Newsham http://www.thenewsh.com/~newsham/

On Sat, Aug 2, 2008 at 10:13 PM, Tim Newsham wrote:
Anyone interested in critiquing some code? I'm looking for ideas for making it faster and/or simpler:
The code looks fairly reasonable, although most of your strictness annotations are unlikely to do much (particularly those on String fields). You should try profiling this. I can see a few possible problems (such as reading String from a socket, instead of a ByteString), but it's difficult to predict what might be causing your code to be so slow. Haskell code ought to be much more competitive with C for an application like this.
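To illustrate the point about strictness annotations (hypothetical record and field names, not taken from Server5.hs):

    import qualified Data.ByteString.Char8 as B

    -- A bang on a String field only forces the outermost (:) cell when the
    -- record is forced; the tail of the list and the Chars inside stay lazy,
    -- so it buys almost nothing.  A bang on a strict ByteString field gives
    -- a fully materialised packed buffer, which is usually what you want
    -- for wire data.
    data ReqString = ReqString { reqNameS :: !String }
    data ReqBytes  = ReqBytes  { reqNameB :: !B.ByteString }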

You should try profiling this. I can see a few possible problems (such as reading String from a socket, instead of a ByteString), but it's difficult to predict what might be causing your code to be so slow. Haskell code ought to be much more competitive with C for an application like this.
I haven't tried profiling yet; I should do that (ugh, I'll probably have to rebuild some libraries I'm using to support it). Anyway, I haven't yet used any ByteString IO functions. I ran some tests when I was starting, and it seemed that using Handle IO functions was a bit slower than using the Socket IO functions directly. It looks like there are a bunch of Handle IO functions that can return ByteStrings, but I don't see any for sockets... Are there any? If not, I suspect it will be a wash switching to Handles and ByteStrings (especially since most of my requests and responses are quite small). I'll write up some tests to see how well those perform.

Tim Newsham http://www.thenewsh.com/~newsham/
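One sketch of the Handle route, in case it is useful: wrap the accepted Socket in a Handle and use Data.ByteString's Handle functions. The helper names and the buffering choice below are guesses, not taken from the server code:

    import qualified Data.ByteString as B
    import Network.Socket (Socket, socketToHandle)
    import System.IO

    -- Turn an accepted socket into a Handle so B.hGet / B.hPut can be used.
    socketHandle :: Socket -> IO Handle
    socketHandle sock = do
        h <- socketToHandle sock ReadWriteMode
        hSetBuffering h (BlockBuffering Nothing)  -- remember to hFlush after writes
        return h

    -- Read (up to) n bytes as a strict ByteString.
    readBytes :: Handle -> Int -> IO B.ByteString
    readBytes = B.hGet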

On 2008 Aug 4, at 23:45, Tim Newsham wrote:
Anyway, I haven't yet used any ByteString IO functions. I ran some tests when I was starting and it seems that using Handle IO functions was a bit slower than using the Socket IO functions directly. It looks like there are a bunch of Handle IO functions that can return ByteStrings but I don't see any for sockets... Are there any?
There's something on hackage... http://hackage.haskell.org/cgi-bin/hackage-scripts/package/network-bytestrin...

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
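Assuming that is the network-bytestring package, its Network.Socket.ByteString module provides recv/sendAll directly on sockets, so the Handle layer can be skipped entirely. A rough echo-style sketch (not the store protocol):

    import qualified Data.ByteString as B
    import Network.Socket (Socket)
    import Network.Socket.ByteString (recv, sendAll)

    -- Per-connection loop: read up to 4096 bytes, write them back, repeat.
    -- recv returns an empty ByteString when the peer closes the connection.
    echoLoop :: Socket -> IO ()
    echoLoop sock = do
        bs <- recv sock 4096
        if B.null bs
            then return ()
            else sendAll sock bs >> echoLoop sock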

On Sat, Aug 2, 2008 at 10:13 PM, Tim Newsham wrote:
You should try profiling this. I can see a few possible problems (such as reading String from a socket, instead of a ByteString), but it's difficult to predict what might be causing your code to be so slow. Haskell code ought to be much more competitive with C for an application like this.
Profiling didn't turn up anything obvious: http://www.thenewsh.com/~newsham/store/Server9.prof

One thing I don't quite understand is that it seems to be crediting more time to "put8" and "get8" than is warranted, perhaps lumping in all of the "get" and "put" functions...

One surprising result from testing: http://www.thenewsh.com/~newsham/store/TestBin.hs shows that the Data.Binary marshalling is actually very fast, but when I want to add a length field at the start of the buffer it has a huge impact on performance. I've tried several variations without much luck... (Also, the third test case is very odd in that skipping a conversion to a strict bytestring actually makes it slower(!).)

So... I think that's probably one of the things limiting my performance. Another one is probably my need to do two reads (one for the length, one for the data) for each received record. I think with some trickiness I could get around that (since recv will return only as many bytes as are immediately available), but I don't know that it's worth the effort right now.

Anyway, any thoughts, especially on how to make the bytestring operations faster, would be appreciated.

Tim Newsham
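For what it is worth, here is one hedged sketch of length-prefixed framing with Data.Binary; the helper name and the 4-byte big-endian header are my assumptions, not how Server9/TestBin actually does it:

    import Data.Binary (Binary, encode)
    import Data.Binary.Put (runPut, putWord32be, putLazyByteString)
    import qualified Data.ByteString.Lazy as L

    -- Encode the payload once, then prepend its length, without a second
    -- encoding pass over (or re-packing of) the payload itself.
    frame :: Binary a => a -> L.ByteString
    frame x = runPut $ do
        let body = encode x
        putWord32be (fromIntegral (L.length body))
        putLazyByteString body

On the receiving side this still implies one read for the length and one for the body, unless the reads go through a buffer that already holds both.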
participants (5)
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Don Stewart
- Duncan Coutts
- Tim Newsham