
Hi all,

I've recently switched my "Hums" UPnP server over to using WAI/Warp, but I'm seeing quite high CPU usage compared to monadic I/O. (1-7% with WAI as opposed to 0-1% for monadic).

(Just to be clear: The server is using a constant amount of memory, so it's definitely not leaking or anything like that.)

I think the source of the efficiency problem is that I'm enumerating strict ByteStrings from a file, but the WAI response enumerator requires Builder chunks, so I'm forced to use "fromByteString" to convert between the strict ByteString representation and the Builder representation -- I'm assuming this performs a memory copy.

Is there any way to avoid this extra "fromByteString"?

Other than this little issue, WAI seems to be working very well and it seems like a great fit if you just want low-level access to the HTTP protocol.

On a slightly related note: Is there an elegant way to map enumerator chunk elements from type a -> b using a mapping function? Right now, I've embedded the ByteString -> Builder conversion deep in my enumerator, but it should really be happening at a higher level.

Cheers,
Bardur

On Sun, Feb 20, 2011 at 11:04 AM, Bardur Arantsson wrote:
Hi all,
I've recently switched my "Hums" UPnP server over to using WAI/Warp, but I'm seeing quite high CPU usage compared to monadic I/O. (1-7% with WAI as opposed to 0-1% for monadic).
(Just to be clear: The server is using a constant amount of memory, so it's definitely not leaking or anything like that.)
I think the source of the efficiency problem is that I'm enumerating strict ByteStrings from a file, but the WAI response enumerator requires Builder chunks, so I'm forced to use "fromByteString" to convert between the strict ByteString representation and the Builder representation -- I'm assuming this performs a memory copy.
Is there any way to avoid this extra "fromByteString"?
blaze-builder is usually pretty intelligent about this. If I remember correctly, Simon Meier said that it won't do a memory copy for ByteStrings larger than 8k. In any event, if you want to force insertion instead of copying, replace fromByteString with insertByteString. It *might* be that the CPU overhead is warranted, however: you may end up seeing increased system call overhead with this switch, since the average chunk size will be smaller.
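For example, something along these lines would let you copy small chunks but insert large ones (just a sketch; the 8k figure is the threshold mentioned above rather than something I've verified, and toChunk is a made-up name):

    import Blaze.ByteString.Builder (Builder, fromByteString, insertByteString)
    import qualified Data.ByteString as B

    -- Copy small chunks into the Builder's buffer (cheap, keeps output
    -- chunks large), but insert big chunks directly to avoid the memcpy.
    toChunk :: B.ByteString -> Builder
    toChunk bs
        | B.length bs >= 8192 = insertByteString bs  -- assumed threshold
        | otherwise           = fromByteString bs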
Other than this little issue, WAI seems to be working very well and it seems like a great fit if you just want low-level access to the HTTP protocol.
Good to hear, I'm glad it's working out.
On a slightly related note: Is there an elegant way to map enumerator chunk elements from type a -> b using a mapping function? Right now, I've embedded the ByteString -> Builder conversion deep in my enumerator, but it should really be happening at a higher level.
Yes, Data.Enumerator provides a map enumeratee. You can use it to modify either an enumerator or an iteratee, e.g.:

    f :: a -> b
    e :: Enumerator a IO x
    i :: Iteratee b IO x

    e `joinE` map f :: Enumerator b IO x
    joinI $ map f $$ i :: Iteratee a IO x

I hope that clarifies things.

Michael
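P.S. For the concrete ByteString -> Builder case, that would look roughly like this (a sketch only; it assumes enumerator 0.4.x and blaze-builder, and builderEnumFile is just a name I'm making up here):

    import Blaze.ByteString.Builder (Builder, fromByteString)
    import Data.Enumerator (Enumerator, joinE)
    import qualified Data.Enumerator as E
    import qualified Data.Enumerator.Binary as EB

    -- Stream a file as Builder chunks by mapping the conversion over
    -- the plain ByteString enumerator, instead of burying fromByteString
    -- inside the file enumerator itself.
    builderEnumFile :: FilePath -> Enumerator Builder IO b
    builderEnumFile path = EB.enumFile path `joinE` E.map fromByteString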

On 2011-02-20 10:29, Michael Snoyman wrote:
On Sun, Feb 20, 2011 at 11:04 AM, Bardur Arantsson wrote:
[--snip--]
Is there any way to avoid this extra "fromByteString"?
blaze-builder is usually pretty intelligent about this. If I remember correctly, Simon Meier said that it won't do a memory copy for ByteStrings larger than 8k. In any event, if you want to force insertion instead of copying, replace fromByteString with insertByteString. It *might* be that the CPU overhead is warranted, however: you may end up seeing increased system call overhead with this switch, since the average chunk size will be smaller.
Good point; I just tried upping the chunk size to 32K and explicitly using insertByteString. Even with those changes I'm still seeing a lot of CPU usage (5-10%).

I should say that I'm using a version of the enumFile enumerator from Data.Enumerator.Binary that I've adapted to support byte ranges. It may be the case that I've just done something horribly stupid or inefficient. I've attached the code for the enumerator.

I guess I'll have to try to get some profiling data to see where the time is actually being spent. I suppose it's about time I learned a bit about profiling my Haskell code :).

[--snip bits about enumerators--]

Thanks for the explanation. For some reason I'm having a little trouble "connecting" all the type signatures for enumerator/enumeratee/iteratee. Hopefully it'll get better with practice.

Cheers,
Bardur
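P.S. To give a rough idea of the shape of the thing without opening the attachment: it's an enumFile variant that seeks to a start offset and stops after a given number of bytes -- something along these lines (a simplified sketch only, not the attached code; it assumes enumerator 0.4.x, and the name enumFileRange is just for illustration):

    import qualified Data.ByteString as B
    import Data.Enumerator (Stream(..), Step(..), Iteratee(..), Enumerator,
                            continue, returnI, (>>==))
    import qualified System.IO as IO
    import Control.Exception (finally)
    import Control.Monad.IO.Class (liftIO)

    -- Stream 'count' bytes of a file starting at byte 'offset'.
    enumFileRange :: FilePath -> Integer -> Integer -> Enumerator B.ByteString IO b
    enumFileRange path offset count step0 = Iteratee $ do
        h <- IO.openBinaryFile path IO.ReadMode
        IO.hSeek h IO.AbsoluteSeek offset
        runIteratee (go count h step0) `finally` IO.hClose h
      where
        chunkSize = 32768 :: Integer
        go remaining h (Continue k)
            | remaining <= 0 = continue k       -- range exhausted; hand back control
            | otherwise = do
                bytes <- liftIO (B.hGet h (fromIntegral (min remaining chunkSize)))
                if B.null bytes
                    then continue k             -- premature EOF
                    else k (Chunks [bytes]) >>==
                         go (remaining - fromIntegral (B.length bytes)) h
        go _ _ step = returnI step              -- Yield/Error: stop reading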

On Sun, Feb 20, 2011 at 12:00 PM, Bardur Arantsson wrote:
On 2011-02-20 10:29, Michael Snoyman wrote:
On Sun, Feb 20, 2011 at 11:04 AM, Bardur Arantsson wrote:
[--snip--]
Is there any way to avoid this extra "fromByteString"?
blaze-builder is usually pretty intelligent about this. If I remember correctly, Simon Meier said that it won't do a memory copy for ByteStrings larger than 8k. In any event, if you want to force insertion instead of copying, replace fromByteString with insertByteString. It *might* be that the CPU overhead is warranted, however: you may end up seeing increased system call overhead with this switch, since the average chunk size will be smaller.
Good point; I just tried upping the chunk size to 32K and explicitly using insertByteString. Even with those changes I'm still seeing a lot of CPU usage (5-10%).
I should say that I'm using a version of the enumFile enumerator from Data.Enumerator.Binary that I've adapted to support byte ranges. It may be the case that I've just done something horribly stupid or inefficient. I've attached the code for the enumerator.
I don't see any problems with your implementation, but John Millikin is definitely the guy to speak to about that. He will be able to give you a more definitive answer than I.
I guess I'll have to try to get some profiling data to see where the time is actually being spent. I suppose it's about time I learned a bit about profiling my Haskell code :).
It's entirely possible that WAI/enumerators/builder/Warp is adding some overhead. But what exactly are you comparing against? Warp is *definitely* doing some extra stuff that a simple Data.ByteString.hPut is not, such as timeout handling. I'd be interested in any numbers that you come up with from profiling, please do share.
[--snip bits about enumerators--]
Thanks for the explanation. For some reason I'm having a little trouble "connecting" all the type signatures for enumerator/enumeratee/iteratee. Hopefully it'll get better with practice.
There's no question that it's a difficult concept to get started with, but in many ways it's like learning monads: the concept looks ridiculously complicated at first, then you think that it's just like (spaceships/nuclear waste/burritos), and then it finally *really* clicks and becomes second nature. The only real solution is what you've already said: practice.

If you haven't seen it already, I wrote a three-part series on the enumerator package. I've since moved that series into a chapter of the Yesod book[1], so you can read it all on a single page if you like. Hope it helps!

Michael

[1] http://docs.yesodweb.com/book/enumerator

On 2011-02-20 12:08, Michael Snoyman wrote:
On Sun, Feb 20, 2011 at 12:00 PM, Bardur Arantsson wrote:
On 2011-02-20 10:29, Michael Snoyman wrote:
On Sun, Feb 20, 2011 at 11:04 AM, Bardur Arantsson wrote:
[--snip--]
It's entirely possible that WAI/enumerators/builder/Warp is adding some overhead. But what exactly are you comparing against? Warp is *definitely* doing some extra stuff that a simple Data.ByteString.hPut is not, such as timeout handling. I'd be interested in any numbers that you come up with from profiling, please do share.
Silly of me, I should have explained properly: I'm comparing against a simple monadic HTTP server using the "HTTP" module for parsing/rendering requests and response headers (so regular Strings) and simple (strict) bytestring output in the IO monad. Socket operations were handled using the network/network-bytestring packages.

While I can readily accept a little extra overhead from a more elegant model like WAI/Warp, I don't think 5-10% CPU usage is reasonable for streaming data to a *single* client on a reasonably beefy 2.4GHz Core2 CPU.

Given the Warp benchmarks that have been posted and the fact that my (admittedly slightly simpler) HTTP server uses around 0-1% CPU, that makes me think that I'm probably doing something wrong.

Anyway, I'll see about producing some proper benchmarks (probably not until the weekend, though) and I guess I/we can take it from there.

Cheers,

Two questions:
1) what version of enumerator are you building against? Version 0.4.7 fixed a problem with (>>=) which might affect you.
2) what does the profiler say? It would be helpful to build enumerator with "-auto-all".
G
On Mon, Feb 21, 2011 at 6:19 PM, Bardur Arantsson wrote:
On 2011-02-20 12:08, Michael Snoyman wrote:
On Sun, Feb 20, 2011 at 12:00 PM, Bardur Arantsson wrote:
On 2011-02-20 10:29, Michael Snoyman wrote:
On Sun, Feb 20, 2011 at 11:04 AM, Bardur Arantsson wrote:
[--snip--]
It's entirely possible that WAI/enumerators/builder/Warp is adding some overhead. But what exactly are you comparing against? Warp is *definitely* doing some extra stuff that a simple Data.ByteString.hPut is not, such as timeout handling. I'd be interested in any numbers that you come up with from profiling, please do share.
Silly of me, I should have explained properly: I'm comparing against a simple monadic HTTP server using the "HTTP" module for parsing/rendering requests and response headers (so regular Strings) and simple (strict) bytestring output in the IO monad. Socket operations were handled using the network/network-bytestring packages.
While I can readily accept a little extra overhead from a more elegant model like WAI/Warp, I don't think 5-10% CPU usage is reasonable for streaming data to a *single* client on a reasonably beefy 2.4GHz Core2 CPU.
Given the Warp benchmarks that have been posted and the fact that my (admittedly slightly simpler) http server uses around 0-1% CPU, that makes me think that I'm probably doing something wrong.
Anyway, I'll see about producing some proper benchmarks (probably not until the weekend, though) and I guess I/we can take it from there.
Cheers,
--
Gregory Collins

On 2011-02-21 18:35, Gregory Collins wrote:
Two questions:
1) what version of enumerator are you building against? Version 0.4.7 fixed a problem with (>>=) which might affect you.
I am building against 0.4.7.
2) what does the profiler say? It would be helpful to build enumerator with "-auto-all".
Alright, I've had a go at running with profiling enabled. I'm a total noob at profiling with GHC, so any help is much appreciated -- I really have no idea where to start looking.

I've attached the profiler output from a one-minute run streaming data.

Cheers,

That profile doesn't tell us much unfortunately -- most of the time is in MAIN for some reason.
What profiling flags did you pass when you built the program?
G
On Sun, Feb 27, 2011 at 5:09 PM, Bardur Arantsson wrote:
On 2011-02-21 18:35, Gregory Collins wrote:
Two questions:
1) what version of enumerator are you building against? Version 0.4.7 fixed a problem with (>>=) which might affect you.
I am building against 0.4.7.
2) what does the profiler say? It would be helpful to build enumerator with "-auto-all".
Alright, I've had a go at running with profiling enabled. I'm a total noob at profiling with GHC, so any help is much appreciated -- I really have no idea where to start looking.
I've attached the profiler output from a one-minute run streaming data.
Cheers,
--
Gregory Collins

On 2011-02-27 18:44, Gregory Collins wrote:
That profile doesn't tell us much unfortunately -- most of the time is in MAIN for some reason.
I suspected as much. The only thing that really struck me about the output was that enumHandle seems to be allocating quite a lot... but given what it's doing that's not *that* surprising...
What profiling flags did you pass when you built the program?
These are the GHC flags I used:

    -Wall -O2 -fno-warn-unused-matches -threaded -prof -auto-all

I'll try to see if -caf-all makes any difference tomorrow.

Cheers,
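P.S. I'm also planning to try rebuilding the libraries with profiling support, so the cost centres aren't all lumped into MAIN -- roughly something like this, if I've understood the cabal flags correctly (untested, off the top of my head):

    cabal install --reinstall --enable-library-profiling \
        --ghc-options=-auto-all enumerator blaze-builder

and then rebuild the program with the flags above and re-run with +RTS -p as before.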
participants (3):
- Bardur Arantsson
- Gregory Collins
- Michael Snoyman