Can a GC delay TCP connection formation?


Jeff Shaw

27 Nov 2012

12:19 p.m.

Hello, I've run into an issue that makes me think that when the GHC GC runs while a Snap or Warp HTTP server is serving connections, the GC prevents or delays TCP connections from forming. My application requires that TCP connections form within a few tens of milliseconds. I'm wondering if anyone else has run into this issue, and if there are some GC flags that could help. I've tried a few, such as -H and -c, and haven't found anything to help. I'm using GHC 7.4.1. Thanks, Jeff


Ertugrul Söylemez

27 Nov

2:14 p.m.

When you compile with -threaded and run on multiple threads, the runtime uses parallel GC. Did you try that?

Greets,
Ertugrul

--
Not to be or to be and (not to be or to be and (not to be or to be and (not to be or to be and ... that is the list monad.

Gregory Collins

5:45 p.m.

GHC has a "stop the world" garbage collector, meaning that while major GC is happening, the entire process must be halted. In my experience GC pause times are typically low, but depending on the heap residency profile of your application (and the quantity of garbage being produced by it), this may not be the case.

If you have a hard real-time requirement then a garbage-collected language may not be appropriate for you.

-- Gregory Collins <greg@gregorycollins.net>
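
One way to check whether collector pauses are large enough to matter is to ask the runtime how much time it has spent in GC. The following is only a sketch and not part of the original thread. It assumes a GHC much newer than the 7.4.1 mentioned above (base 4.10 or later, where GHC.Stats provides getRTSStats) and a program started with +RTS -T. The helper name reportGC is hypothetical.

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Hypothetical helper: print cumulative GC counters for this process.
-- Returns False when statistics collection is off (no +RTS -T).
reportGC :: IO Bool
reportGC = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then do
      putStrLn "RTS statistics are off; rerun with +RTS -T"
      pure False
    else do
      s <- getRTSStats
      putStrLn ("collections:       " ++ show (gcs s))
      putStrLn ("major collections: " ++ show (major_gcs s))
      putStrLn ("GC elapsed (ms):   " ++ show (gc_elapsed_ns s `div` 1000000))
      putStrLn ("max live bytes:    " ++ show (max_live_bytes s))
      pure True

main :: IO ()
main = do
  -- Do some work first so the counters have something to report.
  ref <- newIORef (0 :: Int)
  forM_ [1 .. 1000000 :: Int] $ \i -> modifyIORef' ref (+ i)
  readIORef ref >>= print
  _ <- reportGC
  pure ()
```

Building with -with-rtsopts=-T turns the statistics on by default. These counters are cumulative totals, so they show whether GC time is significant overall, not how long any single pause lasted.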

Johan Tibell

10:37 p.m.

Kazu and Andreas, could this be IO manager related?

timothyhobbs＠seznam.cz

11:55 p.m.

Could you give us more info on what your constraints are? Is it necessary that you have a certain number of connections per second, or is it necessary that the connection is established very quickly after some other message is received?

Jeff Shaw

28 Nov

3:02 a.m.

Hello Timothy and others,

One of my clients hosts their HTTP clients in an Amazon cloud, so even when they turn on persistent HTTP connections, they use many connections. Usually they only end up sending one HTTP request per TCP connection. My specific problem is that they want a response in 120 ms or so, and at times they are unable to complete a TCP connection in that amount of time. I'm looking at on the order of 100 TCP connections per second, and on the order of 1000 HTTP requests per second (other clients do benefit from persistent HTTP connections).

Once each minute, a thread of my program updates a global state, stored in an IORef, and updated with atomicModifyIORef', based on query results via HDBC-odbc. The query results are strict, and atomicModifyIORef' should receive the updated state already evaluated. I reduced the amount of time that query took from tens of seconds to just a couple, and for some reason that reduced the proportion of TCP timeouts drastically. The approximate before and after TCP timeout proportions are 15% and 5%. I'm not sure why this reduction in timeouts resulted from the query time improving, but this discovery has me on the task of removing all database code from the main program and into a cron job. My best guess is that HDBC-odbc somehow disrupts other communications while it waits for the DB server to respond.

To respond to Ertugrul, I'm compiling with -threaded, and running with +RTS -N.

I hope this helps describe my problem. I can probably come up with some hard information if requested, e.g. from ThreadScope.

Jeff
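
The refresh pattern Jeff describes (a background thread that periodically replaces a shared IORef with an already-evaluated value) might look roughly like the sketch below. This is an illustration under assumptions, not Jeff's code: the State type, fetchState, refresh and refresher are hypothetical names, and the database query is replaced by a pure stand-in.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (evaluate)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- | Hypothetical shared state. The strict fields mean that forcing the
-- constructor also forces both Ints.
data State = State
  { stVersion :: !Int
  , stTotal   :: !Int
  } deriving (Eq, Show)

-- | Stand-in for the database query; a real program would call HDBC here.
fetchState :: Int -> IO State
fetchState n = pure (State n (sum [1 .. n]))

-- | Run the query, force the result, then swap it in atomically.
refresh :: IORef State -> Int -> IO ()
refresh ref n = do
  new <- fetchState n >>= evaluate  -- evaluate before touching the IORef
  atomicModifyIORef' ref (\_ -> (new, ()))

-- | Background loop: refresh, wait for the interval (microseconds), repeat.
refresher :: IORef State -> Int -> IO ()
refresher ref interval = go 1
  where
    go n = do
      refresh ref n
      threadDelay interval
      go (n + 1)

main :: IO ()
main = do
  ref <- newIORef (State 0 0)
  _ <- forkIO (refresher ref 100000)  -- 0.1 s here; Jeff's loop runs once a minute
  threadDelay 350000
  readIORef ref >>= print
```

Readers only ever call readIORef, so request handlers never wait on the query itself. Whether the query blocks other threads depends on how the underlying C calls are imported, which is the question the rest of the thread turns to.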

Jason Dagit

3:17 a.m.

Have you read section 8.4.2 of the GHC user guide? http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/ffi-ghc.html

Based on that I would check the FFI imports in your database library. In the best case (-threaded, 'safe', and thread-safe odbc), I think you'll find that N of these can run concurrently, but here your number of requests is likely to be much greater than N (where N is the number of threads the RTS created with +RTS -N).

I'm not sure how to solve your problem, but perhaps this information can help you pinpoint the problem.

Good luck,
Jason

Jason Dagit

3:18 a.m.

On Tue, Nov 27, 2012 at 11:17 AM, Jason Dagit <dagitj@gmail.com> wrote:

...

Have you read section 8.4.2 of the ghc user guide? http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/ffi-ghc.html

Ahem, I meant *8.2.4*.

Gershom Bazerman

3:45 a.m.

On 11/27/12 2:17 PM, Jason Dagit wrote:

...

Based on that I would check the FFI imports in your database library. In the best case (-threaded, 'safe', and thread-safe odbc), I think you'll find that N of these can run concurrently, but here your number of requests is likely to be much greater than N (where N is the number of threads the RTS created with +RTS -N).

HDBC-odbc has long used the wrong type of FFI imports, resulting in long-running database queries potentially blocking all other IO. I just checked, and a patch that finally fixes this was apparently made to the repo in September [1], but a new release has yet to be uploaded to Hackage. In any case, if you try to install it from the repo, this may at least solve some of your problems.

[1] https://github.com/hdbc/hdbc-odbc/commit/7299d3441ce2e1d5a485fe79b37540c0a44...

--Gershom
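
The difference between the two kinds of import can be seen in a small experiment. The sketch below is not HDBC-odbc's code; it assumes a POSIX system, imports the C function usleep twice (once safe, once unsafe), and counts how often a ticker thread runs while each call blocks. The names usleepSafe, usleepUnsafe and ticksDuring are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forever)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Foreign.C.Types (CInt (..), CUInt (..))

-- 'safe': the RTS gives up the capability for the duration of the call,
-- so with -threaded other Haskell threads can keep running.
foreign import ccall safe "unistd.h usleep"
  usleepSafe :: CUInt -> IO CInt

-- 'unsafe': lower call overhead, but the capability making the call
-- cannot run other Haskell threads until the C function returns.
foreign import ccall unsafe "unistd.h usleep"
  usleepUnsafe :: CUInt -> IO CInt

-- | Count how many times a ticker thread runs while 'blocking' executes.
ticksDuring :: IO a -> IO Int
ticksDuring blocking = do
  counter <- newIORef (0 :: Int)
  ticker <- forkIO (forever (threadDelay 10000 >> modifyIORef' counter (+ 1)))
  _ <- blocking
  killThread ticker
  readIORef counter

main :: IO ()
main = do
  safeTicks <- ticksDuring (usleepSafe 200000)
  unsafeTicks <- ticksDuring (usleepUnsafe 200000)
  putStrLn ("ticks during safe call:   " ++ show safeTicks)
  putStrLn ("ticks during unsafe call: " ++ show unsafeTicks)
```

Built with -threaded and run with +RTS -N1, one would expect the ticker to keep counting during the safe call and to stall during the unsafe one. With more capabilities the difference may be less visible in this toy, although a long unsafe call can still hold up a garbage collection that needs every capability to stop.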

Jeff Shaw

5:45 a.m.

Gershom,

Thanks for pointing this out. I've checked out the latest hdbc-odbc code, and I'll see if there's an improvement.

Jeff

Nicolas Wu

5:59 a.m.

Hi,

I'm the maintainer of HDBC. I haven't yet released this code since it hasn't yet been fully tested. However, if you're happy with it, I'll push the version with proper FFI bindings up to Hackage.

Nick

Jeff Shaw

29 Nov

4:36 a.m.

Nick,

I pulled the latest version of HDBC-odbc, and it appears to be working MUCH better than before. I now have 0% timeouts from httperf with 50 connections/second and timeout set to 0.1 seconds. It's looking like the safe imports vastly reduced IO blocking. I haven't seen any new problems since the new version went live.

Jeff

Nicolas Wu

6:05 a.m.

That's great to hear! I'll aim to push this version to Hackage over the weekend.

Nick

Neil Davies

28 Nov

5:21 p.m.

Jeff,

Are you certain that all the delay can be laid at the door of the GHC runtime?

How much of the end-to-end delay budget is being allocated to you? I recently moved a static website from a 10-year-old server in Telehouse into AWS in Ireland and watched the access time (HTTP GET to check time on top index page) increase by 150 ms.

Neil

Alexander Kjeldaas

10:44 p.m.

Jeff, this is somewhat off topic, but interesting. Are "telehouse" and AWS physically close? Was this latency increase not expected due to geography?

Alexander

Neil Davies

29 Nov

7:32 a.m.

No, the difference is 6.5 ms each way.

Bryan O'Sullivan

1 Dec

2:29 a.m.

On Tue, Nov 27, 2012 at 11:02 AM, Jeff Shaw <shawjef3@gmail.com> wrote:

...

Once each minute, a thread of my program updates a global state, stored in an IORef, and updated with atomicModifyIORef', based on query results via HDBC-odbc.

Incidentally, what kind of database are you talking to? Issues of FFI correctness aside, HDBC is in general terribly slow compared to some of the more DB-specific bindings.

Jeff Shaw

11:08 a.m.

On 11/30/2012 1:29 PM, Bryan O'Sullivan wrote:

...

Incidentally, what kind of database are you talking to? Issues of FFI correctness aside, HDBC is in general terribly slow compared to some of the more DB-specific bindings.

I'm connecting to an MS SQL Server 2008 R2 DBMS via FreeTDS.
