Why don't my OS threads terminate?

Dear all,

At work, I'm developing a web application with happstack. When I let it run for some time (less than 5 minutes) it crashes with the following message:

"failed to create OS thread: Resource temporarily unavailable"

This happens because the server continuously creates OS threads but never terminates them. This keeps going until I hit the OS thread limit. I would like to know why the OS threads are created and, more importantly, why they aren't terminated.

Unfortunately I can't publish the source code because it's proprietary software. Hopefully the following description gives an idea of what's going on.

The web application is basically a simple measurement application. The client side (HTML, JavaScript, jQuery) contains a plot which shows the measurements that the server generates. The client runs a loop which first long-polls[2] the server to wait for a new measurement; the server blocks (using takeMVar) until a measurement is made available (using putMVar). The client then retrieves the measurement from the server, including all other relevant data. These requests all happen asynchronously using ajax calls[1].

The server starts by forking a thread which runs the measure loop. At each iteration the measure loop reads data from a file and then fits that data to a model. For fitting I use my levmar fitting library[3]. The coefficients of the fitted model are then used to calculate a measurement.

I believe the OS threads are created by my levmar library. This library uses bindings-levmar[4], which is a binding to a C library. bindings-levmar uses safe FFI calls because the levmar C procedures are reentrant (they need to call back into Haskell to execute the model function). I believe the RTS creates an OS thread for each safe FFI call.

When I replace my levmar fit function with a function which just returns a random number, the problem disappears. The problem also doesn't appear in a standalone application I have which just uses my levmar fit function to fit all the data files, without the web server. It seems that the combination of happstack and levmar causes the OS threads to not terminate.

I tried space profiling the application but that did not reveal anything interesting. I even used threadscope to look at the application eventlog (see [5]) but that also looked normal.

Hopefully someone can point me where to look next.

Thanks, Bas

[1] http://api.jquery.com/jQuery.ajax/
[2] http://en.wikipedia.org/wiki/Push_technology#Long_polling
[3] http://hackage.haskell.org/package/levmar
[4] http://hackage.haskell.org/package/bindings-levmar
[5] http://bifunctor.homelinux.net/~bas/crash.eventlog
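For readers following along, the setup described above (a forked measure loop that makes a safe foreign call which calls back into Haskell, then hands the result over through an MVar) can be sketched roughly as below. This is only an illustration under stated assumptions, not the proprietary application and not the levmar or bindings-levmar API: the names `Model`, `mkModel`, `callModel`, `fit` and `measureLoop` are hypothetical, and the "C library" is replaced by a plain call through a function pointer so that the example needs no C source.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (bracket)
import Control.Monad (forM_, replicateM)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Hypothetical model-function type; levmar's real callback type differs.
type Model = Double -> IO Double

-- Turn a Haskell function into a C function pointer.
foreign import ccall "wrapper"
  mkModel :: Model -> IO (FunPtr Model)

-- Call through a C function pointer. The call has to be "safe" because
-- the pointer leads back into Haskell code.
foreign import ccall safe "dynamic"
  callModel :: FunPtr Model -> Model

-- Stand-in for "fit the data": one safe foreign call that re-enters
-- Haskell to evaluate the model (here simply squaring the input).
fit :: Double -> IO Double
fit x =
  bracket (mkModel (\v -> return (v * v)))  -- allocate the callback
          freeHaskellFunPtr                 -- always release it again
          (\fp -> callModel fp x)

-- Measure loop: publish each result; putMVar blocks until the previous
-- result has been taken.
measureLoop :: MVar Double -> [Double] -> IO ()
measureLoop box xs = forM_ xs $ \x -> fit x >>= putMVar box

main :: IO ()
main = do
  box <- newEmptyMVar
  _ <- forkIO (measureLoop box [1, 2, 3])
  -- What a long-poll handler would do: block until a result is available.
  results <- replicateM 3 (takeMVar box)
  print results
```

The program should print `[1.0,4.0,9.0]`. The OS-thread behaviour discussed in this thread concerns the threaded RTS, so to experiment with it the sketch would need to be built with `ghc -threaded`.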

Bas van Dijk-2 wrote:
> I believe the OS threads are created by my levmar library. This library uses bindings-levmar[4] which is a binding to a C library. bindings-levmar uses safe FFI calls because the levmar C procedures are reentrant (they need to call back into Haskell to execute the model function). I believe the RTS creates an OS thread for each safe FFI call.
Sounds like #4262: http://hackage.haskell.org/trac/ghc/ticket/4262 (Should be fixed in head) --Sterl

On 25 February 2011 18:27, sclv wrote:
> Sounds like #4262: http://hackage.haskell.org/trac/ghc/ticket/4262
> (Should be fixed in head)
That looks exactly like my problem. I will try it out with GHC HEAD on Monday. Thanks! Bas

On 25 February 2011 19:10, Bas van Dijk wrote:
> That looks exactly like my problem. I will try it out with GHC HEAD on Monday.
In ghc-HEAD (7.1.20110227) the bug is fixed, great! The bug still appears in ghc-7.0.2-rc2, so I assume the patch in #4262 was not merged in that release. Will it be merged into the upcoming ghc-7.0.2? Regards, Bas

On 28/02/11 15:59, Bas van Dijk wrote:
> In ghc-HEAD (7.1.20110227) the bug is fixed, great!
> The bug still appears in ghc-7.0.2-rc2, so I assume the patch in #4262 was not merged in that release.
> Will it be merged into the upcoming ghc-7.0.2?
I'm slightly worried by this. #4262 was not an OS thread "leak". Rather, when the program needed fewer OS threads, we weren't returning the surplus ones to the system. #4262 doesn't make programs use an ever-increasing number of OS threads. Fixing it is an optimisation rather than a bug fix, which is why we didn't merge the patch into the 7.0 branch. So I'm concerned because the symptom you describe sounds more like a leak. #4850 could have caused it, but the fix for that was merged into 7.0.2. Cheers, Simon

On 10 March 2011 18:11, Simon Marlow wrote:
> So I'm concerned because the symptom you describe sounds more like a leak. #4850 could have caused it, but the fix for that was merged into 7.0.2.
My bug is solved in 7.0.2. Regards, Bas

On 10/03/11 18:20, Bas van Dijk wrote:
> My bug is solved in 7.0.2.
Ah great, thanks for testing. Cheers, Simon
participants (3): Bas van Dijk, sclv, Simon Marlow