Why don't my OS threads terminate?

25 Feb 2011

Dear all,

At work, I'm developing a web application with happstack. When I let it
run for some time (less than 5 minutes), it crashes with the following
message:

"failed to create OS thread: Resource temporarily unavailable"

This happens because the server continuously creates OS threads but
never terminates them. This continues until the process hits the OS
thread limit.
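
One way to watch the OS thread count grow from inside the process is to
count the entries under /proc/self/task. This is only a minimal sketch,
assuming a Linux procfs layout; countOsThreads is a hypothetical helper
name, not part of happstack or levmar.

```haskell
import System.Directory (doesDirectoryExist, listDirectory)

-- | Number of OS threads in the current process, or Nothing where
-- /proc/self/task does not exist (assumption: Linux procfs, which
-- lists one directory entry per OS thread of the process).
countOsThreads :: IO (Maybe Int)
countOsThreads = do
  let taskDir = "/proc/self/task"
  present <- doesDirectoryExist taskDir
  if present
    then Just . length <$> listDirectory taskDir
    else pure Nothing

main :: IO ()
main = countOsThreads >>= print
```

Logging this value once per measure-loop iteration would show whether
the count rises with every fit or only under concurrent requests.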

I would like to know why the OS threads are created and more
importantly why they aren't terminated. Unfortunately I can't publish
the source code because it's proprietary software. Hopefully the
following description may give an idea what's going on:

The web application is basically a simple measurement application. The
client side (HTML, JavaScript, jQuery) contains a plot which shows the
measurements that the server generates. The client runs a loop which
first long-polls[2] the server to wait for a new measurement. The
server handler blocks (using takeMVar) until a measurement is made
available (using putMVar). The client then retrieves the measurement
from the server, including all other relevant data. These requests all
happen asynchronously using Ajax calls[1].
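
The takeMVar/putMVar handoff described above can be sketched roughly as
follows. This is not the proprietary code: Measurement,
waitForMeasurement, publishMeasurement and demo are hypothetical names,
and the HTTP layer is left out entirely.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- Hypothetical stand-in for whatever the real measurement type is.
type Measurement = Double

-- What a long-poll handler would do: block until a value is put.
waitForMeasurement :: MVar Measurement -> IO Measurement
waitForMeasurement = takeMVar

-- What the measure loop would do once a new value is ready.
publishMeasurement :: MVar Measurement -> Measurement -> IO ()
publishMeasurement = putMVar

-- One producer thread publishes three values; the caller plays the
-- role of three consecutive long-poll requests.
demo :: IO [Measurement]
demo = do
  box <- newEmptyMVar
  _ <- forkIO $ forM_ [1.0, 2.0, 3.0] $ \m -> do
    threadDelay 10000 -- pretend a measurement takes 10 ms
    publishMeasurement box m
  replicateM 3 (waitForMeasurement box)

main :: IO ()
main = demo >>= print -- prints [1.0,2.0,3.0]
```

Both sides here are ordinary Haskell threads created with forkIO, so
the blocking in takeMVar by itself does not tie up an OS thread.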

The server starts by forking a thread which runs the measure loop. At
each iteration the measure loop reads data from a file then fits that
data to a model. For fitting I use my levmar fitting library[3]. The
coefficients of the fitted model are then used to calculate a
measurement.

I believe the OS threads are created by my levmar library. This
library uses bindings-levmar[4] which is a binding to a C library.
bindings-levmar uses safe FFI calls because the levmar C procedures
are reentrant (they need to call back into Haskell to execute the
model function). I believe the RTS creates an OS thread for each safe
FFI call.
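
To illustrate what a safe FFI call with a callback into Haskell looks
like, here is a minimal sketch that uses qsort from the C standard
library as a stand-in for the levmar C procedures. It is not the
bindings-levmar API; Compare, mkCompare, c_qsort, compareInts and
sortViaC are names chosen for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (bracket)
import Foreign.C.Types (CInt(..), CSize(..))
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (FunPtr, Ptr, freeHaskellFunPtr)
import Foreign.Storable (peek, sizeOf)

-- C-side type of the comparison callback passed to qsort.
type Compare = Ptr CInt -> Ptr CInt -> IO CInt

-- A "wrapper" import turns a Haskell function into a C function pointer.
foreign import ccall "wrapper"
  mkCompare :: Compare -> IO (FunPtr Compare)

-- The import must be safe: qsort calls back into Haskell through the
-- function pointer, which an unsafe call is not allowed to do.
foreign import ccall safe "stdlib.h qsort"
  c_qsort :: Ptr CInt -> CSize -> CSize -> FunPtr Compare -> IO ()

-- The Haskell callback that the C code invokes for each comparison.
compareInts :: Compare
compareInts pa pb = do
  a <- peek pa
  b <- peek pb
  pure $ case compare a b of
    LT -> -1
    EQ -> 0
    GT -> 1

-- Sort a list by copying it to a C array and letting qsort do the work.
sortViaC :: [CInt] -> IO [CInt]
sortViaC xs =
  bracket (mkCompare compareInts) freeHaskellFunPtr $ \cmp ->
    withArrayLen xs $ \n ptr -> do
      let elemSize = fromIntegral (sizeOf (undefined :: CInt))
      c_qsort ptr (fromIntegral n) elemSize cmp
      peekArray n ptr

main :: IO ()
main = sortViaC [3, 1, 2] >>= print -- prints [1,2,3]
```

Compiling this with and without -threaded, and calling sortViaC from
several forkIO threads at once, would be one way to check how many OS
threads the RTS uses for concurrent safe calls.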

When I replace my levmar fit function with a function which just
returns a random number the problem disappears. The problem also
doesn't appear in a standalone application I have which just uses my
levmar fit function to fit all the data files, without the web server.
It seems that the combination of happstack and levmar causes the OS
threads to not terminate.

I tried space profiling the application, but that did not reveal
anything interesting. I even used ThreadScope to look at the
application's eventlog (see [5]), but that also looked normal.

Hopefully someone can point me where to look next.

Thanks,

Bas

[1] http://api.jquery.com/jQuery.ajax/
[2] http://en.wikipedia.org/wiki/Push_technology#Long_polling
[3] http://hackage.haskell.org/package/levmar
[4] http://hackage.haskell.org/package/bindings-levmar
[5] http://bifunctor.homelinux.net/~bas/crash.eventlog

Bas van Dijk
