RE: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

We do use a thread pool. But you still need as many OS threads as there are blocked read() calls, unless you have a single thread doing select() as I described.
BTW our Haskell Workshop paper from last year describes this stuff:
http://www.haskell.org/~simonmar/papers/conc-ffi.ps.gz
Cheers, Simon
On 19 January 2005 15:07, Keean Schupke wrote:
Why not use a thread-pool, and a "safe" call to read, provided there is an OS thread available, defaulting to "unsafe" if no thread is available... You could make the thread pool size an argument...
Keean.
Simon Marlow wrote:
On 19 January 2005 13:50, William Lee Irwin III wrote:
On 19 January 2005 09:45, Ben Rudiak-Gould wrote:
Okay, my ignorance of Posix is showing again. Is it currently the case, then, that every GHC thread will stop running while a disk read is in progress in any thread? Is this true on all platforms?
On Wed, Jan 19, 2005 at 01:39:05PM -0000, Simon Marlow wrote:
It's true on Unix-like systems, I believe. Even with -threaded. It might not be true on Win32.
How does forkOS fit into this picture? It's described in the documentation as allowing concurrent execution of system calls and other activity by other threads.
forkOS doesn't fix this. It forks another OS thread which can be used to make concurrent foreign calls, if they are not marked "unsafe". However, the standard I/O library, in -threaded mode, does read like this:
- non-blocking, "unsafe", read() to see what's there
- if read() would block, then hand off to another Haskell thread which does select() on all the outstanding IO requests.
This scheme is just for efficiency. We could (and used to) just call "safe" read() for every read - that would give you the right concurrency with -threaded, but unfortunately you'd really notice the difference if you had 1000s of threads all doing IO, because each one would need its own OS thread. The current scheme is rather snappy (even snappier than non-threaded, as it happens).
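The two-step scheme described above can be sketched at the level of the underlying POSIX calls. This is a minimal C sketch under stated assumptions, not GHC's actual I/O code: the names set_nonblocking and read_or_wait are hypothetical, and the sketch waits on a single descriptor in the calling thread, whereas the message describes one dedicated Haskell thread doing select() for all outstanding requests.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Step 1: try a non-blocking read() to see what's there.
 * Step 2: only if read() would block, wait in select() and retry.
 * fd must already be non-blocking and smaller than FD_SETSIZE.
 * Returns the number of bytes read (0 at end of file), or -1 on error. */
ssize_t read_or_wait(int fd, void *buf, size_t len) {
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) return n;                 /* data or end of file */
        if (errno == EINTR) continue;         /* interrupted: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;

        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        /* Block until fd is reported readable; no timeout. */
        if (select(fd + 1, &readable, NULL, NULL, NULL) == -1 && errno != EINTR)
            return -1;
    }
}
```

In the common case where data is already available, only the cheap first read() runs and select() is never entered.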
You can always use System.Posix.fdRead to get around it.
Cheers, Simon
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Simon Marlow wrote:
We do use a thread pool. But you still need as many OS threads as there are blocked read() calls, unless you have a single thread doing select() as I described.
How does the select() help? AFAIK, select() on a regular file or block
device will always indicate that it is readable, even if a subsequent
read() would have to read the data from disk.
--
Glynn Clements
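Glynn's point can be checked directly. The following is a minimal C sketch, assuming a POSIX system; select_says_readable is a hypothetical helper that asks select(), with a zero timeout, whether a descriptor is readable right now.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if select() reports fd as readable right now,
 * 0 if it does not, and -1 on error. Never blocks.
 * fd must be smaller than FD_SETSIZE. */
int select_says_readable(int fd) {
    fd_set readable;
    struct timeval no_wait = {0, 0};   /* zero timeout: poll and return */

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    if (select(fd + 1, &readable, NULL, NULL, &no_wait) == -1)
        return -1;
    return FD_ISSET(fd, &readable) ? 1 : 0;
}
```

Called on a descriptor for a regular file, this should return 1 regardless of whether the data is cached, so it says nothing about whether a following read() will wait for the disk. Called on the read end of an empty pipe, it returns 0.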

Glynn Clements wrote:
We do use a thread pool. But you still need as many OS threads as there are blocked read() calls, unless you have a single thread doing select() as I described.
How does the select() help? AFAIK, select() on a regular file or block device will always indicate that it is readable, even if a subsequent read() would have to read the data from disk.
It doesn't help if we also want to keep I/O requests from delaying one another, rather than only keeping them from delaying the execution of pure Haskell code.

BTW, poll is generally preferred to select. The maximum fd supported by select may be lower than the maximum fd supported by the system, and the interface of poll allows the cost to be proportional to the number of descriptors rather than to the highest descriptor. The timeout is specified in microseconds for select and in milliseconds for poll, but on Linux the actual resolution is the clock tick in both cases anyway (usually 1ms or 10ms).

It's probably better still to use epoll than poll. The difference is that with epoll you register fds using separate calls, so you don't have to provide them each time you wait (and the kernel doesn't have to scan the array each time). It therefore scales better to a large number of threads performing I/O. It's available in Linux 2.6. Caveat: before Linux 2.6.8 epoll had a memory leak in the kernel because of a reference counting bug (0.5kB per epoll_create call, which means 0.5kB of physical memory lost each time a program that waits for I/O using epoll is started). poll is in the Single Unix Spec; epoll is Linux-specific.

poll and epoll both take the timeout in the same format, but they interpret it differently: poll sleeps at least the given time (unless an fd is ready or a signal arrives), while epoll rounds it up to a whole number of clock ticks and then sleeps between this time and one tick shorter. I was told that this is intentional, because it lets you sleep until the next clock tick by specifying a timeout of 1ms (a timeout of 0ms means not sleeping at all).

Accurate sleeping requires measuring the time by which poll/epoll can lengthen the timeout (1 tick for epoll and 2 ticks for poll), subtracting this time from the timeout passed to them, adding 1ms, and sleeping the remaining time by busy waiting: calling gettimeofday interspersed with poll/epoll with no timeout.
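The accurate-sleeping recipe can be sketched roughly as follows. This is a minimal C sketch under stated assumptions: now_us and sleep_accurately are hypothetical names, slack_ms stands for the overshoot the caller has measured for poll on their system, and the sketch simply shortens the coarse sleep by that slack plus a 1ms safety margin, which is one reading of the adjustment described above and may not match it exactly.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in microseconds, taken from gettimeofday(). */
long long now_us(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Sleep for about duration_us microseconds.
 * slack_ms: assumed maximum overshoot of poll(), measured by the caller.
 * The bulk of the wait is a coarse poll() sleep; the remainder is a
 * busy wait that checks gettimeofday() between zero-timeout poll() calls. */
void sleep_accurately(long long duration_us, int slack_ms) {
    long long deadline = now_us() + duration_us;

    /* Coarse part: stop early enough that poll() cannot overshoot. */
    long long coarse_ms = duration_us / 1000 - slack_ms - 1;
    if (coarse_ms > 0)
        poll(NULL, 0, (int)coarse_ms);   /* no fds: acts as a plain sleep */

    /* Fine part: busy wait until the deadline. */
    while (now_us() < deadline)
        poll(NULL, 0, 0);                /* zero timeout: returns at once */
}
```

For example, sleep_accurately(20000, 10) skips the coarse sleep where a 10ms slack leaves nothing to spare and busy-waits the whole 20ms; with a smaller slack most of the wait is spent inside poll().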
gettimeofday() is accurate to microseconds; it asks some clock chip instead of relying on the timer interrupt only.
--
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
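The difference in interface between poll and epoll that Marcin describes (register once, then wait without passing the set again) can be sketched like this. It is a minimal, Linux-only C sketch; the helper names waiter_new, waiter_add and waiter_next_readable are hypothetical and are not part of any library.

```c
#include <sys/epoll.h>

/* Create an epoll instance. The size argument is only a hint and is
 * ignored by current kernels, but it must be positive.
 * Returns the epoll descriptor, or -1 on error. */
int waiter_new(void) {
    return epoll_create(1);
}

/* Register fd for readability once; later waits do not pass it again.
 * Returns 0 on success, -1 on error. */
int waiter_add(int ep, int fd) {
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms milliseconds for any registered fd.
 * Returns a readable fd, -1 if the timeout expired, -2 on error. */
int waiter_next_readable(int ep, int timeout_ms) {
    struct epoll_event ev;
    int n = epoll_wait(ep, &ev, 1, timeout_ms);
    if (n < 0) return -2;
    if (n == 0) return -1;
    return ev.data.fd;
}
```

With poll, the equivalent of waiter_add would not exist: the whole array of descriptors is handed to the kernel on every wait.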
participants (3)
- Glynn Clements
- Marcin 'Qrczak' Kowalczyk
- Simon Marlow