How does the IO manager handle reading regular files?

I just got reminded that epoll() has no effect on regular files on Linux by reading an nginx article [1] [2] and why that is [3] [4].

By what means does the IO manager make reads (which wrap the read() syscall on Linux) non-blocking? Does it always use read() in `foreign import safe` (or `interruptible`) so that an OS thread is spawned?

It would be great if somebody could point me to the code where that's done (note again: for *regular* files).

Thanks!
Niklas

[1]: https://www.nginx.com/blog/thread-pools-boost-performance-9x/
[2]: https://stackoverflow.com/questions/8057892/epoll-on-regular-files
[3]: https://jvns.ca/blog/2017/06/03/async-io-on-linux--select--poll--and-epoll/
[4]: https://groups.google.com/forum/#!topic/comp.os.linux.development.system/K-f...

Niklas Hambüchen wrote:
I just got reminded that epoll() has no effect on regular files on Linux by reading an nginx article [1] [2] and why that is [3] [4].
By what means does the IO manager make reads (wraps around the read() syscall on Linux) non-blocking?
Does it always use read() in `foreign import safe` (or `interruptible`) so that an OS thread is spawned?
It would be great if somebody could point me to the code where that's done (note again: for *regular* files).
I believe the relevant implementation is the RawIO instance defined in GHC.IO.FD. The read implementation in particular is GHC.IO.FD.readRawBufferPtr. There is a useful Note directly above this function.

Cheers,
- Ben

Hey Ben, thanks for your quick reply. I think there's a problem.

On 14/05/2018 15.36, Ben Gamari wrote:
I believe the relevant implementation is the RawIO instance defined in GHC.IO.FD. The read implementation in particular is GHC.IO.FD.readRawBufferPtr. There is a useful Note directly above this function.
Reading through the code at
http://hackage.haskell.org/package/base-4.11.1.0/docs/src/GHC.IO.FD.html#rea...
The first line jumped out at me:
| isNonBlocking fd = unsafe_read -- unsafe is ok, it can't block
This looks suspicious.
And indeed, the following program does NOT keep printing things in the printing thread, and instead blocks for 30 seconds:
```
module Main where

import Control.Concurrent
import Control.Monad
import qualified Data.ByteString as BS
import System.Environment

main :: IO ()
main = do
  args <- getArgs
  case args of
    [file] -> do
      -- Background thread that should keep printing during the read.
      forkIO $ forever $ do
        putStrLn "still running"
        threadDelay 100000 -- 0.1 s
      -- Read the whole (regular) file in the main thread.
      bs <- BS.readFile file
      putStrLn $ "Read " ++ show (BS.length bs) ++ " bytes"
    _ -> error "Pass 1 argument (a file)"
```
when compiled with
~/.stack/programs/x86_64-linux/ghc-8.2.2/bin/ghc --make -O -threaded blocking-regular-file-read-test.hs
on my Ubuntu 16.04 machine and run on a 2 GB file like
./blocking-regular-file-read-test /mnt/images/ubuntu-18.04-desktop-amd64.iso
And `strace -f -e open,read` on it shows:
open("/mnt/images/ubuntu-18.04-desktop-amd64.iso", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 11
read(11,

Niklas Hambüchen wrote:
So GHC is trying to use `O_NONBLOCK` on regular files, which cannot work and will block when used through unsafe foreign calls like that.
Yikes!
Is this a known problem?
Doesn't sound familiar to me. Sounds like a ticket is in order. Thanks for spotting all of these nasty I/O issues; who knew there could be so many of them.

Cheers,
- Ben

Also funny but perhaps not too surprising: if you replace `forkIO` in my code by e.g. `forkOn 2`, then with +RTS -N2 the program nondeterministically sometimes hangs and sometimes works. The higher you set -N, the more likely it is to work. If you put the putStrLn loop and the readFile into `forkOn 0` and `forkOn 1` respectively, and run with +RTS -N3, then it always works as expected.
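For comparison, here is a minimal sketch of a read loop that calls read() through a `safe` foreign import instead of an unsafe one. It is not the code in GHC.IO.FD and not a proposed fix; the names `c_safe_read` and `countBytes` are made up for this example, and it reads from file descriptor 0 so that a regular file can be supplied by shell redirection. The assumption being illustrated is that, when linked with -threaded, a `safe` call releases the capability for its duration, so the printing thread should be able to keep running while read() blocks on disk.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIfMinus1Retry)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO (hPutStrLn, stderr)
import System.Posix.Types (CSsize (..))

-- Hypothetical binding for this sketch: read(2) imported as a `safe` call.
foreign import ccall safe "unistd.h read"
  c_safe_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize

-- Read the descriptor in 1 MiB chunks until EOF and return the byte count.
countBytes :: CInt -> IO Int
countBytes fd = allocaBytes chunkSize (go 0)
  where
    chunkSize = 1024 * 1024 :: Int

    go :: Int -> Ptr Word8 -> IO Int
    go total buf = do
      -- Retries on EINTR, throws an IOError on any other failure.
      n <- throwErrnoIfMinus1Retry "read" $
             c_safe_read fd buf (fromIntegral chunkSize)
      if n == 0
        then return total -- read() returned 0: end of file
        else do
          let total' = total + fromIntegral n
          total' `seq` go total' buf

main :: IO ()
main = do
  -- Same background printer as in the test program, but on stderr.
  _ <- forkIO $ forever $ do
    hPutStrLn stderr "still running"
    threadDelay 100000 -- 0.1 s
  n <- countBytes 0 -- fd 0 is stdin
  putStrLn $ "Read " ++ show n ++ " bytes"
```

It would be built with `ghc --make -O -threaded` like the test program above and run with the file redirected to stdin, e.g. `./safe-read-sketch < /mnt/images/ubuntu-18.04-desktop-amd64.iso` (the binary name is arbitrary). Without -threaded there is only one OS thread, so a blocking read() would still stall the printing thread.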
Participants (2): Ben Gamari, Niklas Hambüchen