
#13497: GHC does not use select()/poll() correctly on non-Linux platforms
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Runtime System    |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:  8684
 Related Tickets:  #8684, #12912     |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Description changed by nh2:

Old description:
From my discovery at https://phabricator.haskell.org/D42#30542:
{{{
Why does the existing code work on platforms that are not Linux? In my select man page it says:

    On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behavior.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.

The existing select loop seems to rely on the fact that &tv is updated as described here.
}}}
Same for `man 2 poll`.
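Neither API can be trusted to account for time already slept. After `select()` returns, the timeval is undefined. `poll()` takes its timeout by value, so retrying it after EINTR with the original value restarts the full wait. A portable retry loop therefore has to track an absolute deadline itself. The following is a minimal sketch, not GHC's actual code or the D42 patch. It assumes `CLOCK_MONOTONIC` is available, and the helper names `now_us`, `select_with_deadline` and `poll_with_deadline` are hypothetical. For brevity, the `select()` variant only watches a read set.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical helpers (not GHC RTS code): retry select()/poll() after EINTR
 * while honouring one overall deadline, so signals cannot stretch the wait. */

/* Current time on the monotonic clock, in microseconds. */
static long long now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* select() on `readfds` (may be NULL) for at most timeout_ms in total.
 * Returns select()'s result: >0 ready, 0 timed out, -1 non-EINTR error. */
static int select_with_deadline(int nfds, fd_set *readfds, long timeout_ms)
{
    assert(nfds >= 0 && timeout_ms >= 0);
    long long deadline = now_us() + timeout_ms * 1000LL;
    for (;;) {
        long long left = deadline - now_us();
        if (left < 0)
            left = 0;
        /* Rebuild the timeval on every attempt: its contents are unspecified
         * after select() returns (Linux updates it, FreeBSD does not).
         * POSIX leaves the fd sets unmodified when select() fails, so
         * `readfds` can be passed again as-is after EINTR. */
        struct timeval tv = { .tv_sec = left / 1000000, .tv_usec = left % 1000000 };
        int r = select(nfds, readfds, NULL, NULL, &tv);
        if (r >= 0 || errno != EINTR)
            return r;
    }
}

/* poll() for at most timeout_ms in total (this sketch requires >= 0,
 * i.e. no infinite wait). Returns poll()'s result. */
static int poll_with_deadline(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    assert(timeout_ms >= 0);
    long long deadline = now_us() + timeout_ms * 1000LL;
    for (;;) {
        long long left_us = deadline - now_us();
        /* Round up so a sub-millisecond remainder does not become 0. */
        int left_ms = left_us > 0 ? (int)((left_us + 999) / 1000) : 0;
        int r = poll(fds, nfds, left_ms);
        if (r >= 0 || errno != EINTR)
            return r;
    }
}
```

For example, `select_with_deadline(fd + 1, &rfds, 1000)` waits at most about one second for `fd` to become readable, however many timer signals interrupt it in between.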
E.g. `man 2 select` on FreeBSD 11 says explicitly:
{{{
BUGS
     Version 2 of the Single UNIX Specification (``SUSv2'') allows systems to modify the original timeout in place. Thus, it is unwise to assume that the timeout value will be unmodified by the select() system call. FreeBSD does not modify the return value, which can cause problems for applications ported from other systems.
}}}
I have now tested this on FreeBSD, and indeed GHC's existing code does not work as expected.
With GHC 7.10.2:
{{{
import System.IO

-- Wait up to 1000 ms (1 second) for input on stdin.
main = hWaitForInput stdin (1 * 1000)
}}}
Compiled with `ghc --make test.hs -rtsopts` (`-rtsopts` lets the program accept the `+RTS -V` flags used below):
{{{
[root@ ~]# time ./test

real    0m1.386s
user    0m0.004s
sys     0m0.000s
[root@ ~]# time ./test +RTS -V0.01

real    0m1.386s
user    0m0.001s
sys     0m0.000s
[root@ ~]# time ./test +RTS -V0.001

real    0m1.678s
user    0m0.003s
sys     0m0.002s
[root@ ~]# time ./test +RTS -V0.0001

real    0m11.311s
user    0m0.032s
sys     0m0.139s
}}}
Note how, as the RTS timer ticks more often (a smaller `-V` interval), the 1-second wait suddenly takes more than 10 times as long as it should.

That is because the timer signals trigger the case where `select()` fails with EINTR in https://github.com/ghc/ghc/blob/f46369b8a1bf90a3bdc30f2b566c3a7e03672518%5E/..., which retries with the same unmodified 1-second `struct timeval *timeout` again and again.
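The same effect can be reproduced outside GHC with a few lines of C. The sketch below is a simplified stand-in, not the RTS code at the link. The function `set_ticker` is hypothetical and uses `setitimer()` with `SIGALRM` to imitate a periodic timer signal. The function `naive_sleep`, also hypothetical, retries `select()` on EINTR with the same `struct timeval`. On Linux the kernel writes the unslept time back into `tv`, so the loop still ends after roughly the requested time. On a system that leaves `tv` untouched, each retry starts the full timeout again. If the tick interval is shorter than the timeout, the loop may never finish.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Empty handler: installing it (without SA_RESTART) is enough to make a
 * blocked select() fail with EINTR whenever the signal arrives. */
static void on_tick(int sig)
{
    (void)sig;
}

/* Hypothetical ticker: deliver SIGALRM every interval_us microseconds
 * (0 disarms it). Only an imitation of a runtime's timer signal.
 * Returns 0 on success, -1 on error. */
static int set_ticker(long interval_us)
{
    assert(interval_us >= 0 && interval_us < 1000000);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    struct itimerval it = {
        .it_interval = { .tv_sec = 0, .tv_usec = interval_us },
        .it_value    = { .tv_sec = 0, .tv_usec = interval_us },
    };
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Hypothetical, simplified version of the fragile pattern: on EINTR, call
 * select() again with the SAME timeval. It only converges if select() stores
 * the unslept time back into `tv`. Returns the number of EINTR retries, or
 * -1 on any other error. */
static int naive_sleep(struct timeval tv)
{
    int retries = 0;
    for (;;) {
        int r = select(0, NULL, NULL, NULL, &tv);
        if (r >= 0)
            return retries;
        if (errno != EINTR)
            return -1;
        retries++;
    }
}
```

The fix is to recompute the remaining time from a fixed deadline before each retry instead of reusing `tv`.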
This demo reproduces the bug with GHC 7.10 and 8.0.1. In 8.0.2, `hWaitForInput` is broken (https://ghc.haskell.org/trac/ghc/ticket/12912#comment:4), so the demo does not work there.
New description:

Identical to the old description above, with the following appended:

---

Convenience: Here is the call chain of [https://gist.github.com/nh2/6f571ce00667bc49d845ab4c8fdf9769 hWaitForInput]

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:28
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler