[GHC] #13497: GHC does not use select()/poll() correctly on non-Linux platforms

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime | Version: 8.0.1 System | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: #8684, #12912 Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- From my discovery at https://phabricator.haskell.org/D42#30542: {{{ Why does the existing code work on platforms that are not Linux? In my select man page it says: On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 per‐ mits either behavior.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns. The existing select loop seems to rely on the fact that &tv is updated as described here. }}} Same for `man 2 poll`. E.g. `man 2 select` on FreeBSD 11 says explicitly: {{{ BUGS Version 2 of the Single UNIX Specification (``SUSv2'') allows systems to modify the original timeout in place. Thus, it is unwise to assume that the timeout value will be unmodified by the select() system call. FreeBSD does not modify the return value, which can cause problems for applications ported from other systems. }}} I have tested this now on FreeBSD, and indeed it doesn't work as expected. With GHC 7.10.2: {{{ import System.IO main = hWaitForInput stdin (1 * 1000) }}} `ghc --make test.hs -rtsopts` {{{ [root@ ~]# time ./test real 0m1.386s user 0m0.004s sys 0m0.000s [root@ ~]# time ./test +RTS -V0.01 real 0m1.386s user 0m0.001s sys 0m0.000s [root@ ~]# time ./test +RTS -V0.001 real 0m1.678s user 0m0.003s sys 0m0.002s [root@ ~]# time ./test +RTS -V0.0001 real 0m11.311s user 0m0.032s sys 0m0.139s }}} See how when we increase the timer signal, the sleep suddenly takes 10x longer than it should. That's because it triggers the case where EINTR is received in https://github.com/ghc/ghc/blob/f46369b8a1bf90a3bdc30f2b566c3a7e03672518%5E/..., letting us use the same unmodified 1-second `struct timeval *timeout` again and again. This demo of the bug works for GHC 7.10 and 8.0.1; in 8.0.2 `hWaitForInput` is broken (https://ghc.haskell.org/trac/ghc/ticket/12912#comment:4) so the demo doesn't work there. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Using `git grep "select("`, I've found uses of `select()` in: * `libraries/base/cbits/inputReady.c` * `rts/posix/Select.c` The first one I already knew about, the other one has 3 uses. 2 of those are with a 0-timeout (in which case there is no problem), and the one in `awaitEvent()` aims to block as long as possible, so that's not a problem either. The only occurrence of `poll(` is in the new code that carries the problem I mentioned (https://ghc.haskell.org/trac/ghc/ticket/12912#comment:4); this probably has to be fixed (both to not crash and to handle the fact that `poll()` may not update the timeout. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I have a first patch for this ready (based on 8.0.1) at https://github.com/ghc/ghc/compare/ghc-8.0.1-release...nh2:fix-12695 -select-time-loop-ghc-8.0.1 Still need to forward-port it to 8.0.2 / HEAD after fixing the `poll()` failure there. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Actually before that, I should probably fix the Windows version too (for the case that the FD is not a socket). https://github.com/ghc/ghc/blob/f46369b8a1bf90a3bdc30f2b566c3a7e03672518%5E/... {{{ while (1) // keep trying until we find a real key event { rc = WaitForSingleObject( hFile, msecs ); ... }}} Looking at this, it is not clear to me how this should ever have worked (how this should have provided the semantics that after the maximum given time, `hWaitForInput` returns), as this would simply loop forever if interrupted by a signal. I just wrote the above sentence and then realised that the exact same was true for any non-Linux select, so I got suspicious why it worked at all so far on my FreeBSD VM before my fix, because any EINTR should just return out of `fdReady()` just to have it restarted with the original full duration. [1 hour later] I now know what's going on. It only worked on non-Linux because of the idle GC! Looking at `truss`, it keeps calling `select()` until suddenly a `ktimer_settime` appears, and a quick `printf` reveals that it's issued by `stopTicker()` called by `stopTimer()` caused by idle GC. When the idle GC is set to a value larger than the `msecs` passed to `fdReady()`, e.g. when `msecs = 500` and `+RTS -I1`, then `fdReady` will never return and the program will be stuck. BSD and Mac OS users (and probably Windows) were just lucky so far that idle GC defaults to 0.3 seconds. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Uh, some things are still puzzling me here. If I run with `+RTS -I0.3`, the infinite loop does not occur on FreeBSD, even though 0.3 is supposed to be the default, so I'm not sure why giving no `+RTS` and `+RTS -I0.3` have different behaviour. Also, passing `+RTS -I0` does not enforce the bug, surprises me. Note all these observations are for the non`-threaded` runtime. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I don't understand how changing `-I...` can even do anything given that I don't use the threaded runtime (confirmed by `+RTS --info`), as `-I` is supposed to be only available for the threaded runtime. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I found (using [https://hackage.haskell.org/package/base-4.9.1.0/docs/GHC- RTS-Flags.html#v:getRTSFlags getRTSFlags]) that setting `+RTS -I10` will set {{{ idleGCDelayTime = 10000000000 doIdleGC = True }}} even when the threaded RTS is not enabled. I'm not sure if that's a bug, but commit [https://github.com/ghc/ghc/commit/33a84b8c79f 33a84b8c79f] that introduced the idle GC says {{{ The idle GC only happens in the threaded RTS }}} so maybe this is a bug? [https://github.com/ghc/ghc/blob/428e152be6bb0fd3867e41cee82a6d5968a11a26/rts... The line that checks doIdleGC] is clearly outside the `#ifdef THREADED_RTS`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nh2): * cc: simonmar (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): The `stopTimer()` occurring for me (and terminating `fdReady()`) is the one [https://github.com/ghc/ghc/blob/428e152be6bb0fd3867e41cee82a6d5968a11a26/rts... here]. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I have confirmed by printf-debugging that the line {{{ ticks_to_gc = RtsFlags.GcFlags.idleGCDelayTime / RtsFlags.MiscFlags.tickInterval; }}} is used even if the non-threaded runtime is used, and `doIdleGC` is `False`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Hmm, this seems to happen on purpose: Commit [https://github.com/ghc/ghc/commit/0b79d5cd4687dacf7efd430df7fba9d9a5a5ce32 Another overhaul of the recent_activity / idle GC handling] (#5991) mentions {{{ - we now turn off the timer signal after idleGCDelay even if the idle GC is disabled with +RTS -I0. }}} I personally find that very confusing, using a setting called `idleGCDelay` even when the user explicitly turned that off. Further, doing it this way, there's no way of actually setting what the value of `idleGCDelay` is, because if you use `-I0`, well, then it's the default (0.3 seconds), and if you use `-I` with any other value, then that actually turns idle GC on (as I found above). So there's no way to change away from the default 0.3 seconds when idle GC is set to off. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

and the one in `awaitEvent()` aims to block as long as possible, so
#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I was wrong when I said: Replying to [comment:1 nh2]: that's not a problem either. The `select()` occurrence in `awaitEvent()` waits for `sleeping_queue->block_info.target - now` so that also needs to be vetted on whether it has the `&tv` updating problem on non-Linux. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Reading the aforementioned [https://github.com/ghc/ghc/commit/0b79d5cd4687dacf7efd430df7fba9d9a5a5ce32 #diff-dce00d71def2bd086622a775b0723323R71 commit] in detail, it seems that the code currently doesn't do what the code may not follow the intents expressed in the comments: {{{ + * The timer interrupt transitions ACTIVITY_YES into + * ACTIVITY_MAYBE_NO, waits for RtsFlags.GcFlags.idleGCDelayTime, + * and then: + * - if idle GC is no, set ACTIVITY_INACTIVE and wakeUpRts() + * - if idle GC is off, set ACTIVITY_DONE_GC and stopTimer() }}} [https://github.com/ghc/ghc/blob/380b25ea4754c2aea683538ffdb179f8946219a0/rts... Code:] {{{ if (RtsFlags.GcFlags.doIdleGC) { recent_activity = ACTIVITY_INACTIVE; #ifdef THREADED_RTS ... }}} It depends on what `if idle GC is no` (read: `on`) means. In many places I have read that idle GC can really only be "on" in the threaded RTS. In that case, the code does not implement the comment, as it sets `ACTIVITY_INACTIVE` even when we're not in the threaded RTS. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Some comments I've found in the code base: {{{ switch (recent_activity) { case ACTIVITY_INACTIVE: if (force_major) { // We are doing a GC because the system has been idle for a // timeslice and we need to check for deadlock. Record the // fact that we've done a GC and turn off the timer signal; // it will get re-enabled if we run any threads after the GC. recent_activity = ACTIVITY_DONE_GC; }}} {{{ switch (recent_activity) { case ACTIVITY_DONE_GC: { // ACTIVITY_DONE_GC means we turned off the timer signal to // conserve power (see #1623). Re-enable it here. }}} So the disabling of the timer signal is a power saving feature. I don't understand though why this is only done after a certain time (`idleGCDelayTime / tickInterval` many ticks) instead of immediately. Is there something we know only after that many ticks? Or is it a performance optimisation to do it after that many ticks instead of immediately? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): @nh2 the timer signal gets turned off after the idle GC. We can't turn it off before the idle GC, because then we would never do the idle GC. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Replying to [comment:11 nh2]:
The `select()` occurrence in `awaitEvent()` waits for `sleeping_queue->block_info.target - now` ([https://github.com/ghc/ghc/blob/380b25ea4754c2aea683538ffdb179f8946219a0/rts... code]) so that also needs to be vetted on whether it has the `&tv` updating problem on non-Linux.
OK, I've looked into that in detail now and added some printfs, and I think this `select()` needs some update as well if we want it to wake up precisely at `sleeping_queue->block_info.target`. The reasoning is the following: Inside the `while ((numFound = select(maxfd+1, &rfd, &wfd, NULL, ptv)) < 0) { ... }` there's this code: {{{ /* check for threads that need waking up */ wakeUpSleepingThreads(getLowResTimeOfDay()); /* If new runnable threads have arrived, stop waiting for * I/O and run them. */ if (!emptyRunQueue(&MainCapability)) { return; /* still hold the lock */ } }}} After an EINTR has interrupted `select()`, `wakeUpSleepingThreads(getLowResTimeOfDay())` checks whether there is a Haskell thread that wants to run (whether we are past its `sleeping_queue->block_info.target` time) -- and that includes the thread for which we're currently `select`ing -- and if so, we `return` out of the C code. So we're looking at the current time in each loop iteration. Consequently, we don't have the same problem as in `fdReady`, as this scheme does not rely on `select()` updating the passed in `struct timeval *ptv` pointer. However, there is still a problem: The `EINTR`s interrupting the `select()` come at fixed intervals (the timer signal). That can result in us waiting slighly too long. For example, assume `*ptv` is set to 15 ms (`sleeping_queue->block_info.target` is 15 ms from `now`), and assume the timer signal is every 10 ms. Then we would enter `select(15ms)`, get interrupted with EINTR after 10ms, and then call `select(15ms)` again in the same while loop. With the next timer EINTR 10ms later, we would `return` out of the `while` loop, so we're not at risk of running forever due to EINTR. But we have now waited 20ms in total instead of the desired 15ms. That's why I currently believe that the timeout argument to this `select()` should be recalculated based on the current time in every iteration, similar to how my fix for `fdReady()` does it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): OK I have now also checked the situation on Windows: Windows `select()` does not set the `timeout` parameter to the remaining time. So currently `fdReady()` on Windows is as wrong as it is for the BSDs, and may wait forever even if given a max timeout. Source: https://msdn.microsoft.com/en- us/library/windows/desktop/ms740141(v=vs.85).aspx {{{ _In_ const struct timeval *timeout }}} I'll try to fix it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Hmm looks like I also discovered that `fdReady()` doesn't work at all on Windows for the `case FILE_TYPE_PIPE`. This example file {{{ import System.IO main = hWaitForInput stdin (5 * 1000) }}} when compiled with `ghc --make -O -fforce-recomp ../example-windows- pipe.hs && time ../example-windows-pipe.hs` terminates immediately instead of waiting for 5 seconds. The reason seems to be that the `PeekNamedPipe( hFile, NULL, 0, NULL, &avail, NULL )`, which is called only once in `fdReady()`, doesn't have any blocking feature. From the docs https://msdn.microsoft.com/en- us/library/windows/desktop/aa365779(v=vs.85).aspx
The function always returns immediately in a single-threaded application, even if there is no data in the pipe
(Not sure why it says "in a single-threaded application"; the same problem happens with `-threaded` and `+RTS -N2`.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): **An update for GHC 8.2:** GHC 8.2 has improved on this problem for POSIX platforms, two-fold: 1. GHC 8.2 has fixed the crashing problem of `fdReady()` that was introduced in GHC 8.0.2 (but not present in 8.0.1), that I mentioned in Comment 2 further above. 2. That fix (commit [http://github.com/ghc/ghc/commit/ae69eaed6e2a5dff7f3a61d4373b7c52e715e3ad ae69eaed] - "base: Fix hWaitForInput with timeout on POSIX") works by repeatedly inspecting the current time (because `select()` is gone and was replaced by `poll()` in commit [http://github.com/ghc/ghc/commit/f46369b8a1bf90a3bdc30f2b566c3a7e03672518 f46369b8], which doesn't have the functionality of advancing a timeout pointer even on Linux), so the problem that was described in the original issue description is gone for FreeBSD / OSX / any platform that's not Windows. However, there are remaining problems with the current in implementation: * It currently uses `gettimeofday()`, which doesn't use a monotonic clock, so any time adjustment can make `fdReady()` wait for significantly more or less than it should. * It keeps track of the total time waited by adding up time differences between calls to `gettimeofday()`: {{{ while ((res = poll(fds, 1, msecs)) < 0) { if (errno == EINTR) { if (msecs > 0) { struct timeval tv; if (gettimeofday(&tv, NULL) != 0) { fprintf(stderr, "fdReady: gettimeofday failed: %s\n", strerror(errno)); abort(); } int elapsed = 1000 * (tv.tv_sec - tv0.tv_sec) + (tv.tv_usec - tv0.tv_usec) / 1000; msecs -= elapsed; if (msecs <= 0) return 0; tv0 = tv; } } else { return (-1); } } }}} This is inaccurate, because the code between `gettimeofday()` and `tv0 = tv;` is not tracked. If the process is descheduled by the OS in these lines, then that time is "lost" and `fdReady()` will wait way too long. This inacurracy can easily be observed and magnified by increasing the frequency of signals arriving at the program. Consider this simple program `ghc-bug-13497-accuracy-test.hs` that waits 5 seconds for input on stdin: {{{ import System.IO main = hWaitForInput stdin (5 * 1000) }}} When run normally on GHC 8.2.1 (release commit 0cee25253), this program terminates within 5 seconds when run with the command line {{{ inplace/bin/ghc-stage2 --make -fforce-recomp -rtsopts ghc-bug-13497 -accuracy-test.hs /usr/bin/time ./ghc-bug-13497-accuracy-test }}} But it starts taking much longer when `+RTS -V...` is added for increasingly frequent values of `-V`; the effect is even stronger when setting the idle GC timer to something large (e.g. `-I10` for every 10 seconds): {{{ no `-V` passed 0.00user 0.00system 0:05.02elapsed 0%CPU -V0.1 0.00user 0.00system 0:05.01elapsed 0%CPU -V0.01 0.00user 0.00system 0:05.01elapsed 0%CPU -V0.001 0.00user 0.00system 0:05.13elapsed 0%CPU -V0.0001 0.00user 0.00system 0:05.30elapsed 0%CPU -V0.00001 0.06user 0.00system 0:05.31elapsed 1%CPU -V0.000001 0.37user 0.20system 0:05.73elapsed 10%CPU -V0.0000001 2.67user 3.30system 0:12.47elapsed 47%CPU -V0.00000001 50.44user 7.32system 1:17.50elapsed 74%CPU -I10 -V0.1 0.00user 0.00system 0:05.10elapsed 0%CPU -I10 -V0.01 0.00user 0.00system 0:05.25elapsed 0%CPU -I10 -V0.001 0.00user 0.10system 0:08.47elapsed 1%CPU -I10 -V0.0001 the program did not terminate within 2 minutes of waiting }}} It reason it's worse with `-I10` is that, as described above, without `-I` ghc stops the timer signal after 0.3 seconds (so no `EINTR`s are occurring beyond that time), and with `-I` given it doesn't. Not all of the above is in the non-threaded runtime. Doing the same with `-threaded` on Linux gives reliable `0.00user 0.00system 0:05.01elapsed 0%CPU` no matter what `-V` or `-I` is passed, and `strace` shows that there are no `EINTR`s happening in that case. I suspect this is because `hWaitForInput` calls `fdReady()` as a `safe` foreign call, which makes it have its own thread in the threaded runtime. There is also an `unsafe` call to `fdReady()` but that one is only used with timeouts of `0` so that's not a problem. On FreeBSD 11, non-threaded, the situation is worse: {{{ no `-V` passed 5.14 real 0.00 user 0.00 sys -V0.1 5.10 real 0.00 user 0.00 sys -V0.01 5.16 real 0.00 user 0.00 sys -V0.001 5.17 real 0.00 user 0.02 sys -V0.0001 5.24 real 0.00 user 0.01 sys -V0.00001 5.89 real 0.01 user 0.08 sys -V0.000001 5.81 real 0.00 user 0.11 sys -V0.0000001 6.05 real 0.00 user 0.09 sys -V0.00000001 5.77 real 0.00 user 0.09 sys -I10 -V0.1 5.13 real 0.00 user 0.01 sys -I10 -V0.01 5.24 real 0.00 user 0.01 sys -I10 -V0.001 5.90 real 0.00 user 0.10 sys -I10 -V0.0001 5.82 real 0.00 user 0.09 sys }}} And with `-threaded` on FreeBSD 11: {{{ no `-V` passed 5.15 real 0.00 user 0.01 sys -V0.1 5.30 real 0.00 user 0.00 sys -V0.01 5.31 real 0.00 user 0.01 sys -V0.001 5.45 real 0.00 user 0.13 sys -V0.0001 5.98 real 0.00 user 0.13 sys -V0.00001 5.93 real 0.00 user 0.15 sys -V0.000001 5.79 real 0.00 user 0.15 sys -V0.0000001 5.83 real 0.00 user 0.13 sys -V0.00000001 5.80 real 0.00 user 0.18 sys -I10 -V0.1 5.13 real 0.00 user 0.01 sys -I10 -V0.01 5.27 real 0.00 user 0.03 sys -I10 -V0.001 5.77 real 0.00 user 0.12 sys -I10 -V0.0001 5.90 real 0.00 user 0.18 sys }}} As you can see, on FreeBSD 11 the `-threaded` doesn't fix the issues as it does on Linux, and `truss` suggests that that is because `EINTR`s arrive (while they didn't on Linux). I'm not sure why with `threaded` on FreeBSD there's `EINTR`s happening but not on Linux, but I observed that on Linux we have instead: {{{ [pid 30502] timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 7 ... [pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8 [pid 30502] read(7, "\1\0\0\0\0\0\0\0", 8) = 8 ... }}} So I suspect that the difference is that Linux has `timerfd` and FreeBSD doesn't. ---- OK, so far the problem description. **The summary is:** * In the nonthreaded runtime, a high precision `-V` destroys accuracy * On non-Linux (systems without `timerfd`), this happens even with `-threaded` * In any runtime, accuracy can be screwed with due to non-use of monotonic clocks. **The fix** is simple: Use the monotonic clock, and instead of tracking waited time as a sum of wait intervals, always compare the current time with the _end_ time (time at entry of `fdReady()` + `msec`). I have implemented this in commit [https://github.com/nh2/ghc/commit/12f9d66b5c837c221be080b526dcb61fecb7cf1c 12f9d66b] of my branch [https://github.com/ghc/ghc/compare/ghc-8.2.1-release...nh2:ghc-8.2.1 -improve-fdRready-precision ghc-8.2.1-improve-fdRready-precision] (the first link is stable, the latter may change as I write more fixes for Windows). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): With my fix, I reliably get, independent of `-threaded` or not, on Linux: {{{ 0:05.00elapsed }}} And on FreeBSD 11: {{{ 5.00 real }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Indeed you were right, nh2; the current state of affairs looks quite bad. Thanks for looking at this! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by slyfox): * cc: slyfox (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): When testing my change on Windows, I found that the `if (isSock){ select(...) }` part seems completely broken on Windows. That is because on Windows `FD_SETSIZE` defaults to 64, but pretty much all GHC programs seem to have > 64 FDs open, so you can't actually create a socket on which you can `select()`. It errors with `fdReady: fd is too big` even with an example as simple as [https://github.com/nh2/ghc-socket- test/blob/af74bb348e88f3f7ca717c70540936fea1293257/ghc-socket-test.hs this] (in this case, on my machine the `fd` is `284`): {{{ {-# LANGUAGE OverloadedStrings #-} import Control.Monad (forever) import Network.Socket import System.IO -- Simple echo server: Reads up to 10 chars from network, echoes them back. -- Uses the Handle API so that `hWaitForInput` can be used. main :: IO () main = do sock <- socket AF_INET Stream 0 setSocketOption sock ReuseAddr 1 bind sock (SockAddrInet 1234 0x0100007f) -- 0x0100007f == 127.0.0.1 localhost listen sock 2 forever $ do (connSock, _connAddr) <- accept sock putStrLn "Got connection" h <- socketToHandle connSock ReadWriteMode hSetBuffering h NoBuffering ready <- hWaitForInput h (5 * 1000) -- 5 seconds putStrLn $ "Ready: " ++ show ready line <- hGetLine h putStrLn "Got line" hPutStrLn h ("Got: " ++ line) hClose h }}} I'm not sure how this was not discovered earlier; for #13525 (where `fdReady()` breaking completely was also discovered late) at least it failed only when the timeout was non-zero, which is not used in ghc beyond in `hWaitForInput`, but in this Windows socket case it breaks even on the 0-timeout. Maybe there is not actually anybody who uses sockets as handles on Windows? It seems an approriate workaround for now is to increase `FD_SETSIZE` (which is possible on Windows and BSD, see [https://stackoverflow.com/questions/7976388/increasing-limit-of-fd- setsize-and-select here]) on Windows. A real fix would be to move to IO Completion Ports, and thus get rid of the last use of `select()` (the other platforms already use `poll()` but Windows doesn't have that). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Addition to above comment: Probably the only time that worked was before commit [https://github.com/ghc/ghc/commit/b0316cdb10fbd9eaca7ede28c7bb3eb19f7766bf b0316cdb] (#9169) that added the `FD_SETSIZE` check -- in which case it probably only worked due to writing to (and likely corrupting) out-of- bounds memory beyond the size of the `fd_set`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): I have pushed a workaround to my branch [https://github.com/ghc/ghc/compare/ghc-8.2.1-release...nh2:ghc-8.2.1 -improve-fdRready-precision ghc-8.2.1-improve-fdRready-precision], namely commit [https://github.com/nh2/ghc/commit/1d037a16251751e98d4302f4be1e5659c3240958 1d037a16]. This completes my testing of my branch on Windows. I have tested the 4 of the 5 possible branches that `fdReady()` can take on Windows: 1. Testing the `FILE_TYPE_PIPE` case suceeded with `ghc-bug-13497 -accuracy-test.hs` (from [https://ghc.haskell.org/trac/ghc/ticket/13497#comment:18 comment 18]), by running it from the MSYS2 Mingw-w64 console (NOT `cmd.exe`!) 2. Testing the `FILE_TYPE_DISK` case succeeded with the following file `ghc-bug-13497-windows-file-test.hs`: {{{ import System.IO main = do writeFile "testfile" "hello" handle <- openFile "testfile" ReadMode hWaitForInput handle (5 * 1000) }}} 3. Testing the `FILE_TYPE_CHAR` case suceeded by running `ghc-bug-13497 -accuracy-test.hs` (see above) , but this time from `cmd.exe`. Here, focusing and unfocusing the Window triggers `EventType == FOCUS_EVENT` so that our loop is executed (I confirmed this with a print in the loop). 4. Testing the `if (isSock)` case succeded with the echo server example given in the previous comment. 5. I did not test the `default:` case ([https://github.com/nh2/ghc/blob/1d037a16251751e98d4302f4be1e5659c3240958/lib... this one]), because I don't know how to trigger it. There are 3 cases in which this branch could be taken according to the `GetFileType()` [https://msdn.microsoft.com/en- us/library/windows/desktop/aa364960(v=vs.85).aspx documentation]: A. `FILE_TYPE_REMOTE`, which seems impossible to create on modern Windows B. `FILE_TYPE_UNKNOWN`, for which it is literally unknown on how it can be created C. The very bottom of the `FILE_TYPE_PIPE` case, when `/* PeekNamedPipe didn't work - fall through to the general case */` happens. I don't know how to trigger this. Simon Marlow introduced this logic 9 years ago in commit [https://github.com/nh2/ghc/commit/1b0906e072344376ec7cb95afe314f2157b62b20 1b0906e0], so maybe he knows. In any case, with this testing done I'm reasonably confident that my branch doesn't regress Windows. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Bugs
See the discussion of spurious readiness notifications under the BUGS
#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nh2): Another thing I found: The comment on top of `fdReady()` says: {{{ "Input is available" is defined as 'can I safely read at least a *character* from this file object without blocking?' }}} But this seems to be not true in the general case: If the FD is for a socket, then it is only true when that socket is non- blocking. `man 2 poll` (see [https://linux.die.net/man/2/poll here]) describes this: section of select(2). `man 2 select` (see [https://linux.die.net/man/2/poll here]) says:
Under Linux, `select()` may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use `O_NONBLOCK` on sockets that should not block.
I have not investigated whether anywhere in GHC `fdReady()` is potentially called with a not-nonblocking socket. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13497: GHC does not use select()/poll() correctly on non-Linux platforms
-------------------------------------+-------------------------------------
Reporter: nh2 | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Runtime System | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking: 8684
Related Tickets: #8684, #12912 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#13497: GHC does not use select()/poll() correctly on non-Linux platforms -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: 8684 Related Tickets: #8684, #12912 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by nh2: Old description:
From my discovery at https://phabricator.haskell.org/D42#30542:
{{{ Why does the existing code work on platforms that are not Linux? In my select man page it says:
On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 per‐ mits either behavior.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.
The existing select loop seems to rely on the fact that &tv is updated as described here. }}}
Same for `man 2 poll`.
E.g. `man 2 select` on FreeBSD 11 says explicitly:
{{{ BUGS Version 2 of the Single UNIX Specification (``SUSv2'') allows systems to modify the original timeout in place. Thus, it is unwise to assume that the timeout value will be unmodified by the select() system call. FreeBSD does not modify the return value, which can cause problems for applications ported from other systems. }}}
I have tested this now on FreeBSD, and indeed it doesn't work as expected.
With GHC 7.10.2:
{{{ import System.IO main = hWaitForInput stdin (1 * 1000) }}}
`ghc --make test.hs -rtsopts`
{{{ [root@ ~]# time ./test
real 0m1.386s user 0m0.004s sys 0m0.000s [root@ ~]# time ./test +RTS -V0.01
real 0m1.386s user 0m0.001s sys 0m0.000s [root@ ~]# time ./test +RTS -V0.001
real 0m1.678s user 0m0.003s sys 0m0.002s [root@ ~]# time ./test +RTS -V0.0001
real 0m11.311s user 0m0.032s sys 0m0.139s }}}
See how when we increase the timer signal, the sleep suddenly takes 10x longer than it should.
That's because it triggers the case where EINTR is received in https://github.com/ghc/ghc/blob/f46369b8a1bf90a3bdc30f2b566c3a7e03672518%5E/..., letting us use the same unmodified 1-second `struct timeval *timeout` again and again.
This demo of the bug works for GHC 7.10 and 8.0.1; in 8.0.2 `hWaitForInput` is broken (https://ghc.haskell.org/trac/ghc/ticket/12912#comment:4) so the demo doesn't work there.
New description: From my discovery at https://phabricator.haskell.org/D42#30542: {{{ Why does the existing code work on platforms that are not Linux? In my select man page it says: On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 per‐ mits either behavior.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns. The existing select loop seems to rely on the fact that &tv is updated as described here. }}} Same for `man 2 poll`. E.g. `man 2 select` on FreeBSD 11 says explicitly: {{{ BUGS Version 2 of the Single UNIX Specification (``SUSv2'') allows systems to modify the original timeout in place. Thus, it is unwise to assume that the timeout value will be unmodified by the select() system call. FreeBSD does not modify the return value, which can cause problems for applications ported from other systems. }}} I have tested this now on FreeBSD, and indeed it doesn't work as expected. With GHC 7.10.2: {{{ import System.IO main = hWaitForInput stdin (1 * 1000) }}} `ghc --make test.hs -rtsopts` {{{ [root@ ~]# time ./test real 0m1.386s user 0m0.004s sys 0m0.000s [root@ ~]# time ./test +RTS -V0.01 real 0m1.386s user 0m0.001s sys 0m0.000s [root@ ~]# time ./test +RTS -V0.001 real 0m1.678s user 0m0.003s sys 0m0.002s [root@ ~]# time ./test +RTS -V0.0001 real 0m11.311s user 0m0.032s sys 0m0.139s }}} See how when we increase the timer signal, the sleep suddenly takes 10x longer than it should. That's because it triggers the case where EINTR is received in https://github.com/ghc/ghc/blob/f46369b8a1bf90a3bdc30f2b566c3a7e03672518%5E/..., letting us use the same unmodified 1-second `struct timeval *timeout` again and again. This demo of the bug works for GHC 7.10 and 8.0.1; in 8.0.2 `hWaitForInput` is broken (https://ghc.haskell.org/trac/ghc/ticket/12912#comment:4) so the demo doesn't work there. --- Convenience: Here is the call chain of [https://gist.github.com/nh2/6f571ce00667bc49d845ab4c8fdf9769 hWaitForInput] -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13497#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC