[GHC] #13944: Introduce synchronized FFI

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature | Status: new request | Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- During hacking on ghc I/O libraries, i find an [https://github.com/winterland1989/stdio/tree/master/bench/diskIO interesting situation], to summarize: There're some FFI calls that vary a lot in time duration, this makes unsafe FFI bad because it make GC hard to synchronize all capacities. OTOH safe FFI 's overhead is too large to justify its usage. After some though i think we should Introduce a new FFI type, let call it `synchronized` for now, it should: 1. Have the same requirement as safe FFI of what kind of argument we can pass, e.g. pinned ByteArray, pass by value primitive type, etc. 2. It works like unsafe FFI call by directly jump to c code using current capacity's stack. 3. It should mark current capacity GC ready, and doesn't participate next GC. The point is that if a FFI call doesn't need GC guarantee, then it shouldn't stop GC running. The next GC will not be running with all threads, but that's not too terrible since it doesn't scale too much anyway. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by winter: @@ -8,1 +8,1 @@ - After some though i think we should Introduce a new FFI type, let call it + After some thought i think we should introduce a new FFI type, let call it New description: During hacking on ghc I/O libraries, i find an [https://github.com/winterland1989/stdio/tree/master/bench/diskIO interesting situation], to summarize: There're some FFI calls that vary a lot in time duration, this makes unsafe FFI bad because it make GC hard to synchronize all capacities. OTOH safe FFI 's overhead is too large to justify its usage. After some thought i think we should introduce a new FFI type, let call it `synchronized` for now, it should: 1. Have the same requirement as safe FFI of what kind of argument we can pass, e.g. pinned ByteArray, pass by value primitive type, etc. 2. It works like unsafe FFI call by directly jump to c code using current capacity's stack. 3. It should mark current capacity GC ready, and doesn't participate next GC. The point is that if a FFI call doesn't need GC guarantee, then it shouldn't stop GC running. The next GC will not be running with all threads, but that's not too terrible since it doesn't scale too much anyway. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by winter: @@ -20,0 +20,4 @@ + 4. When the FFI finish, if we're GC ing, put current capaity into sleep(if + we can find a way to let it join parallel GC that would be better). + Otherwise it continue to work. + @@ -23,1 +27,2 @@ - anyway. + anyway, we can combine this with the new `-qn` RTS parameter: as long as + there're more than `n` threads can join GC, we do GC. New description: During hacking on ghc I/O libraries, i find an [https://github.com/winterland1989/stdio/tree/master/bench/diskIO interesting situation], to summarize: There're some FFI calls that vary a lot in time duration, this makes unsafe FFI bad because it make GC hard to synchronize all capacities. OTOH safe FFI 's overhead is too large to justify its usage. After some thought i think we should introduce a new FFI type, let call it `synchronized` for now, it should: 1. Have the same requirement as safe FFI of what kind of argument we can pass, e.g. pinned ByteArray, pass by value primitive type, etc. 2. It works like unsafe FFI call by directly jump to c code using current capacity's stack. 3. It should mark current capacity GC ready, and doesn't participate next GC. 4. When the FFI finish, if we're GC ing, put current capaity into sleep(if we can find a way to let it join parallel GC that would be better). Otherwise it continue to work. The point is that if a FFI call doesn't need GC guarantee, then it shouldn't stop GC running. The next GC will not be running with all threads, but that's not too terrible since it doesn't scale too much anyway, we can combine this with the new `-qn` RTS parameter: as long as there're more than `n` threads can join GC, we do GC. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): I don't see how this could work (or I've misunderstood your proposal). Consider that you have a thread with some reference to some heap object (call it `A`) sitting in another capability's nursery. Let's say that it enters a `synchronized` call and while the call is running a GC is started. During this GC another other capability evacuates `A` (producing a new object, say `A'`) and the nursery in which it lived is freed. Now we have a situation where the thread who made the call has a reference to `A`, an object which existed in a no-longer-existing heap block, which it may attempt to dereference once the call completes. The distinguishing feature of the `safe` call mechanism is that we suspend the running thread, saving all of its state back to its TSO, where they can be handled by the garbage collector. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by winter): I'm not even sure that my proposal work :) but let's try to have a discussion on this idea. Remember the new `-qn` RTS flag, which allow some inactive threads doesn't participate GC, (The word participate seems too vague, i think it means the thread doesn't do GC work, but do get GCed by other GC worker thread). Now a thread which enter `synchronized` call will behave exactly like an inactive thread which is not selected to do GC work, that means its root will get GCed by some other thread, which is OK since `synchronized` FFI restrict what kind of heap object it can reference. I'm not totally clear how this idea works because i'm not an expert on RTS, all clarifications and suggestions are welcomed! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by winter): OK, now i'm pretty sure what `synchronized` would do, it will behave just like a `safe` FFI call with one difference: a task which comes across a `synchronized` call will give off its capacity, but will not hand off it to other tasks unless that's a GC task. This means `synchronized` calls will still block whole capacity, but it will not spawn new threads, nor blocking GC. I'm not sure if this make sense, it seems the overhead of safe FFI calls come from both the capacity transfer and creating new OS threads. But i'm not sure which is more costly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Alright, I think I see what you want. So you are targetting applications where you have a potentially-blocking foreign call and therefore can't use `unsafe` calls but want to avoid the cost associated with suspending a Haskell thread. This cost comes in a variety of forms, 1. a stack walk, due to the duplicate computation check in `suspendThread` 2. a few lock operations to safely modify the capability 3. the cost of saving and restoring the thread state to the TSO Judging from this I'd guess this will cost anywhere from hundreds to thousands of cycles (largely depending upon stack depth, I suspect), and consequently it seems plausible that we'd want a way to avoid this. Have you done any characterisation of this? The problem is that, other than perhaps the stack walk, I don't think any of these costs can be avoided. Afterall, we need to save the thread state to the TSO in order to safely GC (since TSOs are GC roots). I suspect it would also be necessary to lock the capability. Consequently, it's really not clear to me that this new call type would pull its weight. It would be helpful if we had actual measurements of how much the various contributions above cost. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by winter):
Alright, I think I see what you want. So you are targetting applications where you have a potentially-blocking foreign call and therefore can't use `unsafe` calls but want to avoid the cost associated with suspending a Haskell thread.
Exactly! For example current I/O manager first perform an unsafe read and check if result is eAgain, but for regular file it will never return eAgain, which turn this unsafe read into a blocking read. If unfortunately OS didn't manage page cache for us, we end up with a long capacity blocking read.
The problem is that, other than perhaps the stack walk, I don't think any of these costs can be avoided. Afterall, we need to save the thread state to the TSO in order to safely GC (since TSOs are GC roots). I suspect it would also be necessary to lock the capability.
Is there a way to do it lazily? By say lazily i meaning, can we just perform the FFI call and mark the capacity with some kind of `SYNC` flag. If a GC thread is asking for sync during FFI, then let it perform these stack saving for us. Otherwise we just happily continue, and saving all the overheads. I think this might work since once we enter a FFI call, the haskell thread's stack are not touch anymore, and the whole capacity is freezed. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

The problem is that, other than perhaps the stack walk, I don't think any of these costs can be avoided. Afterall, we need to save the thread state to the TSO in order to safely GC (since TSOs are GC roots). I suspect it would also be necessary to lock the capability.
Is there a way to do it lazily? By say lazily i meaning, can we just
#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): perform the FFI call and mark the capacity with some kind of SYNC flag. If a GC thread is asking for sync during FFI, then let it perform these stack saving for us. Otherwise we just happily continue, and saving all the overheads. I'll assume by "it" you mean the thread-state save. I don't believe it is possible; afterall, there may be thread state (which may contain reference to heap objects) sitting in machine registers. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by winter):
I'll assume by "it" you mean the thread-state save. I don't believe it is possible; afterall, there may be thread state (which may contain reference to heap objects) sitting in machine registers.
Yes, i mean all the thread pause stuff. I want to understand what is the minimum preparation before we can make FFI calls which doesn't get in GC's way. I believe this helps to reduce GC pause. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari):
Yes, i mean all the thread pause stuff. I want to understand what is the minimum preparation before we can make FFI calls which doesn't get in GC's way. I believe this helps to reduce GC pause.
I don't believe there is anything that we do currently that can simply be omitted without careful consideration at very least. Have you confirmed that `safe` is really as slow as you think it is? I think that we should better characterise the problem before we begin discussing possible solutions. I think it would be helpful if you had a benchmark and could use it answer a few questions, a. What overhead is incurred by a `unsafe` call? b. What overhead is incurred by a `safe` call? c. Where specifically is the additional overhead of the `safe` call coming from? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by winter): I have did some benchmarks[https://github.com/winterland1989/stdio/tree/newFD/bench/diskIO here], my conclusion is that under concurent disk I/O situations, safe FFI can cause a slow down up to 50%. The problem is if the disk is slow enough, or the file is not in page cache, then safe FFI may be a good choice. OTOH, if a blocking read doesn't take too much time, we'd better simply issue an unsafe read. I can't answer all the questions you're asking now though, i'm thinking for disk I/O a small thread pool maybe a better answer, since the I/O throughput will not increase or even downgrade when conncurrent number exceed a certain value. (for example, libuv do this). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari):
I have did some benchmarks​ here, my conclusion is that under concurent disk I/O situations, safe FFI can cause a slow down up to 50%.
I can't reproduce this result. I've checked out your benchmark and see the following, == unsafe == {{{ [1250 ben@ben-laptop diskIO(master)] $ time bin/unsafe-ffi 10m real 0m0.657s user 0m0.000s sys 0m1.035s [1251 ben@ben-laptop diskIO(master)] $ time bin/unsafe-ffi 10m real 0m1.766s user 0m0.000s sys 0m1.912s [1251 ben@ben-laptop diskIO(master)] $ time bin/unsafe-ffi 10m real 0m1.460s user 0m0.008s sys 0m1.570s }}} == safe == {{{ [1252 ben@ben-laptop diskIO(master)] $ time bin/safe-ffi 10m real 0m0.867s user 0m0.000s sys 0m0.966s [1252 ben@ben-laptop diskIO(master)] $ time bin/safe-ffi 10m ^[[A real 0m0.473s user 0m0.001s sys 0m0.636s [1252 ben@ben-laptop diskIO(master)] $ time bin/safe-ffi 10m real 0m0.794s user 0m0.009s sys 0m0.864s }}} If anything, it seems that `safe` FFI is dramatically **faster** than `unsafe`. Admittedly this is rather surprising, but I wouldn't expect a factor of two slower for this particular test. In the `safe` case `perf` confirmed that less than 3% of runtime was spent in `suspendThread` and `resumeThread` combined. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by winter): I'm not sure if only the 10m runtimes were large enough to be meaningful, the problems is that for large files, we indeed spend more time on read and write instead of FFI scheduling, and unsafe FFI's blocking behaviour is indeed bad in this case. The different size of test is really supposed to test how different strategy react to each kind of workload, but i may have been failed to do that in that test though. FYI, currently in base we always do an unsafe read/write before call threadWaitRead/Write on *nix system, and since disk file never block, this is simply same as issue an unsafe call. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari):
I'm not sure if only the 10m runtimes were large enough to be meaningful
The benchmark when run against the `1m` input took less than 50 milliseconds. I doubt that this is enough for the benchmark to get into enough of a steady-state to be representative of actual performance. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13944: Introduce synchronized FFI -------------------------------------+------------------------------------- Reporter: winter | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.6.1 => 8.8.1 Comment: Won't happen for 8.6. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13944#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC