[GHC] #8400: Migrate the RTS to use libuv (or libev, or libevent)

#8400: Migrate the RTS to use libuv (or libev, or libevent)
----------------------------------------------+----------------------------
        Reporter: schyler                     | Owner: simonmar
            Type: feature request             | Status: new
        Priority: normal                      | Milestone:
       Component: Runtime System              | Version:
        Keywords:                             | Operating System: Unknown/Multiple
    Architecture: Unknown/Multiple            | Type of failure: None/Unknown
      Difficulty: Project (more than a week)  | Test Case:
      Blocked By:                             | Blocking:
 Related Tickets: 635, 7353                   |
----------------------------------------------+----------------------------

This is mainly a reference discussion ticket.

libuv (https://github.com/joyent/libuv) is a lightweight library which allows asynchronous IO across OpenBSD, Linux, Darwin, Windows etc by utilizing the fastest implementation on each system (epoll, kqueue, IOCP, event ports). These specialized IO polling methods are '''much''' faster on their respective platforms than just using select() like the RTS currently does.

Additionally, it also provides cross platform threads, mutexes, condition vars, terminal input/output and term settings w/ cross platform ANSI escape code handling, thread pools, cross platform HRCs etc.

It's currently significantly faster than libevent and slightly faster than libev (and no doubt faster than rolling our own stuff). Because it's maintained and utilized heavily by Node.js, it is under very active development.

Rewriting a portion of the RTS to utilize libuv would have the following benefits:

 * We could ditch basically all of our platform specific code. Everything under rts/win32 and rts/posix, except the SEH stuff, could be deleted.
 * libuv is tuned for speed on each platform. This would be an optimization to all our async IO stuff.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Comment (by schyler):

Correction: kqueue/epoll support currently does exist, but it seems to be in GHC.Event rather than baked into the RTS (?). #7353 discusses using IOCP on Windows.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:1

Changes (by schyler):

 * cc: schyler (added)

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:2

Comment (by tibbe):

You might want to start by reading the three papers on the I/O manager's evolution and implementation:

 * Extending the Haskell Foreign Function Interface with Concurrency
   http://community.haskell.org/~simonmar/papers/conc-ffi.pdf
 * Scalable I/O Event Handling for GHC
   http://research.google.com/pubs/archive/36841.pdf
 * Mio: A High-Performance Multicore IO Manager for GHC
   http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf

We briefly considered using libev when we did the first I/O manager rewrite, but any library that relies on callbacks will not work well, as callbacks from C to Haskell are expensive.

I have thought about integrating the I/O manager, which now runs in a separate thread, into the scheduler; this might (or might not) give us lower latency and somewhat better requests/s performance per core.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:3
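To make tibbe's point about callbacks concrete: a callback-driven C library such as libev or libuv can only call back into Haskell through an FFI "wrapper" stub, and each invocation re-enters the Haskell runtime from C. The following is a minimal sketch of that mechanism only, not of any real binding. `ReadyCallback`, `mkReadyCallback`, `invokeReadyCallback` and `dispatchAll` are hypothetical names, and the "dynamic" import merely stands in for the C event loop that would normally make the call.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Hypothetical "fd is ready" callback type that a C event loop would invoke.
type ReadyCallback = CInt -> IO ()

-- "wrapper" import: allocates a C-callable stub around a Haskell closure.
foreign import ccall "wrapper"
  mkReadyCallback :: ReadyCallback -> IO (FunPtr ReadyCallback)

-- "dynamic" import: calls through a C function pointer. Here it stands in
-- for the C library's event loop (an assumption made for illustration).
foreign import ccall "dynamic"
  invokeReadyCallback :: FunPtr ReadyCallback -> ReadyCallback

-- Fire the callback once per fd and return the fds in the order seen.
-- Every call crosses Haskell -> C -> Haskell.
dispatchAll :: [CInt] -> IO [CInt]
dispatchAll fds = do
  seen <- newIORef []
  cb <- mkReadyCallback (\fd -> modifyIORef' seen (fd :))
  mapM_ (invokeReadyCallback cb) fds
  freeHaskellFunPtr cb -- wrapper stubs must be freed explicitly
  reverse <$> readIORef seen
```

For example, `dispatchAll [3, 4, 5]` runs the Haskell closure three times via the C stub. The sketch does not measure the cost; it only shows the round trip that a polling-style interface (collect ready fds in C, return them in one batch) would avoid.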

Comment (by schyler):

Wouldn't it be faster to move all of the I/O stuff into the RTS as C and then only end up back in haskell-land when fd stuff is finished?

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:4

Comment (by tibbe):

Replying to [comment:4 schyler]:
> Wouldn't it be faster to move all of the I/O stuff into the RTS as C and then only end up back in haskell-land when an fd event is finished?

As I mentioned, doing *something* in the RTS instead of in a separate Haskell thread is probably going to be somewhat faster than what we do today.

> In this case, libuv is probably appropriate.

It will likely not work, as libuv wants to own the thread to drive the event loop. The scheduler needs to own the thread for it to work. Perhaps we could run libuv in a separate thread like we do with the I/O manager today. I also don't know if libuv is thread safe.

We most likely want to make some assumptions/optimizations based on our special needs and I'm not sure if libuv can support that efficiently. In the end performance will probably be best if we call the underlying system calls directly.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:5

Comment (by thoughtpolice):

I think there are separate discussions happening here. One is about moving to libuv because it provides some nice abstractions we don't have to roll ourselves, and it works on Windows - and the other is about moving the scheduler and related stuff around for better I/O performance. These are somewhat separate discussions, IMO.

To move the I/O manager into the RTS (and port it from Haskell to C) would be a sizeable amount of complex work, everything else aside (it's much, much easier to get all the tricky multicore stuff right in Haskell, obviously!) Of course this shouldn't kill the suggestion, but it's worth bringing this up - it adds a sizeable amount of complexity to an already extremely complex thing (the RTS.) It also adds other tradeoffs. This isn't a thing we're new to - we often carry the complexity burden for users - but we also don't want to totally overblow our complexity budget. Frankly I think we should look into improvements beyond "totally rewrite it in C" - this should be a last resort only, unless some numbers would suggest huge order-of-magnitude improvements.

For Windows, #7353 discusses some of the issues with Joey's I/O manager, the main one being that it needs some sort of scheduler integration to help mitigate the performance loss from round-tripping through the kernel on every I/O request.

I can definitely buy that scheduler integration may help latency numbers on all platforms. There's also the somewhat-related issue that `-threaded` always adds a bit of latency anyway, and the Mio paper touches on this. Perhaps we should do a more thorough investigation and experiments first.

And also, Joey really just wanted interruptible socket timeouts. Perhaps as a first step, we should:

 A) investigate if Joey's patches in #7353 can be resurrected, perhaps as a library (his work happened before the introduction of Mio!) and
 B) see if there's performance on the table. There's probably some scope for this if we can find a talented Windows engineer to run some good numbers.

If A) can happen at least, and the improvements are stable, I may not be opposed to integrating this I/O manager into `base`, provided Simon thinks it's OK. Although bad performance may be unfortunate, it also brings feature parity to Windows in this area (which was woefully limited in other ways.)

As for the other stuff libuv provides - we already tend to have very lightweight wrappers around most system concurrency primitives so I'm not sure how much we need it, but being maintained and less code for us is a win too. (But I don't find this to be too much of a selling point to us, personally. Just a unified interface to IOCP/epoll/kqueue is enough if we were to use it.)

Not to deter anyone, I just find the discussion here touching on a few related things, and it's all quite a lot to consider and think about! I think there's a ton of scope for improvement here, but it seems quite open ended in terms of design and implementation. Performance numbers and patches welcome!

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:6

Comment (by tibbe):

I generally agree with thoughtpolice's comments. At this point I don't think we will get much further without building a prototype or two.

I think moving the I/O manager into the scheduler can also remove some complexity:

 * We now have a bunch of Haskell-side global state that could live more naturally on the capability data structure.
 * The synchronization situation would be a bit simpler inside the RTS (and there would be one less thread).

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:7

Comment (by simonmar):

Agree with everything that @thoughtpolice said.

Also, I'm not generally swayed by arguments of the form "we should use library X because it is actively maintained and does job Y that we already do", because external dependencies have their own costs - let's not forget we've had problems with all of gmp, libffi and LLVM. There are benefits to being in control of your own code, and when the functionality already exists and works (as in the case of the rts code) it's an unforced change. These things are never black and white, and we have to weigh up any benefits we might get against the costs of incurring an external dependency. I just want to point out that we shouldn't add new dependencies without thinking very carefully.

@tibbe: I suggest first identifying the problem that would be solved by moving the I/O manager into the RTS. Identify where we have extra latency, and explain why it can't be solved in Haskell. Moving the I/O manager wholesale into the RTS is a huge job, because you don't get to take advantage of nice things like atomicModifyIORef and immutable data structures, and the interaction with the scheduler is likely to be very tricky indeed. There would need to be significant payoff. Adding small hooks is a better approach, if we can find out what hooks would help - e.g. per-thread state is something we don't have a good way to do right now, and would be a generally useful thing to add.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:8
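simonmar's remark about `atomicModifyIORef` and immutable data structures can be illustrated with a small sketch. It is not the actual I/O manager code: `Callbacks`, `newTable`, `register`, `takeCallback` and `fire` are hypothetical names, and an `IntMap` from the `containers` package stands in for the real callback table. The point is that each update is one atomic swap of an immutable value, with no explicit locking.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.IntMap.Strict as IM

-- Hypothetical callback table: fd number -> action to run when the fd is ready.
type Callbacks = IM.IntMap (IO ())

newTable :: IO (IORef Callbacks)
newTable = newIORef IM.empty

-- Registering is a single atomic swap of the immutable map.
register :: IORef Callbacks -> Int -> IO () -> IO ()
register ref fd act = atomicModifyIORef' ref (\m -> (IM.insert fd act m, ()))

-- Atomically remove and return the callback, so that two threads racing
-- on the same fd cannot both obtain it.
takeCallback :: IORef Callbacks -> Int -> IO (Maybe (IO ()))
takeCallback ref fd =
  atomicModifyIORef' ref (\m -> (IM.delete fd m, IM.lookup fd m))

-- Run the callback for an fd if one is registered; report whether it ran.
fire :: IORef Callbacks -> Int -> IO Bool
fire ref fd = do
  mcb <- takeCallback ref fd
  case mcb of
    Nothing -> pure False
    Just act -> act >> pure True
```

A C port would need its own concurrent table with hand-written synchronization to get the same guarantee, which is part of the cost being weighed in this comment.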

Comment (by simonpj):

For what it's worth, my gut feel is that we should be moving functionality out of the monolithic RTS written in C, and into Haskell libraries. Moving the I/O manager into the RTS would be a move in the opposite direction. It could conceivably be the right thing to do, but my nose tells me otherwise.

Simon

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:9

Changes (by AndreasVoellmy):

 * cc: andreas.voellmy@… (added)

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:10

Changes (by simonmar):

 * status: new => closed
 * resolution: => wontfix

Comment:

Closing because it's not clear that there's any benefit to this and the discussion wandered off in a different direction.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:11

Changes (by winter):

 * owner: simonmar => (none)
 * status: closed => new
 * resolution: wontfix =>

Comment:

I have recently been looking into combining libuv with GHC's lightweight threads; here's my initial design: https://github.com/winterland1989/stdio/issues/6. I'll report in this thread if I manage to get some benchmark numbers.

If it works, the benefit is quite obvious: we don't have to deal with all the Windows/encoding hacks and quirks in base anymore.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:12

Comment (by Phyx-):

We don't have to deal with the hacks either if we have a clean design in base that properly uses UTF16 internally as it should for Windows.

My feeling on this is that in order to get things like IOCP working correctly using libuv you still need to change a significant part of base. You still need to

 - define a new IODevice etc.
 - change the internal encoding of GHC to support both UTF16 and UTF32 (there is some code for this from when simonmar rewrote the I/O manager but I'm not sure of the state it's in).
 - somehow get the scheduler of libuv to interact nicely with that of the rts. You for instance still want to have async I/O using the non-threaded rts.
 - change functions such as openFile to create handles using `FILE_FLAG_OVERLAPPED`.

Especially the second one is not trivial. Getting IOCP support and/or RIO functionality in is actually fairly trivial, but if you don't change the internal encodings of base, you'll just take a hit by having it convert from UTF16 to UTF32 and back to UTF16.

I am working on an implementation the other way, moving the I/O manager into Haskell and reworking most of base to use the right internal encoding for each platform. It's based on Joey's patches; https://github.com/Mistuke/ghc-win-io-system was the state before merging into base.

So we'll have a way to compare the two approaches then to see which is best.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:13

Comment (by winter):

I'm aware of your effort actually; it's here: [https://github.com/Mistuke/ghc-win-io-system/], isn't it? And I'm sure you are a Windows expert ;)

The old `IODevice` typeclass is not satisfactory and I do indeed want to define a new one. The encoding problems in base come from using functions from the C runtime on Windows instead of using the Windows API directly, which is the culprit IMO, because most of the Windows API has a wide-char version which can use UTF-16 directly, doesn't it?

But libuv really has solved most of those problems VERY WELL. For example, it maps the Windows console API to accept ANSI escape codes, so we get a colorful ANSI terminal for free, and it does the heavy work of shimming the UTF-16 encoded `readConsoleInputW/writeConsoleOutputW` to accept UTF-8 buffers, which removes much of the headache.

The real remaining problem is how to make libuv work with GHC's scheduler, for which I think I have a plan. I describe it in the GitHub link above, please have a look.

Finally, if we can manage to leverage libuv to do I/O in Haskell, we can save lots of repeated work. libuv is really an all-in-one solution for I/O problems: tcp, pipes, processes, file watchers, etc. You name it!

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:14

Comment (by winter):

My libuv branch finally runs! For now I have only tested it with concurrent tcp clients; judging by running time, eventlog and the `+RTS -s` report, I'm pretty sure its performance is on par with the old I/O manager. Now I'm trying to put together a tcp server benchmark, let's see how fast it is going to be.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:15

Comment (by winter):

I got my libuv based tcp server running! It performs almost identically on my ThinkPad W540; here are some quick benchmark numbers:

{{{
~/Code/stdio/bench/libuv(libuv*) » wrk -c1000 -d10s http://127.0.0.1:8888          winter@winter-thinkpad-w540
Running 10s test @ http://127.0.0.1:8888
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.58ms    4.35ms 233.38ms   99.16%
    Req/Sec   109.35k    16.85k  165.69k    65.66%
  2158626 requests in 10.02s, 10.26GB read
Requests/sec: 215444.90
Transfer/sec:      1.02GB
}}}

In contrast, here is the original I/O manager in base, aka mio:

{{{
~/Code/stdio/bench/libuv(libuv*) » wrk -c1000 -d10s http://127.0.0.1:8888          winter@winter-thinkpad-w540
Running 10s test @ http://127.0.0.1:8888
  2 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.03ms   11.50ms 436.51ms   99.53%
    Req/Sec   106.39k     6.27k  124.92k    77.50%
  2117274 requests in 10.07s, 10.07GB read
Requests/sec: 210264.57
Transfer/sec:      1.00GB
}}}

The benchmark code is here: https://github.com/winterland1989/stdio/tree/libuv/bench/libuv

What is interesting here is the `+RTS -s` figures. First is the mio one:

{{{
  30,227,784,440 bytes allocated in the heap
   4,608,251,832 bytes copied during GC
       4,058,040 bytes maximum residency (1442 sample(s))
       4,537,568 bytes maximum slop
              17 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     34042 colls, 34042 par    9.782s   2.468s     0.0001s    0.0074s
  Gen  1      1442 colls,  1441 par    4.523s   1.140s     0.0008s    0.0043s

  Parallel GC work balance: 75.40% (serial 0%, perfect 100%)

  TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time   24.940s  ( 10.609s elapsed)
  GC      time   14.305s  (  3.608s elapsed)
  EXIT    time    0.002s  (  0.001s elapsed)
  Total   time   39.248s  ( 14.218s elapsed)

  Alloc rate    1,212,033,615 bytes per MUT second

  Productivity  63.5% of total user, 74.6% of total elapsed

gc_alloc_block_sync: 357989
whitehole_spin: 1
gen[0].sync: 4817566
gen[1].sync: 326374
}}}

Here are my libuv code's figures after `wrk`'s load:

{{{
   6,666,812,808 bytes allocated in the heap
   2,177,870,680 bytes copied during GC
       3,574,680 bytes maximum residency (1370 sample(s))
       5,571,840 bytes maximum slop
              16 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6034 colls,  6034 par    4.802s   1.220s     0.0002s    0.0063s
  Gen  1      1370 colls,  1369 par    3.994s   1.010s     0.0007s    0.0025s

  Parallel GC work balance: 76.00% (serial 0%, perfect 100%)

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time   23.782s  ( 13.182s elapsed)
  GC      time    8.796s  (  2.230s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time   32.581s  ( 15.413s elapsed)

  Alloc rate    280,326,860 bytes per MUT second

  Productivity  73.0% of total user, 85.5% of total elapsed

gc_alloc_block_sync: 253553
whitehole_spin: 0
gen[0].sync: 3071193
gen[1].sync: 186169
}}}

It seems that my new libuv based I/O manager reduces LOTS of allocations (maybe they're just moved to the C side).

Overall, I think my evaluation of libuv is successful, and I'd like to discuss the possibility of integrating it with base.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:16

Comment (by alexbiehl):

Winter, this is interesting. Since you have set up the benchmark already, could you provide a heap profile for GHC's mio? It would be interesting to see where the massive allocation comes from.

I think a simple +RTS -hT -RTS should suffice (no need to compile with -prof, thanks Herbert for the tip).

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:17

Comment (by bgamari):

I don't know what Jaffacake's opinion on this is but I'm not terribly keen on picking up a `libuv` dependency. We currently have very few native dependencies and I think that's a good thing; native dependencies are bound to either increase the complexity of the build system (if we statically link) or dramatically complicate distribution (if we dynamically link).

If we could make the `libuv` backend optional and keep the existing codepath then maybe I can see us merging this, but otherwise it seems like we should simply try to optimize what we have rather than throw the whole thing away. I strongly suspect there is some very low-hanging fruit in mio.

I'm willing to be convinced otherwise though.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:18

Changes (by winter):

 * Attachment "libuv.hp" added.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400

Changes (by winter):

 * Attachment "mio.hp" added.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400

Comment (by winter):

I have attached the heap profiles. The previous `+RTS -s` output seems to be a mistake (I don't have a clue why it reported such a big difference); mio does allocate more memory, but not that much. The benchmark parameters are:

{{{wrk -c1000 -d10s http://127.0.0.1:8888}}}

The request rates are:

{{{
libuv: Requests/sec: 264542.07
mio: Requests/sec: 236880.75
}}}

> If we could make the libuv backend optional and keep the existing codepath then maybe I can see us merge this, but otherwise it seems like we should simply try to optimize what we have rather than throw the whole thing away.

I personally feel no rush to change the current I/O manager in base: it works well, and it is needed by GHCi. But since this ticket's purpose is to discuss the possibility of integrating libuv, I would like to hear more.

> native dependencies are bound to either increase the complexity of the build system (if we statically link) or dramatically complicate distribution (if we dynamically link).

I'm interested in distributing my library with a statically linked libuv; is that possible with cabal? libuv is quite easy to build on unix, but I don't know the situation on Windows; it seems to require Visual Studio.

> Further, it looks like libuv doesn't even support all of the platforms which GHC supports (OpenBSD, for instance).

This is definitely not true. Node.js officially supports OpenBSD, so I don't see a reason why libuv would not. I guess OpenBSD is just not on their CI.

> I strongly suspect there is some very low-hanging fruit in mio.

Can't say for the libuv branch, but I think I have found some other low-hanging fruit, e.g. I'm preparing the patch for `primitive` since you asked. It will include the `PrimArray a` and `class Arr (marr :: * -> * -> *) (arr :: * -> * ) a | arr -> marr, marr -> arr`.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:19

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Comment (by winter):

Here's my analysis of the allocation and performance difference:

Mio's central data structure is a fixed-size (32) striped int-table (using grow-only buckets) that maps each fd to its callback function. The buckets are usually quite full, since fd numbers follow the lowest-unused rule.

My libuv binding's central data structure is a variable-size (the number of capabilities) striped int-table (also using grow-only buckets) that maps a range-limited *slot* to an `MVar`. It actually works like a per-capability stable pointer table: we keep a free list of slots waiting to be assigned, and during GC the int-table is evacuated.

So mio has to allocate an extra callback closure, even though it is usually just a `putMVar`, but libuv doesn't: it simply uses slots from the event loop to unblock those `MVar`s. Theoretically mio is more flexible, since you can do pretty much anything inside a callback, but in practice this is not a problem for libuv: after we unblock the blocked thread, it can do anything it wants to, too.

I plan to optimize the `MVar` table further using `primitive`'s `UnliftedArray` trick: it uses `ArrayArray#` to store `MVar#` directly in packed memory. This will cut the GC time spent on the table (which is already very low) in half. But I can't see how this optimization would work out for mio.

Another direction is to optimize our stable pointer implementation and use the new `try_put_mvar` in 8.2. But I'm afraid it would not be as fast as my current version, since the current version is optimized for libuv's calling convention: provide buffers and handles, then wait for callbacks.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:20
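
To make the slot idea in the comment above concrete, here is a minimal sketch in C of a slot table with a free list. It is not the code from the libuv binding: the names (`SlotTable`, `slot_alloc`, `slot_complete`) and the fixed capacity are hypothetical, and a plain `void *` stands in for the `MVar` that the blocked thread waits on. It only illustrates why completing an operation needs no per-request callback closure: the event loop reports a slot number, and the table maps it straight back to the waiter.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_CAPACITY 8      /* hypothetical fixed size; a real table would grow */
#define SLOT_NONE     (-1)

typedef struct {
    void *waiter[SLOT_CAPACITY];     /* stand-in for the blocked thread's MVar */
    int   next_free[SLOT_CAPACITY];  /* intrusive free list: index of next free slot */
    int   free_head;                 /* first free slot, or SLOT_NONE when full */
} SlotTable;

/* Put every slot on the free list, in index order. */
void slot_table_init(SlotTable *t) {
    for (int i = 0; i < SLOT_CAPACITY; i++) {
        t->waiter[i] = NULL;
        t->next_free[i] = (i + 1 < SLOT_CAPACITY) ? i + 1 : SLOT_NONE;
    }
    t->free_head = 0;
}

/* Take a slot off the free list and park a waiter in it.
 * Returns SLOT_NONE when no slot is available. */
int slot_alloc(SlotTable *t, void *waiter) {
    int s = t->free_head;
    if (s == SLOT_NONE) return SLOT_NONE;
    t->free_head = t->next_free[s];
    t->waiter[s] = waiter;
    return s;
}

/* The event loop reported slot s as complete: hand back the waiter
 * (the caller would unblock it) and recycle the slot.
 * Completing the same slot twice would corrupt the free list. */
void *slot_complete(SlotTable *t, int s) {
    assert(s >= 0 && s < SLOT_CAPACITY);
    void *w = t->waiter[s];
    t->waiter[s] = NULL;
    t->next_free[s] = t->free_head;
    t->free_head = s;
    return w;
}
```

In this sketch a freed slot is reused first, so slot numbers stay in a small range, which is the property the comment relies on when it calls the slots "range limited".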

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Comment (by winter):

Another thing to add: recent experiments showed that it's important to strike a balance between unsafe non-blocking polls, e.g. `epoll_wait` with timeout zero, and safe blocking polls, e.g. `epoll_wait` with timeout -1.

Currently mio does two non-blocking polls; if there are still no events, mio sleeps by doing a blocking poll. In stdio I use an idle counter, increased by one whenever the last poll returned no events. When the counter reaches a limit (currently 50), stdio sleeps by doing a blocking poll (using libuv's `UV_RUN_ONCE` mode). While this number doesn't affect the normal-load situation (since you never enter a blocking poll in that case), it affects some light-load situations, namely some benchmark code. The idle counter solution and 50 as the limit are borrowed from golang, which I guess helped it win lots of benchmarks. In that case, mio just enters safe FFI too often.

The mechanism is actually more complicated in stdio, since libuv never guarantees thread safety except for the special `uv_async_t` handle (which behaves like a control FD). We have to wake up the safe blocking call every time we add new events. But this also gives stdio a better ratio between unsafe and safe polls: every time we wake from a blocking poll, we reset the idle counter, so the next 50 polls will be unsafe ones.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:21
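
The idle-counter policy described above can be sketched as a small state machine in C. This is only an illustration of the decision logic, not stdio's actual code: the names `IdleCounter`, `next_poll_mode` and `record_poll` are hypothetical, and the reset of the counter whenever a poll finds events is an assumption (the comment only states that the counter is reset after waking from a blocking poll).

```c
#include <assert.h>

#define IDLE_LIMIT 50   /* the limit quoted in the comment above */

typedef enum { POLL_NONBLOCKING, POLL_BLOCKING } PollMode;

typedef struct {
    int idle;   /* consecutive non-blocking polls that returned no events */
} IdleCounter;

/* Decide how the next poll should be performed: cheap unsafe
 * non-blocking polls until the limit is reached, then one safe
 * blocking poll. */
PollMode next_poll_mode(const IdleCounter *c) {
    return (c->idle >= IDLE_LIMIT) ? POLL_BLOCKING : POLL_NONBLOCKING;
}

/* Record the outcome of the poll that was just performed. */
void record_poll(IdleCounter *c, PollMode mode, int n_events) {
    assert(n_events >= 0);
    if (mode == POLL_BLOCKING || n_events > 0) {
        /* Woke from a blocking poll, or (assumption) found work:
         * the next IDLE_LIMIT polls are non-blocking again. */
        c->idle = 0;
    } else {
        c->idle++;   /* one more empty non-blocking poll */
    }
}
```

A driver loop would call `next_poll_mode`, perform the corresponding poll (for example `epoll_wait` with timeout 0 or -1), and then call `record_poll` with the number of events returned.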

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Comment (by simonmar):

I'm not familiar with `libuv`, but just to list the concerns I would have about replacing the IO manager:

 * Performance: the current IO manager was pretty heavily tuned and optimised when it was developed; the benchmarks and results are described in the Mio paper. I'd like to see comparative results showing that libuv is at least as fast for the same benchmarks before we consider switching. Some performance issues can be subtle (e.g. unsafe FFI calls that take too long, or mutable data structures that affect generational GC performance), so even if the benchmarks are good we'd need to examine the code quite carefully.

 * Dependencies (as @bgamari pointed out) can be problematic. How would we handle the dependency? Import it into the tree (as with libffi), as a submodule (as with packages), or require it to be installed and test for it in configure (as with LLVM / gmp, but IIRC we're thinking of changing this for LLVM)?

 * Correctness: there is a history of subtle bugs in the IO manager and its interface with the rest of the IO library. I don't know how best to avoid introducing new problems other than the regression test suite, but it's worth mentioning that this is an area where we need to be especially careful.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:22

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Comment (by winter):

I share the concerns above, so my experiment is still ongoing. So far the results are: the libuv I/O system's performance is slightly better (~10%), it allocates slightly less (~10% to 20%), and the GC pause is much lower (~30%). We tested it under different loads, and we also ran a test on a 36-core, 512G RAM server, following the MIO paper.

We (I and a student at BHU who used to work as an intern under my supervision) are planning to write a paper about this new I/O system design, collecting all the results above and doing some analysis. Since this year's HIW submission is already closed, we're looking forward to submitting this paper to next year's HIW, so we still have plenty of time to polish our work. We would much appreciate it if someone could recommend some places to submit a paper like this ; )

If we somehow choose to use libuv in GHC, the libuv dependency should be managed like libffi IMO: it's easy to build, and we should ship it with GHC to help users get started (just like node.js).

As for correctness, I think switching to libuv would actually help reduce bugs, since we would not need to interface with different OS event backends. A regression test suite is definitely helpful; I'll try to make one.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:23

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Comment (by dobenour):

I have an idea for how to implement this:

 * Each `Capability` contains a `uv_loop_t` as well as a pointer to a list of threads that are blocked waiting on C callbacks to fire. Since initializing a `uv_loop_t` can fail due to OS resource exhaustion, such as too many open files, the RTS checks that initializing succeeded before a capability can run Haskell code.

 * Each Capability owns a pool of C structures

 {{{#!C
 typedef struct StgCCallbackInfo {
     StgTSO *BlockedThread; /* The thread that is blocked waiting for the callback */
     StgWord refcount;      /* Reference count */
     void *user;            /* Arbitrary C data */
 } StgCCallbackInfo;
 }}}

 This list is a GC root. The members of this pool are in pinned memory, so they can safely be referenced by C code.

 * The RTS exports C functions

 {{{#!C
 /**
  * Allocates a C callback info struct, or NULL if we run out of memory.
  */
 StgCCallbackInfo *rts_newCCallbackInfo(Capability *c, StgTSO *t, void *user);

 /**
  * Wakes up the thread pointed to by the given `StgCCallbackInfo`.
  */
 void rts_wakeupThread(struct StgCCallbackInfo *ptr);

 /**
  * Increments the reference count on the `StgCCallbackInfo`.
  */
 void rts_callback_incref(struct StgCCallbackInfo *ptr);

 /**
  * Decrements the reference count.
  */
 void rts_callback_decref(struct StgCCallbackInfo *ptr);
 }}}

 which can be used to manipulate these structures.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:24
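
The reference-counting part of the proposal above could look roughly like the following C sketch. It is deliberately not RTS code: `CallbackInfo`, `cb_new`, `cb_incref`, `cb_decref` and `cb_live` are hypothetical stand-ins, an opaque `void *` replaces `StgTSO *`, and `malloc`/`free` replace the per-capability pinned pool. It also assumes that only one capability touches a given struct, so the count is not atomic.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the proposed StgCCallbackInfo. */
typedef struct CallbackInfo {
    void  *thread;     /* blocked thread (opaque here, StgTSO* in the proposal) */
    size_t refcount;   /* number of outstanding references */
    void  *user;       /* arbitrary C data */
} CallbackInfo;

static int cb_live = 0;   /* live structs, kept only to make the sketch observable */

/* Allocate a struct holding one reference, or return NULL if out of memory. */
CallbackInfo *cb_new(void *thread, void *user) {
    CallbackInfo *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->thread = thread;
    p->refcount = 1;
    p->user = user;
    cb_live++;
    return p;
}

/* Take an additional reference, e.g. before handing the struct to a C callback. */
void cb_incref(CallbackInfo *p) {
    p->refcount++;
}

/* Drop a reference. Returns 1 if this call released the struct, 0 otherwise. */
int cb_decref(CallbackInfo *p) {
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        cb_live--;
        return 1;
    }
    return 0;
}
```

The point of the count is that the Haskell side and a pending C callback can each hold a reference, and the struct is only released once both have dropped theirs.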

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Changes (by michalt):

 * cc: michalt (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:25

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Changes (by maoe):

 * cc: maoe (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:26

#8400: Migrate the RTS to use libuv (or libev, or libevent)
Changes (by lelf):

 * cc: lelf (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8400#comment:27