[GHC] #14193: Add RTS flag to disable 1TB address space allocation

#14193: Add RTS flag to disable 1TB address space allocation
-------------------------------------+-------------------------------------
Reporter: nh2 | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.2
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets: #9706, #14192
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
GHC 8.0 changed the default behaviour on Linux to allocate 1 TB of virtual memory for Haskell programs (#9706).

While this was shown to be good for performance (a small percentage gain), it has caused me a couple of problems, especially:

 * when trying to disable overcommit in Linux to get more reliable memory behaviour and avoid swapping / the OOM killer (all Haskell programs crash at startup)
 * when debugging (e.g. #14192)

Currently you can turn that feature off only via a compile-time switch, e.g. `./configure --disable-large-address-space`.

I'd like to request making it possible to turn this behaviour off at run-time with an RTS flag, so that when the flag is given, the RTS uses the old block allocator.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
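
To check whether a machine is in the "overcommit disabled" configuration mentioned above, one can read the kernel's overcommit mode. The following is a minimal sketch, assuming Linux and assuming that "disabling overcommit" means `vm.overcommit_memory = 2` (strict accounting); the function names are hypothetical and not part of GHC or its RTS.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the contents of /proc/sys/vm/overcommit_memory.
 * Returns 0 (heuristic overcommit), 1 (always overcommit),
 * 2 (strict accounting), or -1 if the text is none of those. */
static int parse_overcommit_mode(const char *text)
{
    assert(text != NULL);
    char *end = NULL;
    long v = strtol(text, &end, 10);
    if (end == text || v < 0 || v > 2)
        return -1;
    return (int)v;
}

/* Read the current mode; -1 if the file is missing or unparsable
 * (for example on a non-Linux system). */
static int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;
    char buf[16];
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? parse_overcommit_mode(buf) : -1;
}
```

A result of 2 from `read_overcommit_mode()` would indicate the strict mode in which the startup crashes described in this report are to be expected.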

Comment (by simonmar):

This seems to suggest that we should use PROT_NONE instead of MAP_NORESERVE to work around issues with overcommit: https://lwn.net/Articles/627557/

As I said over in #14192, adding a flag to disable the large address space would have a performance cost, because we'd have to check the flag in the inner loop of the GC, so I'd rather avoid that if at all possible.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:1

Comment (by simonmar):

Here is a patch that solves the same problem in WebKit. It looks like using PROT_NONE to reserve and mprotect() to commit works around the overcommit issue: http://trac.webkit.org/changeset/137994/webkit

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:2
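
For readers unfamiliar with the pattern, here is a minimal sketch of reserving address space with PROT_NONE and committing pages with mprotect(), assuming Linux. It is an illustration only: the function names are hypothetical, and this is neither the WebKit patch nor GHC's RTS code. The decommit step uses `madvise(MADV_DONTNEED)` as one common way to hand pages back to the kernel.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `size` bytes of address space. The pages are inaccessible
 * (PROT_NONE), so touching them faults until they are committed.
 * Returns NULL on failure. */
static void *reserve_region(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Commit: make `len` bytes at the page-aligned address `addr`
 * readable and writable. Returns 0 on success, -1 on failure. */
static int commit_pages(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Decommit: drop the page contents and make the range
 * inaccessible again. Returns 0 on success, -1 on failure. */
static int decommit_pages(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;
    return mprotect(addr, len, PROT_NONE);
}
```

The idea is that only the committed ranges are writable at any time, so the size of the reservation itself should matter less to the kernel's accounting than the amount actually committed.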

Comment (by bgamari):

Is this really the same problem? We already map with `PROT_NONE` when reserving. The only real difference between WebKit after that patch and us is the use of `madvise(MADV_DONTNEED)` after reservation; I wonder if this affects the core dump logic.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:3

Changes (by bgamari):

 * status: new => patch
 * differential: => Phab:D3929

Comment:

Here is a patch trying this.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:4

#14193: Add RTS flag to disable 1TB address space allocation

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:5

Comment (by nh2):

Replying to [comment:1 simonmar]:
> adding a flag to disable the large address space would have a performance cost, because we'd have to check the flag in the inner loop of the GC

@simonmar do you recall if the impact of this was already measured when the new allocator was introduced, or would there be value in me giving it a try?

I have some wishful thinking that if branch prediction and pipelining are on our side, we might not be able to measure the additional check -- and if it were so I'd welcome the ability to switch it without recompilation (of ghc, or even better, the program).

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:6
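
To make the cost under discussion concrete, here is a rough sketch of what a compile-time versus a run-time-selectable "is this pointer in the heap?" check might look like. Everything here is hypothetical: the names, the globals, and the stubbed fallback are stand-ins for illustration and are not taken from GHC's RTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTS state. */
static uintptr_t heap_begin, heap_end;        /* bounds of the reserved region */
static bool use_large_address_space = true;   /* the proposed run-time flag */

static void set_heap_bounds(uintptr_t begin, uintptr_t end)
{
    assert(begin <= end);
    heap_begin = begin;
    heap_end = end;
}

/* Stub for the slower lookup a non-contiguous heap would need. */
static bool lookup_block_table(uintptr_t p)
{
    (void)p;
    return false;
}

/* Compile-time choice: the hot path is a plain range check. */
static inline bool heap_alloced_static(uintptr_t p)
{
    return p >= heap_begin && p < heap_end;
}

/* Run-time choice: every call also loads and branches on the flag. */
static inline bool heap_alloced_dynamic(uintptr_t p)
{
    if (use_large_address_space)
        return p >= heap_begin && p < heap_end;
    return lookup_block_table(p);
}
```

The extra branch in `heap_alloced_dynamic` is well predicted when the flag never changes, which is the hope expressed above; whether the remaining cost is measurable is exactly the open question.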

Comment (by simonmar):

We didn't measure the effect of making it runtime-selectable. You're welcome to try measuring it, but my concern is that it's likely to be around 1% overall, and that's at the level where we care, but it's difficult to measure accurately given the benchmarks and tools we have.

nofib almost certainly won't be able to measure it, because most of the programs in there don't do much GC. You can try nofib/parallel, but those tend to be a bit noisy.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:7

Comment (by simonmar):

Or there's nofib/gc of course.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:8

Comment (by Ben Gamari

Changes (by bgamari):

 * status: patch => closed
 * resolution: => wontfix

Comment:

I believe that comment:9 should remove the need for such a flag. Feel free to reopen if you disagree.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:10

Changes (by gidyn):

 * status: closed => new
 * resolution: wontfix =>

Comment:

Haskell programs are killed on startup by the OOM killer on OpenShift.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:11

Changes (by gidyn):

 * cc: gidyn (added)

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:12

Comment (by bgamari):

gidyn, can you provide more details? Which GHC version are you using? How can I reproduce this? Are you sure that the program you are running isn't just running out of memory?

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:13

Comment (by gidyn):

GHC 8.6.3 with https://bitbucket.org/accursoft/haskell-cloud/src. `cabal update` was killed by the OOM reaper. Running in a local image didn't commit an excessive amount of memory, but reserved 1 TB of virtual memory.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:14

Comment (by nh2):

Unless we misunderstand something, the OOM killer should not act upon VIRT at all, and only kick in if resident memory usage is exhausted.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:15

Comment (by bgamari):

I'm really not sure what to do about this. If OpenShift is really killing processes based on their virtual memory reservation size then this really seems like a bug in OpenShift.

Can you describe how in particular the process is being killed? Is it really the kernel OOM killer? Is it a cgroup memory limit? I have found a possibly relevant [[https://developers.redhat.com/blog/2017/03/14/java-inside-docker/|article]] about Docker and the JVM which suggests it may be the latter.

The JVM apparently has the (experimental) ability to query the memory limits of its containing cgroup. We could do the same if this is really the issue.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:16
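
As an illustration of what querying the containing cgroup's memory limit could look like, here is a minimal sketch, assuming Linux with cgroups mounted at the usual `/sys/fs/cgroup` location. The two file paths are the conventional cgroup v2 and v1 locations and may differ on a given system; the function names are hypothetical, and nothing here is existing RTS or JVM code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the contents of a cgroup memory limit file.
 * Returns 1 and stores the limit in *out for a numeric value,
 * 0 if the file says "max" (no limit), -1 if the text is malformed. */
static int parse_cgroup_limit(const char *text, unsigned long long *out)
{
    assert(text != NULL && out != NULL);
    if (strncmp(text, "max", 3) == 0)
        return 0;
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(text, &end, 10);
    if (end == text || errno != 0)
        return -1;
    *out = v;
    return 1;
}

/* Try the cgroup v2 file first, then the v1 file.
 * Same return convention as parse_cgroup_limit; -1 also means
 * that neither file could be read. */
static int read_cgroup_memory_limit(unsigned long long *out)
{
    static const char *paths[] = {
        "/sys/fs/cgroup/memory.max",
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",
    };
    for (size_t i = 0; i < sizeof paths / sizeof paths[0]; i++) {
        FILE *f = fopen(paths[i], "r");
        if (f == NULL)
            continue;
        char buf[64];
        char *line = fgets(buf, sizeof buf, f);
        fclose(f);
        if (line != NULL)
            return parse_cgroup_limit(buf, out);
    }
    return -1;
}
```

Note that cgroup v1 reports "no limit" as a very large number rather than as `max`, so a caller would likely want to compare the result against physical memory before acting on it.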

Comment (by nh2):

@gidyn Indeed, we will need detailed information on how exactly OpenShift does its killing and what metrics it uses (ideally even a link to the code that sets it up) in order to figure out whether it's even related to the 1 TB VIRT allocation.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:17

Changes (by gidyn):

 * status: new => closed
 * resolution: => wontfix

Comment:

Tried with plain Docker, and it worked. Tried with plain Kubernetes, and it worked. Further attempts with OpenShift failed, then one worked, then I got `01-index.tar: hPutBuf: resource exhausted (Cannot allocate memory)`.

So it does indeed appear that the memory issues were coming from `cabal update`, and not the 1 TB allocation. Apologies for the distraction.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14193#comment:18