[GHC] #7602: Threaded RTS performing badly on recent OS X (10.8?)

#7602: Threaded RTS performing badly on recent OS X (10.8?)
---------------------------------+------------------------------------------
Reporter: simonmar | Owner:
Type: bug | Status: new
Priority: normal | Milestone: _|_
Component: Runtime System | Version: 7.6.1
Keywords: | Os: Unknown/Multiple
Architecture: Unknown/Multiple | Failure: None/Unknown
Difficulty: Unknown | Testcase:
Blockedby: | Blocking:
Related: |
---------------------------------+------------------------------------------
This ticket is to remind us about the following problem: OS X is now using
llvm-gcc, and as a result GHC's garbage collector with -threaded is much
slower than it should be (approx 30% slower overall runtime). Some results
here: [http://www.haskell.org/pipermail/cvs-ghc/2011-July/063552.html]

This is because the GC code relies on having fast access to thread-local
state. It uses one of two methods: either a register variable (gcc only) or
`__thread` variables (which aren't supported on OS X). To make things work
on OS X, we use calls to `pthread_getspecific` instead (see #5634), which is
quite slow, even though it compiles to inline assembly.

I don't recall which OS X / XCode versions are affected, maybe a Mac expert
could fill in the details.

We have tried other fixes, such as passing around the thread-local state as
extra arguments, but performance wasn't good. Ideally Apple will implement
TLS in OS X at some point and we can start to use it.

A workaround is to install a real gcc (using homebrew?) and use that to
compile GHC. Whoever builds the GHC distributions for OS X should probably
do it that way, so everyone benefits.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
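For readers unfamiliar with the access methods mentioned in the report, here is a minimal sketch in C of two of them side by side: a `__thread` variable and a `pthread_getspecific` lookup. The third method, a gcc global register variable, is left out because it is tied to a specific compiler and register. All names here (`gc_thread`, `tls_get`, `key_get`, and so on) are illustrative assumptions, not the RTS's actual declarations.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread GC state. */
typedef struct gc_thread { int thread_index; } gc_thread;

/* Method A: compiler-supported TLS. Where the platform supports it well,
 * an access is typically a single segment-relative load. */
static __thread gc_thread *gct_tls;

void tls_set(gc_thread *t) { gct_tls = t; }
gc_thread *tls_get(void) { return gct_tls; }

/* Method B: a POSIX thread-specific key. Every access goes through a
 * library call, which is the slow path the ticket describes. */
static pthread_key_t gct_key;
static pthread_once_t gct_key_once = PTHREAD_ONCE_INIT;

static void gct_key_init(void) { (void)pthread_key_create(&gct_key, NULL); }

void key_set(gc_thread *t) {
    pthread_once(&gct_key_once, gct_key_init);
    (void)pthread_setspecific(gct_key, t);
}

gc_thread *key_get(void) {
    pthread_once(&gct_key_once, gct_key_init);
    return pthread_getspecific(gct_key);
}
```

Both accessors return the same pointer for the calling thread; the difference the ticket cares about is only how many instructions each read costs inside the GC's inner loops.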

Changes (by tibbe):
* cc: johan.tibell@… (added)
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:1

Comment(by thoughtpolice):
The situation is 'Okay' now because Clang/LLVM 3.2 and OS X (as of 10.7
**possibly**, but 10.8 for certain) support TLS. As far as I know, we don't
use register variables (which may never appear in LLVM) on x86_64 and
instead opt for TLS, so it should be much easier to compile the RTS and
compiler using only Clang/LLVM now.

We already have code in place in DriverPipeline to run Clang for the
assembler and whatnot, so perhaps it wouldn't be that much change to get a
Fully-LLVM built GHC on OS X. I might have time to try this since I've been
sorting out llvm 3.2 bugs anyway.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:2

Comment(by simonmar):
Does the support for TLS compile down to calls to
`pthread_getspecific`/`pthread_setspecific`? If so, that won't help much -
we could remove some hacks from the code, but it will still perform badly.
Someone with a Mac will need to do some measurements to be sure.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:3

Comment(by thoughtpolice):
I don't think so, or at least it doesn't in my trivial case:
{{{
#include

Comment(by thoughtpolice):
Ah, I misspoke in earnest without looking more deeply. It does look like
the ```callq *(%rdi)``` is the indirection we're still hitting in some
sense, although I'm not sure why it's different from Apple's implementation
in libc: with -O2, this is minimized to a simple load and call, but it's
still a significant overhead for the access, at a glance.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:5

Comment(by simonmar):
Yeah, it looks like they've done some trickery so that instead of calling
`pthread_getspecific` they make an indirect call to the contents of `foo`
to get its per-thread location. This won't be fast enough for us, I'm sure.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:6

Changes (by chak):
* cc: chak@… (added)
Comment:
Let me just add that llvm-gcc will disappear from Xcode (command line
tools) soon, i.e., we will need to use clang.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:7

Comment(by simonmar):
I ''believe'' there is no problem with clang, we incorporated patches to
make it work a while back. But please report any problems if you find them.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:8

Comment(by thoughtpolice):
Just as a note, I don't think GCC will help anymore; using GCC 4.7.2 on OSX
10.8, compiling my same 'foo' example earlier with ```gcc -O3```, I get:
{{{
(lldb) disassemble -m -n main
a.out`main
a.out[0x100000f00]: cmpl $1, %edi
a.out[0x100000f03]: pushq %rbx
a.out[0x100000f04]: jle 0x100000f3f ; main + 63
a.out[0x100000f06]: movq 8(%rsi), %rdi
a.out[0x100000f0a]: callq 0x100000f54 ; symbol stub for: atoi
a.out[0x100000f0f]: leaq 330(%rip), %rdi ; __emutls_v.foo
a.out[0x100000f16]: movl %eax, %ebx
a.out[0x100000f18]: callq 0x100000f66 ; symbol stub for: __emutls_get_address
a.out[0x100000f1d]: movl %ebx, (%rax)
a.out[0x100000f1f]: leaq 314(%rip), %rdi ; __emutls_v.foo
a.out[0x100000f26]: callq 0x100000f66 ; symbol stub for: __emutls_get_address
a.out[0x100000f2b]: leaq 114(%rip), %rdi ; "foo = %d\n"
a.out[0x100000f32]: movl (%rax), %esi
a.out[0x100000f34]: xorl %eax, %eax
a.out[0x100000f36]: callq 0x100000f60 ; symbol stub for: printf
a.out[0x100000f3b]: xorl %eax, %eax
a.out[0x100000f3d]: popq %rbx
a.out[0x100000f3e]: ret
a.out[0x100000f3f]: leaq 282(%rip), %rdi ; __emutls_v.foo
a.out[0x100000f46]: callq 0x100000f66 ; symbol stub for: __emutls_get_address
a.out[0x100000f4b]: movl $10, (%rax)
a.out[0x100000f51]: jmp 0x100000f1f ; main + 31
}}}
So I think for now we're going to have to just bite the bullet on this one,
and make sure the build is solid with Clang on modern OS X anyway. Maybe we
can do something evil here later to recover the loss :/

David Peixotto's original patches made the RTS build with clang at first.
I'll run a test against HEAD using Clang and see what I find.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:9
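The source of the 'foo' example is truncated in this archive. As a reading aid, the following is a hedged guess at an equivalent program, inferred only from the disassembly above (a compare of `argc` against 1, a call to `atoi`, stores through `__emutls_v.foo`, and the `"foo = %d\n"` format string). It is not the original code. The function name `foo_demo` and its return value are assumptions made here so the logic can be called and checked; the original was evidently `main`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Thread-local variable; the name comes from the __emutls_v.foo symbol. */
__thread int foo;

/* Mirrors the control flow visible in the disassembly: with an argument,
 * store atoi(argv[1]) into foo; otherwise store 10; then print foo.
 * Returning foo is an addition for illustration only. */
int foo_demo(int argc, char **argv) {
    if (argc > 1)
        foo = atoi(argv[1]);
    else
        foo = 10;
    printf("foo = %d\n", foo);
    return foo;
}
```

Each store or load of `foo` in this sketch corresponds to one `__emutls_get_address` call in the GCC 4.7.2 output shown above, which is the overhead being discussed.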

Comment(by simonmar):
gcc supports register variables, which we use instead of `__thread` (see
`rts/sm/GCTDecl.h`). So we can use gcc until/unless they drop support for
register variables.

TLS requires support from the OS, which is why neither gcc nor Clang/LLVM
can support it on OS X.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:10

Comment(by thoughtpolice):
Ah, you're very correct. I shouldn't comment on tickets when it's absurdly
late at night and I'm really tired...

Apple has advertised OS X as having TLS since Lion I believe. I think this
really is TLS support. It's just slower than Linux - on Ubuntu 12.10, the
example above moves values directly into ```%fs:0xfffffffffffffff0```. GCC
is just weird here because it's calling into ```libgcc``` first from my
digging (which would maybe mean it's even slower. I haven't tested.)

Modern GCC is at least easily obtainable on OS X, so perhaps this isn't as
bad and it's not worth doing something evil. We'll probably need to fix
other things, so I'll still look into it.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:11

Comment(by thoughtpolice):
Just as an aside, I may have found a way to recover (most) of the
performance relative to David's original post, even with Clang. It's by
using a trick JavaScriptCore (WebKit) uses. I'm building now and will see
what happens...
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:12

Comment(by thoughtpolice):
Alright, I think my patch is almost working, but in the meantime I've
verified with a small snippet the behavior I think we want. Simon, can you
please tell me if this approach would be OK?
Essentially, there is a small set of predefined TLS keys in the OS X C
library for various Apple-internal things. There are about 100 of these
special keys. With them, it's possible to use very special inline variants
of ```pthread_getspecific``` and ```pthread_setspecific``` that directly
write into an offset block of the ```%gs``` register. Performance-wise,
this should be very close to Linux's implementation.
One of these things on modern OS X and its libc is WebKit. pthread has a
specific range of keys (5 to be exact) dedicated to WebKit. These are used
in JavaScriptCore's FastMalloc allocator for performance critical sections
- likely for their GC! But only a single key is used by WebKit at all, and
there are 0 references to it elsewhere that I can find on the internet.
You can see this here:
http://www.opensource.apple.com/source/Libc/Libc-825.25/pthreads/pthread_mac...
This defines the inline get/set routines for special TLS keys. If you
scroll down a little you can see the ```JavaScriptCore``` keys (keys 90-94
to be exact.)
Now, look here:
http://code.google.com/codesearch#mcaWan7Aaio/trunk/WebKit-r115846/Source/WTF/wtf/FastMalloc.cpp&q=__PTK_FRAMEWORK_JAVASCRIPTCORE_KEY0&type=cs&l=453
And you can see there's a special stubbed out ```pthread_getspecific```
and ```pthread_setspecific``` routine for this exact purpose.
Therefore, I propose we steal one of the high TLS keys that are dedicated
to WebKit's JS engine for the GC. Unfortunately, ```pthread_machdep.h``` is
not installed by default in modern variants of XCode, so we must inline
the definitions ourselves for the necessary architectures.
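As a rough illustration of what "inlining the definitions ourselves" could look like, here is a sketch of direct-slot accessors. The `%gs`-relative instructions are modeled on the inline routines the linked Apple header is described as providing, written from memory rather than copied, so treat the exact asm constraints as an assumption. The slot number 90 is taken from the "keys 90-94" range mentioned above, and the names `GCT_SLOT`, `gct_slot_get` and `gct_slot_set` are hypothetical. On any platform other than x86_64 OS X the sketch falls back to an ordinary pthread key so that it still compiles and runs.

```c
#include <pthread.h>
#include <stddef.h>

#if defined(__APPLE__) && defined(__x86_64__)

/* ASSUMPTION: slot 90 is the first of the WebKit-reserved keys (90-94). */
#define GCT_SLOT 90UL

/* Read the reserved slot directly from the %gs-relative TSD block. */
static inline void *gct_slot_get(void) {
    void *ret;
    __asm__("movq %%gs:%1, %0"
            : "=r"(ret)
            : "m"(*(void **)(GCT_SLOT * sizeof(void *))));
    return ret;
}

/* Write the reserved slot directly. */
static inline void gct_slot_set(void *val) {
    __asm__("movq %1, %%gs:%0"
            : "=m"(*(void **)(GCT_SLOT * sizeof(void *)))
            : "rn"(val));
}

#else

/* Portable fallback: an ordinary pthread key (the slow path). */
static pthread_key_t gct_slot_key;
static pthread_once_t gct_slot_once = PTHREAD_ONCE_INIT;

static void gct_slot_init(void) {
    (void)pthread_key_create(&gct_slot_key, NULL);
}

static inline void *gct_slot_get(void) {
    pthread_once(&gct_slot_once, gct_slot_init);
    return pthread_getspecific(gct_slot_key);
}

static inline void gct_slot_set(void *val) {
    pthread_once(&gct_slot_once, gct_slot_init);
    (void)pthread_setspecific(gct_slot_key, val);
}

#endif
```

The design trade-off is the one raised in this thread: the direct path avoids a function call per access, but it relies on a slot that is reserved for another project and is not an officially supported interface.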
The following example demonstrates the use of these special keys:
{{{
#include

Changes (by thoughtpolice):
* owner: => thoughtpolice
* version: 7.6.1 => 7.7
* os: Unknown/Multiple => MacOS X
* architecture: Unknown/Multiple => x86_64 (amd64)
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:14

#7602: Threaded RTS performing badly on recent OS X (10.8?)
-------------------------------+--------------------------------------------
Reporter: simonmar | Owner: thoughtpolice
Type: bug | Status: new
Priority: normal | Milestone: _|_
Component: Runtime System | Version: 7.7
Keywords: | Os: MacOS X
Architecture: x86_64 (amd64) | Failure: None/Unknown
Difficulty: Unknown | Testcase:
Blockedby: 7678 | Blocking:
Related: |
-------------------------------+--------------------------------------------
Comment(by simonmar):
Ok, so an inline `pthread_getspecific` is good, but it's still not ideal,
because the compiler can't see the code inside the inline asm. Multiple
references to the TLS variable should not cause repeated reads, but with
the inline `pthread_getspecific`, they will. However, this can be worked
around in the code by loading the TLS variable into a local once at the
start of the function. I wouldn't object to this, it's less invasive than
passing around the TLS variable as a parameter everywhere. But we should
peer at the asm before and after to make sure it's doing what we expect.

Stealing the WebKit-reserved slot would probably work for now, until some
library developer has the same idea, and then we have a bizarre bug waiting
to happen.

I suppose I slightly prefer to use gcc for now, since we know it works, has
good performance, and doesn't have this infelicity.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:16
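The workaround described above, reading the thread-local pointer once into a local, can be sketched as follows. This is a minimal illustration, not RTS code: `gc_thread`, `read_gct`, `install_gct` and the two `account_*` functions are hypothetical names, and the `gct_reads` counter exists only to make the number of accessor calls observable (it is not thread-safe).

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread GC state with two counters. */
typedef struct gc_thread { long copied; long scanned; } gc_thread;

static pthread_key_t gct_key;
static pthread_once_t gct_once = PTHREAD_ONCE_INIT;
static void gct_key_init(void) { (void)pthread_key_create(&gct_key, NULL); }

/* Counts accessor calls; for illustration only. */
int gct_reads = 0;

/* Stand-in for an accessor the compiler cannot see through. */
gc_thread *read_gct(void) {
    gct_reads++;
    pthread_once(&gct_once, gct_key_init);
    return pthread_getspecific(gct_key);
}

void install_gct(gc_thread *t) {
    pthread_once(&gct_once, gct_key_init);
    (void)pthread_setspecific(gct_key, t);
}

/* Naive form: every reference performs its own read. */
void account_naive(long n) {
    read_gct()->copied += n;
    read_gct()->scanned += n;
}

/* Hoisted form: one read at function entry, then plain pointer use. */
void account_hoisted(long n) {
    gc_thread *gct = read_gct();
    gct->copied += n;
    gct->scanned += n;
}
```

With a true `__thread` variable the compiler can often perform this hoisting itself; with an opaque accessor it has to be done by hand, which is why the comment suggests checking the generated asm before and after.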

Comment(by thoughtpolice):
OK. Clang is kind of being an issue at the moment so my patch is on hold
(see #7678.) Right now my change is localized in ```GCTDecl.h``` behind an
```#ifdef``` so it's nothing more than a configurable performance
optimization on Darwin.

I think first I will just look at the raw differences between:
* clang, with the slow, non-inline pthread calls.
* clang, using this change to steal an inline TLS variable
* gcc 4.7.2, using register variables.

(I don't particularly care about llvm-gcc too much - it's already slow
whether or not it's using clang or llvm-gcc, and llvm-gcc will be removed
anyway.)

Afterwards, we can look at what storing the TLS variable in a local will
save us performance-wise, and decide on a default.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:17

Comment(by chak):
Any news here? Did you check out the Xcode 5 DP? It's now clang or nothing.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:18

Comment(by thoughtpolice):
Sigh. I haven't seen XCode 5. The fact they EOL'd gcc finally isn't totally
surprising.

The summary is that I have a patch for this and it should help the
performance loss quite a bit, but the blocker here is #7678 - clang's
preprocessor doesn't like the way we use RULES, among other things. This is
work-around-able, but probably not in a very nice way unfortunately.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:19

Comment(by chak):
Replying to [comment:19 thoughtpolice]:
> Sigh. I haven't seen XCode 5. The fact they EOL'd gcc finally isn't
> totally surprising.
>
> The summary is that I have a patch for this and it should help the
> performance loss quite a bit, but the blocker here is #7678 - clang's
> preprocessor doesn't like the way we use RULES, among other things.
> This is work-around-able, but probably not in a very nice way
> unfortunately.

Our use of CPP was always a dirty hack and it was only a matter of time
before it was going to bite us. Can't we just disable cpp in modules with
rules?
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:20

Comment(by chak):
Concerning TLS, clang actually supports a choice of different ways of
handling TLS since clang 3.2. See
[http://llvm.org/releases/3.2/docs/ClangReleaseNotes.html] under heading
"Support for tls_model attribute" (the compiler option is "-ftls-model").
The various models are spec'ed in this document:
[http://www.akkadia.org/drepper/tls.pdf]
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:21

Comment(by thoughtpolice):
Well, technically I think that under GCC's -traditional-cpp mode, clang
SHOULD respect our code. The problem is I just don't think the "be
insensitive to leading whitespace" rule is implemented. :(

I don't think turning one or the other off is an option. Base in particular
has a few modules with both CPP and Rules, but if you disable one or the
other, things will break. Either the preprocessor directives become a
syntax error, or you disable the RULES and they are ignored - but the
preprocessor would error on them anyway because it runs first.

There is another problem. We can change GHC and all of its dependent
libraries to remove leading whitespace on lines that would be ambiguous to
clang. Unfortunately, 3rd party libraries do this kind of formatting too,
so this could break user programs for a very bizarre reason.

I think it should be possible to fix this in a transparent way, but it
won't be pretty. Basically, we'll need to preemptively strip out leading
whitespace on lines where the first non-whitespace character is #.
Something like "s/^\s+\#(.*)/#$1/" or whatever in regex-ese.

In the meantime I think it should be possible to get clang building by
reformatting a few of the libraries using RULES improperly, which should
not be too much work I think.

Finally, the various TLS models make no difference on the example using
__thread I posted above. I posit that TLS models are silently ignored on
OS X, where Apple is free to do what they wish.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:22
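The substitution sketched above ("s/^\s+\#(.*)/#$1/") can be written out as a small C routine. This is only an illustration of the transformation itself, under the assumption that "whitespace" means spaces and tabs at the start of a line; the function name `strip_hash_indent` is hypothetical, and nothing here reflects how GHC's driver actually invokes the preprocessor.

```c
#include <stddef.h>

/* Rewrites text in place: on every line whose first non-blank character
 * is '#', the leading spaces and tabs are dropped. All other lines are
 * copied unchanged. Returns the new length of the text. */
size_t strip_hash_indent(char *text) {
    char *src = text;   /* read cursor, always at or ahead of dst */
    char *dst = text;   /* write cursor */

    while (*src != '\0') {
        /* src is at the start of a line: look past any indentation. */
        char *p = src;
        while (*p == ' ' || *p == '\t')
            p++;
        if (p != src && *p == '#')
            src = p;    /* indented '#' line: skip the indentation */

        /* Copy the rest of the line, including its newline if present. */
        while (*src != '\0' && *src != '\n')
            *dst++ = *src++;
        if (*src == '\n')
            *dst++ = *src++;
    }
    *dst = '\0';
    return (size_t)(dst - text);
}
```

A `#` that appears later on a line, for instance inside a pragma such as `{-# RULES #-}` that starts in column one, is left alone; only indentation in front of a line-initial `#` is removed.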

Changes (by simonmar):
* priority: normal => high
* milestone: _|_ => 7.8.1
Comment:
I'd be really happy if someone took ownership of this problem and drove it
to a solution. @thoughtpolice?

As I understand it, currently the situation is that you have to install
gcc. We can probably make it so that you don't have to install gcc to
''use'' GHC, but still need gcc to ''build'' GHC, because we don't have
access to fast TLS support on OS X (or alternatively you can build GHC with
clang, but the GC will be slow.) It seems certain privileged projects do
get fast TLS support (WebKit), but it's not officially available for
general use.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:23

Comment(by thoughtpolice):
Right, understood. I'll put this on my plate and try to tackle it tonight.
Just getting clang to build and recovering the performance I think won't be
too hard. The preprocessing hack can happen afterwards, and I can file a
separate ticket for it.
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:24

Comment(by carter):
I'm going to see if I can reach out to someone I know at Apple to see if
the patched clang CPP stuff http://llvm.org/bugs/show_bug.cgi?id=16363 can
be backported to the Xcode 5 clang. It'd really be a shame if we have to
bundle up our own GCC/Clang or require end users to build their own before
they can use GHC on Mac!
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:25

Changes (by lelf):
* cc: anton.nik@… (added)
--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/7602#comment:26