slow execution of built executables on a Mac

Hi devs, I have a shiny, new iMac in my office. It's thus frustrating that it takes my iMac longer to build GHC than my trusty 28-month-old laptop. Building devel2 on a fresh checkout takes just about an hour. By contrast, my laptop is done after 30 minutes of work (same build settings). The laptop has a 2.8GHz Intel i7 running macOS 10.13.5; the desktop has a 3.5GHz Intel i5 running macOS 10.13.6. Both bootstrapped from the binary distro of GHC 8.6.1. Watching GHC build, everything is snappy enough during the stage-1 build. But then, as soon as we start using GHC-produced executables, things slow down. It's most noticeable in the rts_dist_HC phase, which crawls. Stage 2 is pretty slow, too. So: is there anything anyone knows about recent Macs not liking locally built executables? Or is there some local setting that I need to update? The prepackaged GHC seems to work well, so that gives me hope that someone knows what setting to tweak. Thanks! Richard

theory one:
are you setting the intree gmp flag? and/or dont have gmp the library
installed? maybe your ghc is using integer simple!
theory two: dont set -j greater than about 8 or perhaps 16? (some parts of
ghc are slower on too paralle a setup?)
those are my two off the cuff guesses
you could also look at system monitor to see if your'e being IO bound or
memory or CPU bound
(possible failure siutation: the repo is under dropbox or something and
dropbox is eating your memory bandwidth trying to sync stuff?)
On Fri, Oct 26, 2018 at 2:33 PM Richard Eisenberg
Hi devs,
I have a shiny, new iMac in my office. It's thus frustrating that it takes my iMac longer to build GHC than my trusty 28-month-old laptop. Building devel2 on a fresh checkout takes just about an hour. By contrast, my laptop is done after 30 minutes of work (same build settings). The laptop has a 2.8GHz Intel i7 running macOS 10.13.5; the desktop has a 3.5GHz Intel i5 running macOS 10.13.6. Both bootstrapped from the binary distro of GHC 8.6.1.
Watching GHC build, everything is snappy enough during the stage-1 build. But then, as soon as we start using GHC-produced executables, things slow down. It's most noticeable in the rts_dist_HC phase, which crawls. Stage 2 is pretty slow, too.
So: is there anything anyone knows about recent Macs not liking locally built executables? Or is there some local setting that I need to update? The prepackaged GHC seems to work well, so that gives me hope that someone knows what setting to tweak.
Thanks! Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On Oct 26, 2018, at 5:02 PM, Carter Schonwald
wrote: are you setting the intree gmp flag? and/or dont have gmp the library installed? maybe your ghc is using integer simple!
Intriguing possibility. I haven't done anything to install gmp. I'm now at home, away from this machine, but I'll try this next week. I have a hunch you're right. Perhaps we should have a warning when this is the case?
theory two: dont set -j greater than about 8 or perhaps 16? (some parts of ghc are slower on too paralle a setup?)
I'm at -j5, so this isn't the problem.
you could also look at system monitor to see if your'e being IO bound or memory or CPU bound
I tried this earlier, and it looked like I wasn't CPU-bound. I suspected a slow SSD somehow (if that's even possible). But watching the build this morning showed me that stage 1 is snappy while later work isn't, so that made me doubt my earlier guess. And installing the 6GB of LaTeX wasn't unduly slow, either.
(possible failure siutation: the repo is under dropbox or something and dropbox is eating your memory bandwidth trying to sync stuff?)
Thanks for the idea, but I know to keep GHC away from Dropbox. :) I'll give 90% confidence on the gmp thing. Thanks! Richard

rts_dist_HC is where it builds the various versions of the RTS, right? I noticed a similar slowness building the RTS at ICFP on my MacBook Pro (macOS 10.12). I don't think my battery lasted long enough to get to stage 2. I'm afraid I don't have any clue why it's gotten so slow, but you're not alone! I did notice that it seems to build many more variants of the RTS than before, but I haven't tried to build GHC in a long time. On Fri, Oct 26, 2018, at 14:32, Richard Eisenberg wrote:
Hi devs,
I have a shiny, new iMac in my office. It's thus frustrating that it takes my iMac longer to build GHC than my trusty 28-month-old laptop. Building devel2 on a fresh checkout takes just about an hour. By contrast, my laptop is done after 30 minutes of work (same build settings). The laptop has a 2.8GHz Intel i7 running macOS 10.13.5; the desktop has a 3.5GHz Intel i5 running macOS 10.13.6. Both bootstrapped from the binary distro of GHC 8.6.1.
Watching GHC build, everything is snappy enough during the stage-1 build. But then, as soon as we start using GHC-produced executables, things slow down. It's most noticeable in the rts_dist_HC phase, which crawls. Stage 2 is pretty slow, too.
So: is there anything anyone knows about recent Macs not liking locally built executables? Or is there some local setting that I need to update? The prepackaged GHC seems to work well, so that gives me hope that someone knows what setting to tweak.
Thanks! Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Maybe a symptom of an AFPS bug?
https://gregoryszorc.com/blog/2018/10/29/global-kernel-locks-in-apfs/
Just came across this, might be worth a look.
Cheers,
Gabor
On 10/26/18, Richard Eisenberg
Hi devs,
I have a shiny, new iMac in my office. It's thus frustrating that it takes my iMac longer to build GHC than my trusty 28-month-old laptop. Building devel2 on a fresh checkout takes just about an hour. By contrast, my laptop is done after 30 minutes of work (same build settings). The laptop has a 2.8GHz Intel i7 running macOS 10.13.5; the desktop has a 3.5GHz Intel i5 running macOS 10.13.6. Both bootstrapped from the binary distro of GHC 8.6.1.
Watching GHC build, everything is snappy enough during the stage-1 build. But then, as soon as we start using GHC-produced executables, things slow down. It's most noticeable in the rts_dist_HC phase, which crawls. Stage 2 is pretty slow, too.
So: is there anything anyone knows about recent Macs not liking locally built executables? Or is there some local setting that I need to update? The prepackaged GHC seems to work well, so that gives me hope that someone knows what setting to tweak.
Thanks! Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Sadly, none of the suggestions on this thread worked. But here is some more detail: - During stage-1, my machine's CPU is maxed out (or nearly so) in user mode. - After stage-1 (most obviously during rts_dist_HC), my machine spends roughly 80% of its CPU in *system* mode. - Testing on my other machine (which is slower, but much faster at building GHC), I never see high *system* percentages. - Both machines use APFS, which was one candidate for the slowdown. - The slow machine uses XCode 10.1; the fast one uses XCode 9.4.1 - The slow machine uses clang 10.0.0; the fast one uses clang 9.1.0 - `brew install gmp` on the slow machine tells me that gmp is already installed. - As a reminder: the slow machine is macOS 10.13.6; the fast one is macOS 10.13.5. I don't wish to try upgrading the fast one... lest that slow it down! Does anyone have any insight? Thanks! Richard
On Oct 30, 2018, at 2:32 PM, Gabor Greif
wrote: Maybe a symptom of an AFPS bug? https://gregoryszorc.com/blog/2018/10/29/global-kernel-locks-in-apfs/
Just came across this, might be worth a look.
Cheers,
Gabor
On 10/26/18, Richard Eisenberg
wrote: Hi devs,
I have a shiny, new iMac in my office. It's thus frustrating that it takes my iMac longer to build GHC than my trusty 28-month-old laptop. Building devel2 on a fresh checkout takes just about an hour. By contrast, my laptop is done after 30 minutes of work (same build settings). The laptop has a 2.8GHz Intel i7 running macOS 10.13.5; the desktop has a 3.5GHz Intel i5 running macOS 10.13.6. Both bootstrapped from the binary distro of GHC 8.6.1.
Watching GHC build, everything is snappy enough during the stage-1 build. But then, as soon as we start using GHC-produced executables, things slow down. It's most noticeable in the rts_dist_HC phase, which crawls. Stage 2 is pretty slow, too.
So: is there anything anyone knows about recent Macs not liking locally built executables? Or is there some local setting that I need to update? The prepackaged GHC seems to work well, so that gives me hope that someone knows what setting to tweak.
Thanks! Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Richard Eisenberg
Sadly, none of the suggestions on this thread worked.
But here is some more detail:
- During stage-1, my machine's CPU is maxed out (or nearly so) in user mode. - After stage-1 (most obviously during rts_dist_HC), my machine spends roughly 80% of its CPU in *system* mode. - Testing on my other machine (which is slower, but much faster at building GHC), I never see high *system* percentages. - Both machines use APFS, which was one candidate for the slowdown. - The slow machine uses XCode 10.1; the fast one uses XCode 9.4.1 - The slow machine uses clang 10.0.0; the fast one uses clang 9.1.0 - `brew install gmp` on the slow machine tells me that gmp is already installed. - As a reminder: the slow machine is macOS 10.13.6; the fast one is macOS 10.13.5. I don't wish to try upgrading the fast one... lest that slow it down!
Does anyone have any insight?
Were you able to collect a few stacks from the slow processes? It sounds like the author of the original post was somehow able to do this. Under Linux I would just fire up perf and grab a system-wide profile. Knowing precisely what slow path you are hitting would help localize any possible problem in GHC. Cheers, - Ben

Ben, yeah, good point, probabilistically speaking, a few stack traces are
often all you need to pinpoint an egregious performance problem, at least
that's been my experience.
Richard, as you probably know, you can get stack traces by using the
Activity Monitor, select a ghc process and choose sample process from the
drop down of the gear icon on the left hand side of the top of the activity
monitor. It also shows cpu times which may be helpful.
On Wed, Nov 7, 2018 at 7:10 PM Ben Gamari
Richard Eisenberg
writes: Sadly, none of the suggestions on this thread worked.
But here is some more detail:
- During stage-1, my machine's CPU is maxed out (or nearly so) in user mode. - After stage-1 (most obviously during rts_dist_HC), my machine spends roughly 80% of its CPU in *system* mode. - Testing on my other machine (which is slower, but much faster at building GHC), I never see high *system* percentages. - Both machines use APFS, which was one candidate for the slowdown. - The slow machine uses XCode 10.1; the fast one uses XCode 9.4.1 - The slow machine uses clang 10.0.0; the fast one uses clang 9.1.0 - `brew install gmp` on the slow machine tells me that gmp is already installed. - As a reminder: the slow machine is macOS 10.13.6; the fast one is macOS 10.13.5. I don't wish to try upgrading the fast one... lest that slow it down!
Does anyone have any insight?
Were you able to collect a few stacks from the slow processes? It sounds like the author of the original post was somehow able to do this. Under Linux I would just fire up perf and grab a system-wide profile. Knowing precisely what slow path you are hitting would help localize any possible problem in GHC.
Cheers,
- Ben _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

OK. Well, I couldn't sample from Activity Monitor, because the processes came and went too quickly. But the terminal command `sample` takes a *name* as an argument, and so passing ghc-stage1 worked nicely. Samples were taken during rts_dist_HC calls. Here's the first bit: Call graph: 880 Thread_2569398 DispatchQueue_1: com.apple.main-thread (serial) + 868 _pthread_cond_wait (in libsystem_pthread.dylib) + 732 [0x7fff51e2e589] + ! 868 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff51c65a16] + 11 exitTicker (in ghc-stage1) + 80 [0x106ad2c30] + ! 11 _pthread_join (in libsystem_pthread.dylib) + 626 [0x7fff51e31824] + ! 11 __semwait_signal (in libsystem_kernel.dylib) + 10 [0x7fff51c65d82] + 1 osDecommitMemory (in ghc-stage1) + 20 [0x106ad2fc4] + 1 madvise (in libsystem_kernel.dylib) + 10 [0x7fff51c66d0a] 880 Thread_2569402 + 880 thread_start (in libsystem_pthread.dylib) + 13 [0x7fff51e2cbf9] + 880 _pthread_start (in libsystem_pthread.dylib) + 377 [0x7fff51e2d50d] I also sampled gcc. Full output at https://gist.github.com/goldfirere/7316920ad37d776c25c15dbb0ed5996f https://gist.github.com/goldfirere/7316920ad37d776c25c15dbb0ed5996f I'm utterly lost here, but the results suggest that the scheduler is at fault, somehow. Does anyone have a next step to suggest? Thanks, Richard
On Nov 7, 2018, at 6:23 PM, George Colpitts
wrote: Ben, yeah, good point, probabilistically speaking, a few stack traces are often all you need to pinpoint an egregious performance problem, at least that's been my experience.
Richard, as you probably know, you can get stack traces by using the Activity Monitor, select a ghc process and choose sample process from the drop down of the gear icon on the left hand side of the top of the activity monitor. It also shows cpu times which may be helpful.
On Wed, Nov 7, 2018 at 7:10 PM Ben Gamari
mailto:ben@well-typed.com> wrote: Richard Eisenberg mailto:rae@cs.brynmawr.edu> writes: Sadly, none of the suggestions on this thread worked.
But here is some more detail:
- During stage-1, my machine's CPU is maxed out (or nearly so) in user mode. - After stage-1 (most obviously during rts_dist_HC), my machine spends roughly 80% of its CPU in *system* mode. - Testing on my other machine (which is slower, but much faster at building GHC), I never see high *system* percentages. - Both machines use APFS, which was one candidate for the slowdown. - The slow machine uses XCode 10.1; the fast one uses XCode 9.4.1 - The slow machine uses clang 10.0.0; the fast one uses clang 9.1.0 - `brew install gmp` on the slow machine tells me that gmp is already installed. - As a reminder: the slow machine is macOS 10.13.6; the fast one is macOS 10.13.5. I don't wish to try upgrading the fast one... lest that slow it down!
Does anyone have any insight?
Were you able to collect a few stacks from the slow processes? It sounds like the author of the original post was somehow able to do this. Under Linux I would just fire up perf and grab a system-wide profile. Knowing precisely what slow path you are hitting would help localize any possible problem in GHC.
Cheers,
- Ben _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org mailto:ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Richard Eisenberg
OK. Well, I couldn't sample from Activity Monitor, because the processes came and went too quickly. But the terminal command `sample` takes a *name* as an argument, and so passing ghc-stage1 worked nicely. Samples were taken during rts_dist_HC calls.
Hmmm, it looks to me like all of these are from GHC waiting on various things (e.g. _pthread_cond_wait, nanosleep, and pthread_join). It's quite surprising that these operations are chewing through cycles in kernel mode. I wonder how rapidly we are context switching. Perhaps we are quickly jumping between kernel and user mode? Perhaps strace (or I think the OS X equivalent is truss?) will shed some light? Cheers, - Ben

I couldn't duplicate this, when system times were high they were usually
between 20-40% which seems high but not 80%. I am on Xcode 10.1, macOS
10.13.6 , a 4 core imac with 12 gb RAM and a hard drive not SSDs. I can't
go to Mojave as it is an old iMac. I did make -j5 using ghc 8.6.2.
The Mac doesn't have truss or strace. It does have dtrace which I believe
is the equivalent.
It might be good to open a ticket for this. As the description of how to
reproduce are a bit vague and the steps are manual, it would be ideal if
there were modified make file(s) to capture statistics using sample and
maybe dtrace or even time.
Cheers
George
On Sat, Nov 10, 2018 at 11:10 PM Ben Gamari
Richard Eisenberg
writes: OK. Well, I couldn't sample from Activity Monitor, because the processes came and went too quickly. But the terminal command `sample` takes a *name* as an argument, and so passing ghc-stage1 worked nicely. Samples were taken during rts_dist_HC calls.
Hmmm, it looks to me like all of these are from GHC waiting on various things (e.g. _pthread_cond_wait, nanosleep, and pthread_join). It's quite surprising that these operations are chewing through cycles in kernel mode. I wonder how rapidly we are context switching. Perhaps we are quickly jumping between kernel and user mode? Perhaps strace (or I think the OS X equivalent is truss?) will shed some light?
Cheers,
- Ben

OK. I've posted a bug: https://ghc.haskell.org/trac/ghc/ticket/15891 https://ghc.haskell.org/trac/ghc/ticket/15891 Thanks much for the attempts thus far! Richard
On Nov 11, 2018, at 8:13 AM, George Colpitts
wrote: I couldn't duplicate this, when system times were high they were usually between 20-40% which seems high but not 80%. I am on Xcode 10.1, macOS 10.13.6 , a 4 core imac with 12 gb RAM and a hard drive not SSDs. I can't go to Mojave as it is an old iMac. I did make -j5 using ghc 8.6.2.
The Mac doesn't have truss or strace. It does have dtrace which I believe is the equivalent.
It might be good to open a ticket for this. As the description of how to reproduce are a bit vague and the steps are manual, it would be ideal if there were modified make file(s) to capture statistics using sample and maybe dtrace or even time.
Cheers George
On Sat, Nov 10, 2018 at 11:10 PM Ben Gamari
mailto:ben@well-typed.com> wrote: Richard Eisenberg mailto:rae@cs.brynmawr.edu> writes: OK. Well, I couldn't sample from Activity Monitor, because the processes came and went too quickly. But the terminal command `sample` takes a *name* as an argument, and so passing ghc-stage1 worked nicely. Samples were taken during rts_dist_HC calls.
Hmmm, it looks to me like all of these are from GHC waiting on various things (e.g. _pthread_cond_wait, nanosleep, and pthread_join). It's quite surprising that these operations are chewing through cycles in kernel mode. I wonder how rapidly we are context switching. Perhaps we are quickly jumping between kernel and user mode? Perhaps strace (or I think the OS X equivalent is truss?) will shed some light?
Cheers,
- Ben
participants (6)
-
Ben Gamari
-
Carter Schonwald
-
Eric Seidel
-
Gabor Greif
-
George Colpitts
-
Richard Eisenberg