Trying to speedup GHC compile times...Help!
 
            Hi ghc devs, I'm a long-time Haskeller but am just getting into GHC development. I started a 12 week internship at Tweag I/O under Richard Eisenberg this week with the singular goal to speedup GHC compile times. I'm specifically looking to contribute to ghc issues 18541 https://gitlab.haskell.org/ghc/ghc/-/issues/18541 and 18535 https://gitlab.haskell.org/ghc/ghc/-/issues/18535. So I thought I would reach out to the community to get some direction on issues/features/problems to tackle in the pursuit of faster compilation times. This is a full time internship and so I think there is a real opportunity to nail down a deliverable for the community, but I want to get some guidance from the experts (you fine people!) before going down a rabbit hole. To be specific I'm looking for lingering items such as: 1. It would be great if we had <thing-here> but no one has time. 2. Primop foo is half complete but is the right type for <common-use-case-which-is-currently-just-a-list>. 3. Swap <some-type> to an array-like type *non-incrementally*, that is, establish a patch that rips out the previous type and replaces it with the array-like across the entire compiler, rather than module-by-module. Point 2 is a specific reference to MR 3571 https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3571 but I'm unsure of the status and etiquette around MRs, and I'm unsure exactly how fulfilling the todos at the end of that MR would aid in faster compilation times (and if there is evidence to that effect somewhere). Thanks for the help! - Jeff
 
            "Young, Jeff" 
Hi ghc devs,
I'm a long-time Haskeller but am just getting into GHC development. I started a 12 week internship at Tweag I/O under Richard Eisenberg this week with the singular goal to speedup GHC compile times. I'm specifically looking to contribute to ghc issues 18541 https://gitlab.haskell.org/ghc/ghc/-/issues/18541 and 18535 https://gitlab.haskell.org/ghc/ghc/-/issues/18535. So I thought I would reach out to the community to get some direction on issues/features/problems to tackle in the pursuit of faster compilation times. This is a full time internship and so I think there is a real opportunity to nail down a deliverable for the community, but I want to get some guidance from the experts (you fine people!) before going down a rabbit hole.
To be specific I'm looking for lingering items such as: 1. It would be great if we had <thing-here> but no one has time. 2. Primop foo is half complete but is the right type for <common-use-case-which-is-currently-just-a-list>. 3. Swap <some-type> to an array-like type *non-incrementally*, that is, establish a patch that rips out the previous type and replaces it with the array-like across the entire compiler, rather than module-by-module.
Point 2 is a specific reference to MR 3571 https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3571 but I'm unsure of the status and etiquette around MRs, and I'm unsure exactly how fulfilling the todos at the end of that MR would aid in faster compilation times (and if there is evidence to that effect somewhere).
Hi Jeff, Indeed I'm a bit skeptical that (2) will produce a meaningful compile-time improvement on typical programs. I would likely not prioritise this if the goal is compile-time in particular. A few people (namely Sebastian Graf and Andreas Klebinger) have thought about (3) in the past (e.g. for the arguments of TyConApps); preliminary experiments suggest that it's not as clear a win as one might expect, although AFAIK no one has tried to optimise pattern matching on the head, which may help matters (Sebastian has thought about this). One thing area where I feel a bit of attention may be useful is in the performance of substitution. In particular, tracking "taintedness" of the substitution result (as suggested in #19537) will help reduce garbage produced by substitution and potentially reduce compiler residency. Another related class of ideas can be found in #19538, which suggests that we try deferring substitution (or, alternatively, annotation expressions with free variable sets). The payoff here is far less certain that the taintedness idea and consequently I would only explore it after doing the former. This is all that comes to mind at the moment. I'll continue pondering other ideas, however. Cheers, - Ben
 
            Jeff
Great stuff!  Welcome.
I strongly urge you to keep a constantly-update status wiki page, which lists the ideas you are working on, and points to relevant resources and tickets.  An email thread like this is a good way to gather ideas, but NOT a good way to organise and track them.
Looking carefully at profiles is a good plan.  That's the hard data!
I think that some careful investigation of IntMap (at least the bits that GHC uses heavily) would be a good idea.  Clearly we spend a lot of time in these maps, so speedups here will yield a lot of benefit.  Look at the heavy hitters from the profile, stare at the Core code and see if it's s good as it can be.
For example, Sebastian discovered a strange infelicity in IntMap.lookup, which I've documented in a new ticket
https://gitlab.haskell.org/ghc/ghc/-/issues/20069
I think it'd also be worth measuring how unbalanced our IntMap trees get.  See
   https://gitlab.haskell.org/ghc/ghc/-/issues/19820
The speculation there is that we are getting very unbalanced trees.  So measure it!  If it's true, we could improve matters by using a different IntMap; or maybe by scrambling the key a bit --- see the ticket.
Simon
From: ghc-devs 
 
            One piece I'm curious about, reading this thread: why do we have so many IntMaps and operations on them? Name lookup is a fundamental operation a compiler must do, and that would use an IntMap: good. But maybe there are other IntMaps used that are less necessary. A key example: whenever we do substitution, we track an InScopeSet, which is really just an IntMap. This InScopeSet remembers the name of all variables in scope, useful when we need to create a new variable name (this is done by uniqAway). Yet perhaps the tracking of these in-scope variables is very expensive and comprises much of the IntMap time. Might it be better just to always work in a monad capable of giving fresh names? We actually don't even need a monad, if that's too annoying. Instead, we could just pass around an infinite list of fresh uniques. This would still be clutterful, but if it grants us a big speed improvement, the clutter might be worth it. The high-level piece here is that there may be good things that come from understanding where these IntMaps arise. Richard
On Jul 2, 2021, at 4:08 AM, Simon Peyton Jones via ghc-devs
wrote: Jeff
Great stuff! Welcome.
I strongly urge you to keep a constantly-update status wiki page, which lists the ideas you are working on, and points to relevant resources and tickets. An email thread like this is a good way to gather ideas, but NOT a good way to organise and track them.
Looking carefully at profiles is a good plan. That’s the hard data!
I think that some careful investigation of IntMap (at least the bits that GHC uses heavily) would be a good idea. Clearly we spend a lot of time in these maps, so speedups here will yield a lot of benefit. Look at the heavy hitters from the profile, stare at the Core code and see if it’s s good as it can be.
For example, Sebastian discovered a strange infelicity in IntMap.lookup, which I’ve documented in a new ticket https://gitlab.haskell.org/ghc/ghc/-/issues/20069 https://gitlab.haskell.org/ghc/ghc/-/issues/20069
I think it’d also be worth measuring how unbalanced our IntMap trees get. See https://gitlab.haskell.org/ghc/ghc/-/issues/19820 https://gitlab.haskell.org/ghc/ghc/-/issues/19820 The speculation there is that we are getting very unbalanced trees. So measure it! If it’s true, we could improve matters by using a different IntMap; or maybe by scrambling the key a bit --- see the ticket.
Simon
From: ghc-devs
On Behalf Of Young, Jeff Sent: 02 July 2021 02:36 To: ghc-devs@haskell.org Subject: Trying to speedup GHC compile times...Help! Hi ghc devs,
I'm a long-time Haskeller but am just getting into GHC development. I started a 12 week internship at Tweag I/O under Richard Eisenberg this week with the singular goal to speedup GHC compile times. I'm specifically looking to contribute to ghc issues 18541 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fissues%2F18541&data=04%7C01%7Csimonpj%40microsoft.com%7C8a3369ac28654df6f08008d93cf9e28b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637607867394069068%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Zh5dzhUSjwsZdmr8B3G5T49CssX%2Bv8IGcHILRp8Ekjc%3D&reserved=0and 18535 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fissues%2F18535&data=04%7C01%7Csimonpj%40microsoft.com%7C8a3369ac28654df6f08008d93cf9e28b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637607867394079062%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=2I%2BQXmd%2Bn1nGz%2Bcr0qvhq9BLkKCucJkedPNxu7RpKSw%3D&reserved=0. So I thought I would reach out to the community to get some direction on issues/features/problems to tackle in the pursuit of faster compilation times. This is a full time internship and so I think there is a real opportunity to nail down a deliverable for the community, but I want to get some guidance from the experts (you fine people!) before going down a rabbit hole.
To be specific I'm looking for lingering items such as:
1. It would be great if we had <thing-here> but no one has time.
2. Primop foo is half complete but is the right type for <common-use-case-which-is-currently-just-a-list>.
3. Swap <some-type> to an array-like type non-incrementally, that is, establish a patch that rips out the previous type and replaces it with the array-like across the entire compiler, rather than module-by-module.
Point 2 is a specific reference to MR 3571 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fmerge_requests%2F3571&data=04%7C01%7Csimonpj%40microsoft.com%7C8a3369ac28654df6f08008d93cf9e28b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637607867394079062%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=5Sr1EjmeYN8ttZHFc4FS9%2BAXH1KoRSaNoozORhwwQYA%3D&reserved=0 but I'm unsure of the status and etiquette around MRs, and I'm unsure exactly how fulfilling the todos at the end of that MR would aid in faster compilation times (and if there is evidence to that effect somewhere).
Thanks for the help!
- Jeff
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
 
            There are lot of places where it would be pretty tiresome to plumb a unique supply guaranteed unique from every other.   I think the current setup works pretty well - but I bet we can squeeze cycles out of its implementation.
Simon
From: Richard Eisenberg 
 
            Somewhat off-topic: does GHC no longer use "the rapier"? I thought the InScopeSet was needed to check that we can safely skip on extending the substitution as you go under binders when the binder is not in the InScopeSet (naively you would always have to rename binders, and thus extend the substitution, in order to avoid capture as you go under binders). i.e. the IntMap is not just used to generate new variable names, but to ensure the compiler does less work in the form of doing fewer substitutions. On Fri, 2 Jul 2021 at 16:12, Simon Peyton Jones via ghc-devs < ghc-devs@haskell.org> wrote:
There are lot of places where it would be pretty tiresome to plumb a unique supply guaranteed unique from every other. I think the current setup works pretty well – but I bet we can squeeze cycles out of its implementation.
Simon
*From:* Richard Eisenberg
*Sent:* 02 July 2021 14:26 *To:* Simon Peyton Jones *Cc:* Young, Jeff ; ghc-devs@haskell.org *Subject:* Re: Trying to speedup GHC compile times...Help! One piece I'm curious about, reading this thread: why do we have so many IntMaps and operations on them? Name lookup is a fundamental operation a compiler must do, and that would use an IntMap: good. But maybe there are other IntMaps used that are less necessary. A key example: whenever we do substitution, we track an InScopeSet, which is really just an IntMap. This InScopeSet remembers the name of all variables in scope, useful when we need to create a new variable name (this is done by uniqAway). Yet perhaps the tracking of these in-scope variables is very expensive and comprises much of the IntMap time. Might it be better just to always work in a monad capable of giving fresh names? We actually don't even need a monad, if that's too annoying. Instead, we could just pass around an infinite list of fresh uniques. This would still be clutterful, but if it grants us a big speed improvement, the clutter might be worth it.
The high-level piece here is that there may be good things that come from understanding where these IntMaps arise.
Richard
On Jul 2, 2021, at 4:08 AM, Simon Peyton Jones via ghc-devs < ghc-devs@haskell.org> wrote:
Jeff
Great stuff! Welcome.
I strongly urge you to keep a constantly-update status wiki page, which lists the ideas you are working on, and points to relevant resources and tickets. An email thread like this is a good way to gather ideas, but NOT a good way to organise and track them.
Looking carefully at profiles is a good plan. That’s the hard data!
I think that some careful investigation of IntMap (at least the bits that GHC uses heavily) would be a good idea. Clearly we spend a lot of time in these maps, so speedups here will yield a lot of benefit. Look at the heavy hitters from the profile, stare at the Core code and see if it’s s good as it can be.
For example, Sebastian discovered a strange infelicity in IntMap.lookup, which I’ve documented in a new ticket
https://gitlab.haskell.org/ghc/ghc/-/issues/20069 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fissues%2F20069&data=04%7C01%7Csimonpj%40microsoft.com%7Cd477d24fb159472b77ab08d93d5cfbb7%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637608293368167086%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=dSEa4md56IpyMGRPHW%2BVjWtZQGDi7vpVgmOCukyHTIU%3D&reserved=0
I think it’d also be worth measuring how unbalanced our IntMap trees get. See
https://gitlab.haskell.org/ghc/ghc/-/issues/19820 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fissues%2F19820&data=04%7C01%7Csimonpj%40microsoft.com%7Cd477d24fb159472b77ab08d93d5cfbb7%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637608293368167086%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5rgCmwBbb4mZ9NWoT7RtzRPFFND4e13k2XWeQkwgM%2FY%3D&reserved=0
The speculation there is that we are getting very unbalanced trees. So measure it! If it’s true, we could improve matters by using a different IntMap; or maybe by scrambling the key a bit --- see the ticket.
Simon
*From:* ghc-devs
*On Behalf Of *Young, Jeff *Sent:* 02 July 2021 02:36 *To:* ghc-devs@haskell.org *Subject:* Trying to speedup GHC compile times...Help! Hi ghc devs,
I'm a long-time Haskeller but am just getting into GHC development. I started a 12 week internship at Tweag I/O under Richard Eisenberg this week with the singular goal to speedup GHC compile times. I'm specifically looking to contribute to ghc issues 18541 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fissues%2F18541&data=04%7C01%7Csimonpj%40microsoft.com%7Cd477d24fb159472b77ab08d93d5cfbb7%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637608293368177079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=az67TZRzofQayj%2BGsc0aYUibVff1Z%2Fs0%2Bvvt4oD6yaM%3D&reserved=0 and 18535 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fissues%2F18535&data=04%7C01%7Csimonpj%40microsoft.com%7Cd477d24fb159472b77ab08d93d5cfbb7%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637608293368177079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=zVUtIY1ux%2BRfHBUsI2BoHM3hOK9O8p0W90F7qqk7TeA%3D&reserved=0. So I thought I would reach out to the community to get some direction on issues/features/problems to tackle in the pursuit of faster compilation times. This is a full time internship and so I think there is a real opportunity to nail down a deliverable for the community, but I want to get some guidance from the experts (you fine people!) before going down a rabbit hole.
To be specific I'm looking for lingering items such as:
1. It would be great if we had <thing-here> but no one has time.
2. Primop foo is half complete but is the right type for <common-use-case-which-is-currently-just-a-list>.
3. Swap <some-type> to an array-like type *non-incrementally*, that is, establish a patch that rips out the previous type and replaces it with the array-like across the entire compiler, rather than module-by-module.
Point 2 is a specific reference to MR 3571 https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fmerge_requests%2F3571&data=04%7C01%7Csimonpj%40microsoft.com%7Cd477d24fb159472b77ab08d93d5cfbb7%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637608293368187075%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=YnhnkOZXRoYFA18EvNUh4t5lUbPrp4rLpOVrtlNMnI8%3D&reserved=0 but I'm unsure of the status and etiquette around MRs, and I'm unsure exactly how fulfilling the todos at the end of that MR would aid in faster compilation times (and if there is evidence to that effect somewhere).
Thanks for the help!
- Jeff
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-devs&data=04%7C01%7Csimonpj%40microsoft.com%7Cd477d24fb159472b77ab08d93d5cfbb7%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637608293368187075%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=bCmRr98NfDWWsZXU1WbOFdm0ScDqoj11UR%2Fc%2BzQ1dNs%3D&reserved=0
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
 
            GHC precisely use "the rapier".
S
From: Christiaan Baaij 
 
            Hi Jeff,
Welcome to ghc-devs and looking forward to seeing you around.
If you are not already aware, there are a set of grafana dashboards
for tracking compiler performance. Ones of particular interest might
be the `head.hackage` and `Cabal test` dashboards.
* Head.hackage (https://grafana.gitlab.haskell.org/d/7T7oEMlMz/head-hackage-performance?orgId=2&from=now-30d&to=now)
* Cabal test (https://grafana.gitlab.haskell.org/d/Zf4HCvz7z/cabal-test?orgId=2)
Looking at the data it's clear that simplification and code generation
are the two expensive phases in compilation.
So I think some useful things to look at would be:
1. Avoiding pointless work when substituting as suggested by Ben could
reduce allocations significantly in the simplifier.
2. Looking into optimising the pretty printing during code generation.
Ticky profiles indicate that certain pretty printer functions are
called many times and allocate a lot. (I don't have a ticky profile to
hand)
3. Modify the `perf` and `head.hackage` pipeline stages to fetch
recent perf stats from the database and print a comparison to the
baseline.
Cheers,
Matt
On Fri, Jul 2, 2021 at 4:05 PM Simon Peyton Jones via ghc-devs
GHC precisely use “the rapier”.
S
From: Christiaan Baaij
Sent: 02 July 2021 15:38 To: Simon Peyton Jones Cc: Richard Eisenberg ; Young, Jeff ; ghc-devs@haskell.org Subject: Re: Trying to speedup GHC compile times...Help! Somewhat off-topic: does GHC no longer use "the rapier"? I thought the InScopeSet was needed to check that we can safely skip on extending the substitution as you go under binders when the binder is not in the InScopeSet (naively you would always have to rename binders, and thus extend the substitution, in order to avoid capture as you go under binders). i.e. the IntMap is not just used to generate new variable names, but to ensure the compiler does less work in the form of doing fewer substitutions.
On Fri, 2 Jul 2021 at 16:12, Simon Peyton Jones via ghc-devs
wrote: There are lot of places where it would be pretty tiresome to plumb a unique supply guaranteed unique from every other. I think the current setup works pretty well – but I bet we can squeeze cycles out of its implementation.
Simon
From: Richard Eisenberg
Sent: 02 July 2021 14:26 To: Simon Peyton Jones Cc: Young, Jeff ; ghc-devs@haskell.org Subject: Re: Trying to speedup GHC compile times...Help! One piece I'm curious about, reading this thread: why do we have so many IntMaps and operations on them? Name lookup is a fundamental operation a compiler must do, and that would use an IntMap: good. But maybe there are other IntMaps used that are less necessary. A key example: whenever we do substitution, we track an InScopeSet, which is really just an IntMap. This InScopeSet remembers the name of all variables in scope, useful when we need to create a new variable name (this is done by uniqAway). Yet perhaps the tracking of these in-scope variables is very expensive and comprises much of the IntMap time. Might it be better just to always work in a monad capable of giving fresh names? We actually don't even need a monad, if that's too annoying. Instead, we could just pass around an infinite list of fresh uniques. This would still be clutterful, but if it grants us a big speed improvement, the clutter might be worth it.
The high-level piece here is that there may be good things that come from understanding where these IntMaps arise.
Richard
On Jul 2, 2021, at 4:08 AM, Simon Peyton Jones via ghc-devs
wrote: Jeff
Great stuff! Welcome.
I strongly urge you to keep a constantly-update status wiki page, which lists the ideas you are working on, and points to relevant resources and tickets. An email thread like this is a good way to gather ideas, but NOT a good way to organise and track them.
Looking carefully at profiles is a good plan. That’s the hard data!
I think that some careful investigation of IntMap (at least the bits that GHC uses heavily) would be a good idea. Clearly we spend a lot of time in these maps, so speedups here will yield a lot of benefit. Look at the heavy hitters from the profile, stare at the Core code and see if it’s s good as it can be.
For example, Sebastian discovered a strange infelicity in IntMap.lookup, which I’ve documented in a new ticket
https://gitlab.haskell.org/ghc/ghc/-/issues/20069
I think it’d also be worth measuring how unbalanced our IntMap trees get. See
https://gitlab.haskell.org/ghc/ghc/-/issues/19820
The speculation there is that we are getting very unbalanced trees. So measure it! If it’s true, we could improve matters by using a different IntMap; or maybe by scrambling the key a bit --- see the ticket.
Simon
From: ghc-devs
On Behalf Of Young, Jeff Sent: 02 July 2021 02:36 To: ghc-devs@haskell.org Subject: Trying to speedup GHC compile times...Help! Hi ghc devs,
I'm a long-time Haskeller but am just getting into GHC development. I started a 12 week internship at Tweag I/O under Richard Eisenberg this week with the singular goal to speedup GHC compile times. I'm specifically looking to contribute to ghc issues 18541and 18535. So I thought I would reach out to the community to get some direction on issues/features/problems to tackle in the pursuit of faster compilation times. This is a full time internship and so I think there is a real opportunity to nail down a deliverable for the community, but I want to get some guidance from the experts (you fine people!) before going down a rabbit hole.
To be specific I'm looking for lingering items such as:
1. It would be great if we had <thing-here> but no one has time.
2. Primop foo is half complete but is the right type for <common-use-case-which-is-currently-just-a-list>.
3. Swap <some-type> to an array-like type non-incrementally, that is, establish a patch that rips out the previous type and replaces it with the array-like across the entire compiler, rather than module-by-module.
Point 2 is a specific reference to MR 3571 but I'm unsure of the status and etiquette around MRs, and I'm unsure exactly how fulfilling the todos at the end of that MR would aid in faster compilation times (and if there is evidence to that effect somewhere).
Thanks for the help!
- Jeff
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
 
            On Fri, Jul 02, 2021 at 08:08:39AM +0000, Simon Peyton Jones via ghc-devs wrote:
I strongly urge you to keep a constantly-update status wiki page, which lists the ideas you are working on, and points to relevant resources and tickets. An email thread like this is a good way to gather ideas, but NOT a good way to organise and track them.
I remain curious as to whether "Scrap your type applications" is worth a second look. There are edge cases in which compile time blowup is a result of type blowup (as opposed to code blowup via inlining). Might GHC have changed enough in the last ~5 years to make it now "another compiler": https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/if.pdf (Section 4.4): Overall, allocation decreased by a mere 0.1%. The largest reduction was 4%, and the largest increase was 12%, but 120 of the 130 modules showed a change of less than 1%. Presumably, the reduction in work that arises from smaller types is balanced by the additional overheads of SystemIF. On this evidence, the additional complexity introduced by the new reduction rules does not pay its way. Nevertheless, these are matters that are dominated by nitty-gritty representation details, and the balance might well be different in another compiler. Could it be that some of the more compile time intensive packages on hackage (aeson, vector, ...) would benefit more than the various modules in base? Wild speculation aside, of course finding and fixing inefficiencies in the implementation of existing common primitive should be a win across the board, and should not require changing major compiler design features, just leaner code. -- Viktor.
 
            I love "Scrap Your Type Applications" (SYTA) too, although I'm a little biased since I'm a co-author.
But SYTA is a change that has a pretty pervasive effect on the way GHC manipulates types. Since then we've added TypeInType, from which a lot of consequences flowed.  I simply don't know how hard it'd be to do a "scrap your type applications" job on GHC today. I agree that the cost/benefit tradeoff could have shifted.
We can only find out by trying it.  But trying it would take quite a lot of work.  On the other hand, SYTA is the only principled approach that I know of that solves the type blow-up we get with deeply-nested data types (notoriously, tuples).  It's a problem we have known of for decades, but is still essentially unsolved.
Simon
|  -----Original Message-----
|  From: ghc-devs 
participants (7)
- 
                 Ben Gamari Ben Gamari
- 
                 Christiaan Baaij Christiaan Baaij
- 
                 Matthew Pickering Matthew Pickering
- 
                 Richard Eisenberg Richard Eisenberg
- 
                 Simon Peyton Jones Simon Peyton Jones
- 
                 Viktor Dukhovni Viktor Dukhovni
- 
                 Young, Jeff Young, Jeff