Recompilation avoidance questions

Ömer Sinan Ağacan

21 Apr 2020, 10:37 a.m.

Hi all,

I'm currently reading the "recompilation avoidance" wiki page [1], and I have a few questions about the current design.

The wiki page says (in the paragraph "Suppose the change to D ...") if a module B re-exports x from module D, changing x in D does not cause any changes in B's interface.

I'm wondering why this is the case. To me this doesn't make sense. Anything that can potentially affect users of B should be a part of B's interface. This includes re-exports. I don't understand why there is a difference between normal exports and re-exports. As far as users of the module are concerned there's no difference. So I'd expect any changes in re-exports to make a difference in B's interface.

The wiki page says (in "Why not do (1)", where (1) refers to making D.x part of B's interface) that this is because sometimes changes in D.x should not cause recompiling B's users. I don't understand why (1) would cause this problem. If we make x a part of B, as if it's defined in B, similar to how we can avoid recompilation of users of B when a definition of B changes but the interface is the same, we could avoid recompiling users when D.x changes.

For example,

    -- B.hs
    module B where

    b = 123123

    -- Main.hs
    import B

    main = print b

    $ ghc-stage1 Main.hs
    [1 of 2] Compiling B ( B.hs, B.o )
    [2 of 2] Compiling Main ( Main.hs, Main.o )
    Linking Main ...

Now if I update B and recompile I'll only link Main, won't recompile it:

    -- B.hs
    module B where

    b = 123123 + 12308

    $ ghc-stage1 Main.hs
    [1 of 2] Compiling B ( B.hs, B.o )
    Linking Main ...

Now suppose B.b was a re-export from D. I don't understand why changing it in D would cause recompiling Main if we make b a part of B's interface. I think what would happen is: because D's interface hash won't change we won't recompile B. No problems at all.

Finally, I'm a bit confused about this part

> To ensure that A is recompiled, we therefore have two options: ... (2) arrange to touch B.hi and C.hi even if they haven't changed.

I don't understand how touching is relevant; as far as I understand, touching can't force recompilation. Example:

    $ ghc-stage1 Main.hs
    [1 of 3] Compiling A ( A.hs, A.o )
    [2 of 3] Compiling B ( B.hs, B.o )
    [3 of 3] Compiling Main ( Main.hs, Main.o )
    Linking Main ...
    $ touch A.hi
    $ ghc-stage1 Main.hs
    $ touch B.hi
    $ ghc-stage1 Main.hs

Am I missing anything?

Thanks,

Ömer

[1]: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/recompilation...


Simon Marlow

22 Apr 2020, 9:02 a.m.

On Tue, 21 Apr 2020 at 11:38, Ömer Sinan Ağacan wrote:

> Hi all,
>
> I'm currently reading the "recompilation avoidance" wiki page [1], and I have a few questions about the current design.
>
> The wiki page says (in the paragraph "Suppose the change to D ...") if a module B re-exports x from module D, changing x in D does not cause any changes in B's interface.
>
> I'm wondering why this is the case. To me this doesn't make sense. Anything that can potentially affect users of B should be a part of B's interface. This includes re-exports. I don't understand why there is a difference between normal exports and re-exports. As far as users of the module are concerned there's no difference. So I'd expect any changes in re-exports to make a difference in B's interface.

Yes, that's already the case. Under "Deciding whether to recompile", we say:

* If anything else has changed in a way that would affect the results of compiling this module, we must recompile.

so that's the basic requirement.

We don't want to include the *definitions* of things that are re-exported, because that would bloat interface files a lot. Consider that an interface would have to contain the unfoldings for every exported identifier, and the unfoldings of anything referred to by those unfoldings, and so on. Imagine the size of Prelude.hi! (Historical note: it did work this way a long time ago; I think GHC 2.x was when it changed.)

> The wiki page says (in "Why not do (1)", where (1) refers to making D.x part of B's interface)

Here (1) refers to

1. arrange that make knows about the dependency of A on D.

which is not the same as making D.x part of B's interface.

This section of the wiki page is about "make", incidentally.
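The point about names versus definitions can be illustrated with a small toy model. This is only a sketch under loud assumptions: the types `Name` and `Iface` and the functions `ifaceD` and `reexportAll` are hypothetical and far simpler than GHC's real interface files. It shows an interface that lists re-exported entities by name only, so a change to the definition of D.x changes D's interface but not B's.

```haskell
-- Toy model only: hypothetical types, not GHC's actual interface-file format.

type ModName = String

-- An exported name remembers the module where the entity was defined.
data Name = Name { definedIn :: ModName, occName :: String }
  deriving (Eq, Show)

-- A toy interface: the exported names, plus definitions (standing in for
-- types, unfoldings, etc.) for locally defined entities only.
data Iface = Iface
  { ifExports :: [Name]
  , ifDecls   :: [(String, String)]
  } deriving (Eq, Show)

-- Interface of D, which defines x with the given body.
ifaceD :: String -> Iface
ifaceD body = Iface [Name "D" "x"] [("x", body)]

-- Interface of a module that re-exports everything from the given module:
-- it copies the names but none of the definitions.
reexportAll :: Iface -> Iface
reexportAll d = Iface (ifExports d) []
```

In GHCi, `ifaceD "x = 1" == ifaceD "x = 2"` evaluates to `False`, while `reexportAll (ifaceD "x = 1") == reexportAll (ifaceD "x = 2")` evaluates to `True`: in this model the re-exporting interface is unaffected by the change to the definition.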

> that this is because sometimes changes in D.x should not cause recompiling B's users. I don't understand why (1) would cause this problem. If we make x a part of B, as if it's defined in B, similar to how we can avoid recompilation of users of B when a definition of B changes but the interface is the same, we could avoid recompiling users when D.x changes.
>
> For example,
>
>     -- B.hs
>     module B where
>
>     b = 123123
>
>     -- Main.hs
>     import B
>
>     main = print b
>
>     $ ghc-stage1 Main.hs
>     [1 of 2] Compiling B ( B.hs, B.o )
>     [2 of 2] Compiling Main ( Main.hs, Main.o )
>     Linking Main ...
>
> Now if I update B and recompile I'll only link Main, won't recompile it:
>
>     -- B.hs
>     module B where
>
>     b = 123123 + 12308
>
>     $ ghc-stage1 Main.hs
>     [1 of 2] Compiling B ( B.hs, B.o )
>     Linking Main ...
>
> Now suppose B.b was a re-export from D. I don't understand why changing it in D would cause recompiling Main if we make b a part of B's interface. I think what would happen is: because D's interface hash won't change we won't recompile B. No problems at all.

I think this all stems from the confusion above.

> Finally, I'm a bit confused about this part
>
> > To ensure that A is recompiled, we therefore have two options: ... (2) arrange to touch B.hi and C.hi even if they haven't changed.
>
> I don't understand how touching is relevant, as far as I understand touching can't force recompilation. Example:
>
>     $ ghc-stage1 Main.hs
>     [1 of 3] Compiling A ( A.hs, A.o )
>     [2 of 3] Compiling B ( B.hs, B.o )
>     [3 of 3] Compiling Main ( Main.hs, Main.o )
>     Linking Main ...
>     $ touch A.hi
>     $ ghc-stage1 Main.hs
>     $ touch B.hi
>     $ ghc-stage1 Main.hs
>
> Am I missing anything?

Touching is relevant to "make" only, not ghc --make. Under "Why do we need recompilation avoidance?" there are two sections: "GHCi and --make" and "make", but the formatting doesn't make the structure very clear here. Perhaps this got worse when we migrated to GitLab? Maybe adding an outline would help make the structure clearer?

Cheers
Simon

> Thanks,
>
> Ömer
>
> [1]: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/recompilation...

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
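The remark that touching matters to "make" but not to ghc --make can be pictured with a toy model that contrasts a timestamp-based check with a hash-based one. Everything here is an assumption made for illustration: `File`, `touch`, `makeWouldRebuild` and `hashWouldRebuild` are hypothetical names, and neither make nor GHC is implemented this way in detail.

```haskell
-- Toy model only: hypothetical types, not how make or GHC are implemented.

type Timestamp = Int
type Hash = String

-- A file on disk: its modification time and a hash of its contents.
data File = File { mtime :: Timestamp, contentHash :: Hash }
  deriving (Eq, Show)

-- Like the shell command `touch`: bump the modification time and leave
-- the contents (and therefore the hash) alone.
touch :: Timestamp -> File -> File
touch now f = f { mtime = now }

-- make-style check: rebuild when any prerequisite is newer than the target.
makeWouldRebuild :: File -> [File] -> Bool
makeWouldRebuild target prereqs = any (\p -> mtime p > mtime target) prereqs

-- Hash-style check: rebuild when the hashes recorded at the last build
-- differ from the prerequisites' current hashes.
hashWouldRebuild :: [Hash] -> [File] -> Bool
hashWouldRebuild recorded prereqs = recorded /= map contentHash prereqs
```

With `bHi = File 10 "h1"` and `aO = File 20 "obj"`, the expression `makeWouldRebuild aO [touch 30 bHi]` is `True`, whereas `hashWouldRebuild ["h1"] [touch 30 bHi]` is `False`. In this model, touching B.hi only influences the timestamp-driven check, which matches the behaviour in the transcript above where touching the .hi files did not trigger recompilation.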

Ömer Sinan Ağacan

23 Apr 2020, 8:17 a.m.

Thanks Simon,

> We don't want to include the *definitions* of things that are re-exported, because that would bloat interface files a lot.

I think by definition you mean unfoldings, pragmas, annotations, and rules, right?

I'm a bit surprised by this, because this would require tracking transitive dependencies, which is the opposite of what we want to do in #16885.

If M1 re-exports something from M2 and M0 imports M1 then I think we could consider M2 a direct import, but that complicates the story a little bit. I think we don't have to track *all* transitive deps though; only tracking re-export paths should be enough. So maybe this is not too bad.

Ömer


Simon Marlow

12:55 p.m.

On Thu, 23 Apr 2020 at 09:17, Ömer Sinan Ağacan wrote:

> Thanks Simon,
>
> > We don't want to include the *definitions* of things that are re-exported, because that would bloat interface files a lot.
>
> I think by definition you mean unfoldings, pragmas, annotations, and rules, right?

And the types of bindings, and the definitions of types. Everything that is not the name, basically.

> I'm a bit surprised by this, because this would require tracking transitive dependencies, which is the opposite of what we want to do in #16885.

Not really. It's just a tradeoff between copying all the definitions (recursively) of things we need into the current module vs. leaving the definitions in the interface of the original module where the entity was defined.

Even if we were to copy the definitions of things we depend on into the current module's interface, we still have to know where they came from, and to know when the original definition changes so that we can recompile. So I don't think there would be any difference in which modules we have to list in the current module's interface file usage list.

Note: the "usages" in the interface file are different from the "dependencies". We're not proposing to change how "usages" work. The difference is explained in https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/recompilation...

> If M1 re-exports something from M2 and M0 imports M1 then I think we could consider M2 a direct import, but that complicates the story a little bit. I think we don't have to track *all* transitive deps though, only tracking re-export paths should be enough. So maybe this is not too bad.

I think we already arrived at a reasonable design on #16885, what do you think of it? Also, David already listed all the places that would potentially need to change if we no longer include transitive dependencies in `dep_mods`: https://gitlab.haskell.org/ghc/ghc/issues/16885#note_215715

And a useful summary of the background is https://gitlab.haskell.org/ghc/ghc/-/merge_requests/931#note_208414

There was some subsequent discussion on #16885 about how to handle boot modules, and a proposal to fix that. Aside from that, the idea is to just remove transitive dependencies from `dep_mods` and fix up the places that used it, which David listed in that comment.

Cheers
Simon
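The idea that a module must know where each entity it uses came from, and notice when the original definition changes, can be sketched as a toy "usages" check. This is only an illustration under stated assumptions: `Usages`, `World` and `mustRecompile` are hypothetical names, hashes are plain integers, and `dep_mods` is not modelled at all. The sketch keys each recorded usage by the defining module, so M0 using x through a re-export in M1 still records a usage of M2.x.

```haskell
-- Toy model only: hypothetical types, loosely inspired by the "usages"
-- idea discussed in this thread; not GHC's real data structures.

type ModName = String
type Hash = Int

-- What a module recorded when it was last compiled: for every entity it
-- used, the defining module, the entity name, and the hash seen then.
type Usages = [((ModName, String), Hash)]

-- The current hashes of definitions, keyed the same way.
type World = [((ModName, String), Hash)]

-- Recompile if any used entity has disappeared or now has another hash.
mustRecompile :: Usages -> World -> Bool
mustRecompile usages world = any changed usages
  where
    changed (key, old) = lookup key world /= Just old

-- M0 imports M1, which re-exports x from M2: the usage points at M2.
usagesM0 :: Usages
usagesM0 = [(("M2", "x"), 1)]
```

Here `mustRecompile usagesM0 [(("M2", "x"), 1), (("M2", "y"), 7)]` is `False`, and it stays `False` if only the hash of M2.y changes. With `[(("M2", "x"), 2)]` as the current state it becomes `True`, even though nothing about M1 appears in the check.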

