Why TcLclEnv and DsGblEnv need to store the same IORef for errors?

Hello folks, as some of you might know me and Richard are reworking how GHC constructs, emits and deals with errors and warnings (See https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values and #18516). To summarise very briefly the spirit, we will have (eventually) proper domain-specific types instead of SDocs. The idea is to have very precise and "focused" types for the different phases of the compilation pipeline, and a "catch-all" monomorphic `GhcMessage` type used for the final pretty-printing and exception-throwing: data GhcMessage where GhcPsMessage :: PsMessage -> GhcMessage GhcTcRnMessage :: TcRnMessage -> GhcMessage GhcDsMessage :: DsMessage -> GhcMessage GhcDriverMessage :: DriverMessage -> GhcMessage GhcUnknownMessage :: forall a. (Diagnostic a, Typeable a) => a -> GhcMessage While starting to refactor GHC to use these types, I have stepped into something bizarre: the `DsGblEnv` and `TcLclEnv` envs both share the same `IORef` to store the diagnostics (i.e. warnings and errors) accumulated during compilation. More specifically, a function like `GHC.HsToCore.Monad.mkDsEnvsFromTcGbl` simply receives as input the `IORef` coming straight from the `TcLclEnv`, and stores it into the `DsGblEnv`. This is unfortunate, because it would force me to change the type of this `IORef` to be `IORef (Messages GhcMessage)` to accommodate both diagnostic types, but this would bubble up into top-level functions like `initTc`, which would now return a `Messages GhcMessage`. This is once again unfortunate, because is "premature": ideally it might still be nice to return `Messages TcRnMessage`, so that GHC API users could get a very precise diagnostic type rather than the bag `GhcMessage` is. It also violates an implicit contract: we are saying that `initTc` might return (potentially) *any* GHC diagnostic message (including, for example, driver errors/warnings), which I think is misleading. Having said all of that, it's also possible that returning `Messages GhcMessage` is totally fine here and we don't need to be able to do this fine-grained distinction for the GHC API functions. Regardless, I would like to ask the audience: * Why `TcLclEnv` and `DsGblEnv` need to share the same IORef? * Is this for efficiency reasons? * Is this because we need the two monads to independently accumulate errors into the same IORef? Thanks! Alfredo

I think the main reason is that for Template Haskell the renamer/type-checker need to run the desugarer. See the call to initDsTc in GHC.Tc.Gen.Splice.
I suppose an alternative is that the TcGblEnv could have a second IORef to use for error messages that come from desugaring during TH splices.
Nothing deeper than that I think.
Simon
From: ghc-devs

Right, I see, thanks.
This is what I was attempting so far:
data DsMessage =
DsUnknownMessage !DiagnosticMessage
| DsLiftedTcRnMessage !TcRnMessage
-- ^ A diagnostic coming straight from the Typecheck-renamer.
and later:
liftTcRnMessages :: MonadIO m => IORef (Messages TcRnMessage) -> m (IORef
(Messages DsMessage))
liftTcRnMessages ref = liftIO $ do
oldContent <- readIORef ref
newIORef (DsLiftedTcRnMessage <$> oldContent)
...
mkDsEnvsFromTcGbl :: MonadIO m
=> HscEnv -> IORef (Messages TcRnMessage) -> TcGblEnv
-> m (DsGblEnv, DsLclEnv)
mkDsEnvsFromTcGbl hsc_env msg_var tcg_env
= do { cc_st_var <- liftIO $ newIORef newCostCentreState
...
; msg_var' <- liftTcRnMessages msg_var
; return $ mkDsEnvs unit_env this_mod rdr_env type_env fam_inst_env
msg_var' cc_st_var complete_matches
}
While this typechecks, I wonder if that's the right way to think about it
-- from your reply, it seems like the dependency is in the opposite
direction -- we need to store desugaring diagnostics in the TcM due to TH
splicing, not the other way around.
I'll explore the idea of adding a second IORef.
Thanks!
On Tue, 30 Mar 2021 at 10:51, Simon Peyton Jones
I think the main reason is that for Template Haskell the renamer/type-checker need to run the desugarer. See the call to initDsTc in GHC.Tc.Gen.Splice.
I suppose an alternative is that the TcGblEnv could have a second IORef to use for error messages that come from desugaring during TH splices.
Nothing deeper than that I think.
Simon
*From:* ghc-devs
*On Behalf Of *Alfredo Di Napoli *Sent:* 30 March 2021 08:42 *To:* Simon Peyton Jones via ghc-devs *Subject:* Why TcLclEnv and DsGblEnv need to store the same IORef for errors? Hello folks,
as some of you might know me and Richard are reworking how GHC constructs, emits and deals with errors and warnings (See https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.haskell.org%2Fghc%2Fghc%2F-%2Fwikis%2FErrors-as-(structured)-values&data=04%7C01%7Csimonpj%40microsoft.com%7C49c033aa2865495eb07c08d8f34f70cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637526870280012102%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ui4JWOp1gl5Yh%2FOYDqcBLXxTm%2FGnQRi0cDshUVEjwmk%3D&reserved=0 and #18516).
To summarise very briefly the spirit, we will have (eventually) proper domain-specific types instead of SDocs. The idea is to have very precise and "focused" types for the different phases of the compilation pipeline, and a "catch-all" monomorphic `GhcMessage` type used for the final pretty-printing and exception-throwing:
data GhcMessage where
GhcPsMessage :: PsMessage -> GhcMessage
GhcTcRnMessage :: TcRnMessage -> GhcMessage
GhcDsMessage :: DsMessage -> GhcMessage
GhcDriverMessage :: DriverMessage -> GhcMessage
GhcUnknownMessage :: forall a. (Diagnostic a, Typeable a) => a -> GhcMessage
While starting to refactor GHC to use these types, I have stepped into something bizarre: the `DsGblEnv` and `TcLclEnv` envs both share the same `IORef` to store the diagnostics (i.e. warnings and errors) accumulated during compilation. More specifically, a function like `GHC.HsToCore.Monad.mkDsEnvsFromTcGbl` simply receives as input the `IORef` coming straight from the `TcLclEnv`, and stores it into the `DsGblEnv`.
This is unfortunate, because it would force me to change the type of this `IORef` to be
`IORef (Messages GhcMessage)` to accommodate both diagnostic types, but this would bubble up into top-level functions like `initTc`, which would now return a `Messages GhcMessage`. This is once again unfortunate, because is "premature": ideally it might still be nice to return `Messages TcRnMessage`, so that GHC API users could get a very precise diagnostic type rather than the bag `GhcMessage` is. It also violates an implicit contract: we are saying that `initTc` might return (potentially) *any* GHC diagnostic message (including, for example, driver errors/warnings), which I think is misleading.
Having said all of that, it's also possible that returning `Messages GhcMessage` is totally fine here and we don't need to be able to do this fine-grained distinction for the GHC API functions. Regardless, I would like to ask the audience:
* Why `TcLclEnv` and `DsGblEnv` need to share the same IORef?
* Is this for efficiency reasons?
* Is this because we need the two monads to independently accumulate errors into the
same IORef?
Thanks!
Alfredo

On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli
wrote: I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that). What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe. I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs). Richard

Hello folks,
Richard: as I was in the middle of some other refactoring by the time Simon
replied, you can see a potential refactoring that *doesn't* use the double
IORef, but rather this idea of having a `DsMessage` embed `TcRnMessage`(s)
via a new costructor:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4798/diffs#6eaba7424490c...
(Just grep for "DsMessage", "DsUnknownMessage", and `DsLiftedTcRnMessage`
to see the call sites).
The end result is not bad, I have to say. Or, at least, it doesn't
strike me as totally horrid :)
A.
On Tue, 30 Mar 2021 at 16:14, Richard Eisenberg
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli
wrote: I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard

Alfredo also replied to this pointing his embedding plan. I also prefer that, because I really wish TH didn't smear together the phases so much. Moreover, I hope with - GHC proposals https://github.com/ghc-proposals/ghc-proposals/pull/412 / https://github.com/ghc-proposals/ghc-proposals/pull/243 - The parallelism work currently be planned in https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an... we might actually have an opportunity/extra motivation to do that. Splices and quotes will still induce intricate inter-phase dependencies, but I hope that could be mediated by the driver rather than just baked into each phase. (One final step would be the "stuck macros" technique of https://www.youtube.com/watch?v=nUvKoG_V_U0 / https://github.com/gelisam/klister, where TH splices would be able to making "blocking queries" of the the compiler in ways that induce more of these fine-grained dependencies.) Anyways, while we could also do a "RnTsDsError" and split later, I hope Alfredo's alternative of embedding won't be too much harder and prepare us for these exciting areas of exploration. John On 3/30/21 10:14 AM, Richard Eisenberg wrote:
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli
mailto:alfredo.dinapoli@gmail.com> wrote: I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Morning all,
*Richard*: sorry! Unfortunately MR !4798 is the cornerstone of this
refactoring work but it's also gargantuan. Let's discuss a plan to attack
it, but fundamentally there is a critical mass of changes that needs to
happen atomically or it wouldn't make much sense, and alas this doesn't
play in our favour when it comes to MR size and ease of review. However, to
quickly reply to your remak: currently (for the sake of the
"minimum-viable-product") I am trying to stabilise the external interfaces,
by which I mean giving functions their final type signature while I do
what's easiest to make things typecheck. In this phase what I think is the
easiest is to wrap the majority of diagnostics into the `xxUnknownxx`
constructor, and change them gradually later. A fair warning, though: you
say "I would think that a DsMessage would later be wrapped in an envelope."
This might be true for Ds messages (didn't actually invest any brain cycles
to check that) but in general we have to turn a message into an envelope as
soon as we have a chance to do so, because we need to grab the `SrcSpan`
and the `DynFlags` *at the point of creation* of the diagnostics. Carrying
around a message and make it bubble up at some random point won't be a good
plan (even for Ds messages). Having said that, I clearly have very little
knowledge about this area of GHC, so feel free to disagree :)
*John*: Although it's a bit hard to predict how well this is going to
evolve, my current embedding, to refresh everyone's memory, is the
following:
data DsMessage =
DsUnknownMessage !DiagnosticMessage
-- ^ Stop-gap constructor to ease the migration.
| DsLiftedTcRnMessage !TcRnMessage
-- ^ A diagnostic coming straight from the Typecheck-renamer.
-- More messages added in the future, of course
At first I thought this was the wrong way around, due to Simon's comment,
but this actually creates pleasant external interfaces. To give you a bunch
of examples from MR !4798:
deSugar :: HscEnv -> ModLocation -> TcGblEnv -> IO (Messages DsMessage,
Maybe ModGuts)
deSugarExpr :: HscEnv -> LHsExpr GhcTc -> IO (Messages DsMessage, Maybe
CoreExpr)
Note something interesting: the second function actually calls
`runTcInteractive` inside the body, but thanks to the `DsLiftedTcRnMessage`
we can still expose to the consumer an opaque `DsMessage` , which is what I
would expect to see from a function called "deSugarExpr". Conversely, I
would be puzzled to find those functions returning a `TcRnDsMessage`.
Having said all of that, I am not advocating this design is "the best". I
am sure we will iterate on it. I am just reporting that even this baseline
seems to be decent from an API perspective :)
On Wed, 31 Mar 2021 at 05:45, John Ericson
Alfredo also replied to this pointing his embedding plan. I also prefer that, because I really wish TH didn't smear together the phases so much. Moreover, I hope with
- GHC proposals https://github.com/ghc-proposals/ghc-proposals/pull/412 / https://github.com/ghc-proposals/ghc-proposals/pull/243
- The parallelism work currently be planned in https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an...
we might actually have an opportunity/extra motivation to do that. Splices and quotes will still induce intricate inter-phase dependencies, but I hope that could be mediated by the driver rather than just baked into each phase.
(One final step would be the "stuck macros" technique of https://www.youtube.com/watch?v=nUvKoG_V_U0 / https://github.com/gelisam/klister, where TH splices would be able to making "blocking queries" of the the compiler in ways that induce more of these fine-grained dependencies.)
Anyways, while we could also do a "RnTsDsError" and split later, I hope Alfredo's alternative of embedding won't be too much harder and prepare us for these exciting areas of exploration.
John On 3/30/21 10:14 AM, Richard Eisenberg wrote:
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli
wrote: I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard
_______________________________________________ ghc-devs mailing listghc-devs@haskell.orghttp://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Follow up:
Argh! I have just seen that I have a bunch of test failures related to my
MR (which, needless to say, it's still WIP).
For example:
run/T9140.run.stdout.normalised 2021-03-31 09:35:48.000000000 +0200
@@ -1,12 +1,4 @@
-<interactive>:2:5:
- You can't mix polymorphic and unlifted bindings: a = (# 1 #)
- Probable fix: add a type signature
-
-<interactive>:3:5:
- You can't mix polymorphic and unlifted bindings: a = (# 1, 3 #)
- Probable fix: add a type signature
-
So it looks like some diagnostic is now not being reported and, surprise
surprise, this was emitted from the DsM monad.
I have the suspect that indeed Richard was right (like he always is :) ) --
when we go from a DsM to a TcM monad (See `initDsTc`) for example, I think
we also need to carry into the new monad all the diagnostics we collected
so far.
This implies indeed a mutual dependency (as Simon pointed out, heh).
So I think my cunning plan of embedding is crumbling -- I suspect we would
end up with a type `TcRnDsMessage` which captures the dependency.
Sorry for not seeing it sooner!
On Wed, 31 Mar 2021 at 08:05, Alfredo Di Napoli
Morning all,
*Richard*: sorry! Unfortunately MR !4798 is the cornerstone of this refactoring work but it's also gargantuan. Let's discuss a plan to attack it, but fundamentally there is a critical mass of changes that needs to happen atomically or it wouldn't make much sense, and alas this doesn't play in our favour when it comes to MR size and ease of review. However, to quickly reply to your remak: currently (for the sake of the "minimum-viable-product") I am trying to stabilise the external interfaces, by which I mean giving functions their final type signature while I do what's easiest to make things typecheck. In this phase what I think is the easiest is to wrap the majority of diagnostics into the `xxUnknownxx` constructor, and change them gradually later. A fair warning, though: you say "I would think that a DsMessage would later be wrapped in an envelope." This might be true for Ds messages (didn't actually invest any brain cycles to check that) but in general we have to turn a message into an envelope as soon as we have a chance to do so, because we need to grab the `SrcSpan` and the `DynFlags` *at the point of creation* of the diagnostics. Carrying around a message and make it bubble up at some random point won't be a good plan (even for Ds messages). Having said that, I clearly have very little knowledge about this area of GHC, so feel free to disagree :)
*John*: Although it's a bit hard to predict how well this is going to evolve, my current embedding, to refresh everyone's memory, is the following:
data DsMessage =
DsUnknownMessage !DiagnosticMessage
-- ^ Stop-gap constructor to ease the migration.
| DsLiftedTcRnMessage !TcRnMessage
-- ^ A diagnostic coming straight from the Typecheck-renamer.
-- More messages added in the future, of course
At first I thought this was the wrong way around, due to Simon's comment, but this actually creates pleasant external interfaces. To give you a bunch of examples from MR !4798:
deSugar :: HscEnv -> ModLocation -> TcGblEnv -> IO (Messages DsMessage, Maybe ModGuts) deSugarExpr :: HscEnv -> LHsExpr GhcTc -> IO (Messages DsMessage, Maybe CoreExpr)
Note something interesting: the second function actually calls `runTcInteractive` inside the body, but thanks to the `DsLiftedTcRnMessage` we can still expose to the consumer an opaque `DsMessage` , which is what I would expect to see from a function called "deSugarExpr". Conversely, I would be puzzled to find those functions returning a `TcRnDsMessage`.
Having said all of that, I am not advocating this design is "the best". I am sure we will iterate on it. I am just reporting that even this baseline seems to be decent from an API perspective :)
On Wed, 31 Mar 2021 at 05:45, John Ericson
wrote: Alfredo also replied to this pointing his embedding plan. I also prefer that, because I really wish TH didn't smear together the phases so much. Moreover, I hope with
- GHC proposals https://github.com/ghc-proposals/ghc-proposals/pull/412 / https://github.com/ghc-proposals/ghc-proposals/pull/243
- The parallelism work currently be planned in https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an...
we might actually have an opportunity/extra motivation to do that. Splices and quotes will still induce intricate inter-phase dependencies, but I hope that could be mediated by the driver rather than just baked into each phase.
(One final step would be the "stuck macros" technique of https://www.youtube.com/watch?v=nUvKoG_V_U0 / https://github.com/gelisam/klister, where TH splices would be able to making "blocking queries" of the the compiler in ways that induce more of these fine-grained dependencies.)
Anyways, while we could also do a "RnTsDsError" and split later, I hope Alfredo's alternative of embedding won't be too much harder and prepare us for these exciting areas of exploration.
John On 3/30/21 10:14 AM, Richard Eisenberg wrote:
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli < alfredo.dinapoli@gmail.com> wrote:
I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard
_______________________________________________ ghc-devs mailing listghc-devs@haskell.orghttp://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

I might still be tempted to do: data DsMessage = ... | DsLiftedTcRnMessage !TcRnMessage -- ^ A diagnostic coming straight from the Typecheck-renamer. data TcRnMessage = ... | TcRnLiftedDsMessage !DsMessage -- ^ A diagnostic coming straight from the Desugarer. tying them together with hs-boot. Yes, that means one can do some silly `TcRnLiftedDsMessage . DsLiftedTcRnMessage . TcRnLiftedDsMessage ...`, but that could even show up in a render as "while desugaring a splice during type checking, while typechecking during desguaring, ..." so arguably the information the wrapping isn't purely superfluous. I think this would pose no practical problem today, while still "soft enforcing" the abstraction boundaries we want. On 3/31/21 3:45 AM, Alfredo Di Napoli wrote:
Follow up:
Argh! I have just seen that I have a bunch of test failures related to my MR (which, needless to say, it's still WIP).
For example:
run/T9140.run.stdout.normalised 2021-03-31 09:35:48.000000000 +0200 @@ -1,12 +1,4 @@ -<interactive>:2:5: - You can't mix polymorphic and unlifted bindings: a = (# 1 #) - Probable fix: add a type signature - -<interactive>:3:5: - You can't mix polymorphic and unlifted bindings: a = (# 1, 3 #) - Probable fix: add a type signature -
So it looks like some diagnostic is now not being reported and, surprise surprise, this was emitted from the DsM monad.
I have the suspect that indeed Richard was right (like he always is :) ) -- when we go from a DsM to a TcM monad (See `initDsTc`) for example, I think we also need to carry into the new monad all the diagnostics we collected so far.
This implies indeed a mutual dependency (as Simon pointed out, heh).
So I think my cunning plan of embedding is crumbling -- I suspect we would end up with a type `TcRnDsMessage` which captures the dependency.
Sorry for not seeing it sooner!
On Wed, 31 Mar 2021 at 08:05, Alfredo Di Napoli
mailto:alfredo.dinapoli@gmail.com> wrote: Morning all,
*Richard*: sorry! Unfortunately MR !4798 is the cornerstone of this refactoring work but it's also gargantuan. Let's discuss a plan to attack it, but fundamentally there is a critical mass of changes that needs to happen atomically or it wouldn't make much sense, and alas this doesn't play in our favour when it comes to MR size and ease of review. However, to quickly reply to your remak: currently (for the sake of the "minimum-viable-product") I am trying to stabilise the external interfaces, by which I mean giving functions their final type signature while I do what's easiest to make things typecheck. In this phase what I think is the easiest is to wrap the majority of diagnostics into the `xxUnknownxx` constructor, and change them gradually later. A fair warning, though: you say "I would think that a DsMessage would later be wrapped in an envelope." This might be true for Ds messages (didn't actually invest any brain cycles to check that) but in general we have to turn a message into an envelope as soon as we have a chance to do so, because we need to grab the `SrcSpan` and the `DynFlags` *at the point of creation* of the diagnostics. Carrying around a message and make it bubble up at some random point won't be a good plan (even for Ds messages). Having said that, I clearly have very little knowledge about this area of GHC, so feel free to disagree :)
*John*: Although it's a bit hard to predict how well this is going to evolve, my current embedding, to refresh everyone's memory, is the following:
data DsMessage =
DsUnknownMessage !DiagnosticMessage
-- ^ Stop-gap constructor to ease the migration.
| DsLiftedTcRnMessage !TcRnMessage
-- ^ A diagnostic coming straight from the Typecheck-renamer.
-- More messages added in the future, of course
At first I thought this was the wrong way around, due to Simon's comment, but this actually creates pleasant external interfaces. To give you a bunch of examples from MR !4798:
deSugar :: HscEnv -> ModLocation -> TcGblEnv -> IO (Messages DsMessage, Maybe ModGuts)
deSugarExpr :: HscEnv -> LHsExpr GhcTc -> IO (Messages DsMessage, Maybe CoreExpr)
Note something interesting: the second function actually calls `runTcInteractive` inside the body, but thanks to the `DsLiftedTcRnMessage` we can still expose to the consumer an opaque `DsMessage` , which is what I would expect to see from a function called "deSugarExpr". Conversely, I would be puzzled to find those functions returning a `TcRnDsMessage`.
Having said all of that, I am not advocating this design is "the best". I am sure we will iterate on it. I am just reporting that even this baseline seems to be decent from an API perspective :)
On Wed, 31 Mar 2021 at 05:45, John Ericson
wrote: Alfredo also replied to this pointing his embedding plan. I also prefer that, because I really wish TH didn't smear together the phases so much. Moreover, I hope with
- GHC proposals https://github.com/ghc-proposals/ghc-proposals/pull/412 https://github.com/ghc-proposals/ghc-proposals/pull/412 / https://github.com/ghc-proposals/ghc-proposals/pull/243 https://github.com/ghc-proposals/ghc-proposals/pull/243
- The parallelism work currently be planned in https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an... https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an...
we might actually have an opportunity/extra motivation to do that. Splices and quotes will still induce intricate inter-phase dependencies, but I hope that could be mediated by the driver rather than just baked into each phase.
(One final step would be the "stuck macros" technique of https://www.youtube.com/watch?v=nUvKoG_V_U0 https://www.youtube.com/watch?v=nUvKoG_V_U0 / https://github.com/gelisam/klister https://github.com/gelisam/klister, where TH splices would be able to making "blocking queries" of the the compiler in ways that induce more of these fine-grained dependencies.)
Anyways, while we could also do a "RnTsDsError" and split later, I hope Alfredo's alternative of embedding won't be too much harder and prepare us for these exciting areas of exploration.
John
On 3/30/21 10:14 AM, Richard Eisenberg wrote:
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli
mailto:alfredo.dinapoli@gmail.com> wrote: I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org mailto:ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org mailto:ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Hello all,
John: right, I am not opposed to what you describe, but at the end of the
day we need to add all these messages to a single IORef (unless we go with
the two IORef idea that Richard is not fond of), and for that we need a
single monomorphic type, which could be, initially, even something like:
type TcRnDsMessage = Either DsMessage TcRnMessage
I guess I'll have to iterate on this until we get something meaningful and
that passes the full testsuite :)
A.
On Wed, 31 Mar 2021 at 16:36, John Ericson
I might still be tempted to do: data DsMessage = ... | DsLiftedTcRnMessage !TcRnMessage -- ^ A diagnostic coming straight from the Typecheck-renamer.
data TcRnMessage = ... | TcRnLiftedDsMessage !DsMessage -- ^ A diagnostic coming straight from the Desugarer.
tying them together with hs-boot. Yes, that means one can do some silly `TcRnLiftedDsMessage . DsLiftedTcRnMessage . TcRnLiftedDsMessage ...`, but that could even show up in a render as "while desugaring a splice during type checking, while typechecking during desguaring, ..." so arguably the information the wrapping isn't purely superfluous.
I think this would pose no practical problem today, while still "soft enforcing" the abstraction boundaries we want.
On 3/31/21 3:45 AM, Alfredo Di Napoli wrote:
Follow up:
Argh! I have just seen that I have a bunch of test failures related to my MR (which, needless to say, it's still WIP).
For example:
run/T9140.run.stdout.normalised 2021-03-31 09:35:48.000000000 +0200 @@ -1,12 +1,4 @@
-<interactive>:2:5: - You can't mix polymorphic and unlifted bindings: a = (# 1 #) - Probable fix: add a type signature - -<interactive>:3:5: - You can't mix polymorphic and unlifted bindings: a = (# 1, 3 #) - Probable fix: add a type signature -
So it looks like some diagnostic is now not being reported and, surprise surprise, this was emitted from the DsM monad.
I have the suspect that indeed Richard was right (like he always is :) ) -- when we go from a DsM to a TcM monad (See `initDsTc`) for example, I think we also need to carry into the new monad all the diagnostics we collected so far.
This implies indeed a mutual dependency (as Simon pointed out, heh).
So I think my cunning plan of embedding is crumbling -- I suspect we would end up with a type `TcRnDsMessage` which captures the dependency.
Sorry for not seeing it sooner!
On Wed, 31 Mar 2021 at 08:05, Alfredo Di Napoli < alfredo.dinapoli@gmail.com> wrote:
Morning all,
*Richard*: sorry! Unfortunately MR !4798 is the cornerstone of this refactoring work but it's also gargantuan. Let's discuss a plan to attack it, but fundamentally there is a critical mass of changes that needs to happen atomically or it wouldn't make much sense, and alas this doesn't play in our favour when it comes to MR size and ease of review. However, to quickly reply to your remak: currently (for the sake of the "minimum-viable-product") I am trying to stabilise the external interfaces, by which I mean giving functions their final type signature while I do what's easiest to make things typecheck. In this phase what I think is the easiest is to wrap the majority of diagnostics into the `xxUnknownxx` constructor, and change them gradually later. A fair warning, though: you say "I would think that a DsMessage would later be wrapped in an envelope." This might be true for Ds messages (didn't actually invest any brain cycles to check that) but in general we have to turn a message into an envelope as soon as we have a chance to do so, because we need to grab the `SrcSpan` and the `DynFlags` *at the point of creation* of the diagnostics. Carrying around a message and make it bubble up at some random point won't be a good plan (even for Ds messages). Having said that, I clearly have very little knowledge about this area of GHC, so feel free to disagree :)
*John*: Although it's a bit hard to predict how well this is going to evolve, my current embedding, to refresh everyone's memory, is the following:
data DsMessage =
DsUnknownMessage !DiagnosticMessage
-- ^ Stop-gap constructor to ease the migration.
| DsLiftedTcRnMessage !TcRnMessage
-- ^ A diagnostic coming straight from the Typecheck-renamer.
-- More messages added in the future, of course
At first I thought this was the wrong way around, due to Simon's comment, but this actually creates pleasant external interfaces. To give you a bunch of examples from MR !4798:
deSugar :: HscEnv -> ModLocation -> TcGblEnv -> IO (Messages DsMessage, Maybe ModGuts) deSugarExpr :: HscEnv -> LHsExpr GhcTc -> IO (Messages DsMessage, Maybe CoreExpr)
Note something interesting: the second function actually calls `runTcInteractive` inside the body, but thanks to the `DsLiftedTcRnMessage` we can still expose to the consumer an opaque `DsMessage` , which is what I would expect to see from a function called "deSugarExpr". Conversely, I would be puzzled to find those functions returning a `TcRnDsMessage`.
Having said all of that, I am not advocating this design is "the best". I am sure we will iterate on it. I am just reporting that even this baseline seems to be decent from an API perspective :)
On Wed, 31 Mar 2021 at 05:45, John Ericson
wrote: Alfredo also replied to this pointing his embedding plan. I also prefer that, because I really wish TH didn't smear together the phases so much. Moreover, I hope with
- GHC proposals https://github.com/ghc-proposals/ghc-proposals/pull/412 / https://github.com/ghc-proposals/ghc-proposals/pull/243
- The parallelism work currently be planned in https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an...
we might actually have an opportunity/extra motivation to do that. Splices and quotes will still induce intricate inter-phase dependencies, but I hope that could be mediated by the driver rather than just baked into each phase.
(One final step would be the "stuck macros" technique of https://www.youtube.com/watch?v=nUvKoG_V_U0 / https://github.com/gelisam/klister, where TH splices would be able to making "blocking queries" of the the compiler in ways that induce more of these fine-grained dependencies.)
Anyways, while we could also do a "RnTsDsError" and split later, I hope Alfredo's alternative of embedding won't be too much harder and prepare us for these exciting areas of exploration.
John On 3/30/21 10:14 AM, Richard Eisenberg wrote:
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli < alfredo.dinapoli@gmail.com> wrote:
I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard
_______________________________________________ ghc-devs mailing listghc-devs@haskell.orghttp://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Yeah good point. Ultimately, I hope we can abstract over things like the IORef itself so that while TcM can kick off DsM and vice-versa, each monad can only log messages of the right type, but that can come later. Good luck, very excited to see this work happen! John On 4/1/21 2:00 AM, Alfredo Di Napoli wrote:
Hello all,
John: right, I am not opposed to what you describe, but at the end of the day we need to add all these messages to a single IORef (unless we go with the two IORef idea that Richard is not fond of), and for that we need a single monomorphic type, which could be, initially, even something like:
type TcRnDsMessage = Either DsMessage TcRnMessage
I guess I'll have to iterate on this until we get something meaningful and that passes the full testsuite :)
A.
On Wed, 31 Mar 2021 at 16:36, John Ericson
wrote: I might still be tempted to do:
data DsMessage = ... | DsLiftedTcRnMessage !TcRnMessage -- ^ A diagnostic coming straight from the Typecheck-renamer.
data TcRnMessage = ... | TcRnLiftedDsMessage !DsMessage -- ^ A diagnostic coming straight from the Desugarer.
tying them together with hs-boot. Yes, that means one can do some silly `TcRnLiftedDsMessage . DsLiftedTcRnMessage . TcRnLiftedDsMessage ...`, but that could even show up in a render as "while desugaring a splice during type checking, while typechecking during desguaring, ..." so arguably the information the wrapping isn't purely superfluous.
I think this would pose no practical problem today, while still "soft enforcing" the abstraction boundaries we want.
On 3/31/21 3:45 AM, Alfredo Di Napoli wrote:
Follow up:
Argh! I have just seen that I have a bunch of test failures related to my MR (which, needless to say, it's still WIP).
For example:
run/T9140.run.stdout.normalised 2021-03-31 09:35:48.000000000 +0200 @@ -1,12 +1,4 @@ -<interactive>:2:5: - You can't mix polymorphic and unlifted bindings: a = (# 1 #) - Probable fix: add a type signature - -<interactive>:3:5: - You can't mix polymorphic and unlifted bindings: a = (# 1, 3 #) - Probable fix: add a type signature -
So it looks like some diagnostic is now not being reported and, surprise surprise, this was emitted from the DsM monad.
I have the suspect that indeed Richard was right (like he always is :) ) -- when we go from a DsM to a TcM monad (See `initDsTc`) for example, I think we also need to carry into the new monad all the diagnostics we collected so far.
This implies indeed a mutual dependency (as Simon pointed out, heh).
So I think my cunning plan of embedding is crumbling -- I suspect we would end up with a type `TcRnDsMessage` which captures the dependency.
Sorry for not seeing it sooner!
On Wed, 31 Mar 2021 at 08:05, Alfredo Di Napoli
mailto:alfredo.dinapoli@gmail.com> wrote: Morning all,
*Richard*: sorry! Unfortunately MR !4798 is the cornerstone of this refactoring work but it's also gargantuan. Let's discuss a plan to attack it, but fundamentally there is a critical mass of changes that needs to happen atomically or it wouldn't make much sense, and alas this doesn't play in our favour when it comes to MR size and ease of review. However, to quickly reply to your remak: currently (for the sake of the "minimum-viable-product") I am trying to stabilise the external interfaces, by which I mean giving functions their final type signature while I do what's easiest to make things typecheck. In this phase what I think is the easiest is to wrap the majority of diagnostics into the `xxUnknownxx` constructor, and change them gradually later. A fair warning, though: you say "I would think that a DsMessage would later be wrapped in an envelope." This might be true for Ds messages (didn't actually invest any brain cycles to check that) but in general we have to turn a message into an envelope as soon as we have a chance to do so, because we need to grab the `SrcSpan` and the `DynFlags` *at the point of creation* of the diagnostics. Carrying around a message and make it bubble up at some random point won't be a good plan (even for Ds messages). Having said that, I clearly have very little knowledge about this area of GHC, so feel free to disagree :)
*John*: Although it's a bit hard to predict how well this is going to evolve, my current embedding, to refresh everyone's memory, is the following:
data DsMessage =
DsUnknownMessage !DiagnosticMessage
-- ^ Stop-gap constructor to ease the migration.
| DsLiftedTcRnMessage !TcRnMessage
-- ^ A diagnostic coming straight from the Typecheck-renamer.
-- More messages added in the future, of course
At first I thought this was the wrong way around, due to Simon's comment, but this actually creates pleasant external interfaces. To give you a bunch of examples from MR !4798:
deSugar :: HscEnv -> ModLocation -> TcGblEnv -> IO (Messages DsMessage, Maybe ModGuts)
deSugarExpr :: HscEnv -> LHsExpr GhcTc -> IO (Messages DsMessage, Maybe CoreExpr)
Note something interesting: the second function actually calls `runTcInteractive` inside the body, but thanks to the `DsLiftedTcRnMessage` we can still expose to the consumer an opaque `DsMessage` , which is what I would expect to see from a function called "deSugarExpr". Conversely, I would be puzzled to find those functions returning a `TcRnDsMessage`.
Having said all of that, I am not advocating this design is "the best". I am sure we will iterate on it. I am just reporting that even this baseline seems to be decent from an API perspective :)
On Wed, 31 Mar 2021 at 05:45, John Ericson
mailto:john.ericson@obsidian.systems wrote: Alfredo also replied to this pointing his embedding plan. I also prefer that, because I really wish TH didn't smear together the phases so much. Moreover, I hope with
- GHC proposals https://github.com/ghc-proposals/ghc-proposals/pull/412 https://github.com/ghc-proposals/ghc-proposals/pull/412 / https://github.com/ghc-proposals/ghc-proposals/pull/243 https://github.com/ghc-proposals/ghc-proposals/pull/243
- The parallelism work currently be planned in https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an... https://gitlab.haskell.org/ghc/ghc/-/wikis/Plan-for-increased-parallelism-an...
we might actually have an opportunity/extra motivation to do that. Splices and quotes will still induce intricate inter-phase dependencies, but I hope that could be mediated by the driver rather than just baked into each phase.
(One final step would be the "stuck macros" technique of https://www.youtube.com/watch?v=nUvKoG_V_U0 https://www.youtube.com/watch?v=nUvKoG_V_U0 / https://github.com/gelisam/klister https://github.com/gelisam/klister, where TH splices would be able to making "blocking queries" of the the compiler in ways that induce more of these fine-grained dependencies.)
Anyways, while we could also do a "RnTsDsError" and split later, I hope Alfredo's alternative of embedding won't be too much harder and prepare us for these exciting areas of exploration.
John
On 3/30/21 10:14 AM, Richard Eisenberg wrote:
On Mar 30, 2021, at 4:57 AM, Alfredo Di Napoli
mailto:alfredo.dinapoli@gmail.com> wrote: I'll explore the idea of adding a second IORef.
Renaming/type-checking is already mutually recursive. (The renamer must call the type-checker in order to rename -- that is, evaluate -- untyped splices. I actually can't recall why the type-checker needs to call the renamer.) So we will have a TcRnError. Now we see that the desugarer ends up mixed in, too. We could proceed how Alfredo suggests, by adding a second IORef. Or we could just make TcRnDsError (maybe renaming that).
What's the disadvantage? Clients will have to potentially know about all the different error forms with either approach (that is, using my combined type or using multiple IORefs). The big advantage to separating is maybe module dependencies? But my guess is that the dependencies won't be an issue here, due to the fact that these components are already leaning on each other. Maybe the advantage is just in having smaller types? Maybe.
I don't have a great sense as to what to do here, but I would want a clear reason that e.g. the TcRn monad would have two IORefs, while other monads will work with GhcMessage (instead of a whole bunch of IORefs).
Richard
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org mailto:ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org mailto:ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
participants (4)
-
Alfredo Di Napoli
-
John Ericson
-
Richard Eisenberg
-
Simon Peyton Jones