TTG: Handling Source Locations

Hi devs, I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place. General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSour... https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSour... Phab diff: https://phabricator.haskell.org/D5036 https://phabricator.haskell.org/D5036 Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495 https://ghc.haskell.org/trac/ghc/ticket/15495 Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59... https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59... I see why this change is wanted and how the new version works. It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
foo :: LPat p -> ... foo (VarPat ...) = ...
Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work. However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons: 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard. 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later. So I'm uncomfortable with this direction of travel. Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above. Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs. Thanks, Richard

I wholly share this concern, which is why I commented on the Phab diff:
Does this rely on the caller to call dL on the pattern? Very fragile, let's not do that.
In addition, I'm worried about illegal states where we end up with
multiple nested levels of `NewPat`, and calling `dL` once is not
sufficient.
As to the better solution, I think we should just go with Solution B
from the Wiki page. Yes, it's somewhat more boilerplate, but it
guarantees to have locations in the right places for all nodes. The
main argument against it was that we'd have to define `type instance
XThing (GhcPass p) = SrcSpan` for many a `Thing`, but I don't see it
as a downside at all. We should do so anyway, to get rid of parsing
API annotations and put them in the AST proper.
All the best,
Vladislav
On Sat, Feb 9, 2019 at 7:19 PM Richard Eisenberg
Hi devs,
I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.
General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSour... Phab diff: https://phabricator.haskell.org/D5036 Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495 Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59...
I see why this change is wanted and how the new version works.
It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
foo :: LPat p -> ... foo (VarPat ...) = ...
Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.
However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons: 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard. 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.
So I'm uncomfortable with this direction of travel.
Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.
Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
Thanks, Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

I just did this now, it was quite disconcerting that my code continued
to compile after applying `cL loc` to the return value of one of my
functions.
On Sat, Feb 9, 2019 at 5:40 PM Vladislav Zavialov
I wholly share this concern, which is why I commented on the Phab diff:
Does this rely on the caller to call dL on the pattern? Very fragile, let's not do that.
In addition, I'm worried about illegal states where we end up with multiple nested levels of `NewPat`, and calling `dL` once is not sufficient.
As to the better solution, I think we should just go with Solution B from the Wiki page. Yes, it's somewhat more boilerplate, but it guarantees to have locations in the right places for all nodes. The main argument against it was that we'd have to define `type instance XThing (GhcPass p) = SrcSpan` for many a `Thing`, but I don't see it as a downside at all. We should do so anyway, to get rid of parsing API annotations and put them in the AST proper.
All the best, Vladislav
On Sat, Feb 9, 2019 at 7:19 PM Richard Eisenberg
wrote: Hi devs,
I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.
General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSour... Phab diff: https://phabricator.haskell.org/D5036 Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495 Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59...
I see why this change is wanted and how the new version works.
It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
foo :: LPat p -> ... foo (VarPat ...) = ...
Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.
However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons: 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard. 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.
So I'm uncomfortable with this direction of travel.
Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.
Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
Thanks, Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere. We can still say (Located t) in places where we want to guarantee a SrcSpan.
Yes, this lets us add more than one; that's redundant but not harmful.
Simon
| -----Original Message-----
| From: ghc-devs

One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere: a user-written .hs file, a GHCi command, or compiler-generated code (via TH or deriving). We should never omit this information from a node. And when we are writing code that consumes an AST, it always makes sense to ask what the provenance of a node is, for example to use it in an error message.
this lets us add more than one; that's redundant but not harmful
It goes against the philosophy of making illegal states irrepresentable. Now all code must be careful not to end up in an illegal state of nested SrcSpan, without any help from the typechecker. The code that pattern matches on an AST, at the same time, must be prepared to handle this case anyway (or else we risk to crash), which it currently does with stripSrcSpanPat in the implementation of dL. And having to remember to apply dL when matching on the AST is more trivia to learn and remember. Not even a warning if one forgets to do that, no appropriate place to explain this to new contributors (reading another Note just to start doing anything at all with the AST? unnecessary friction), and only a test failure at best in case of a mistake. My concrete proposal: let's just put SrcSpan in the extension fields of each node. In other words, take these lines type instance XVarPat (GhcPass _) = NoExt type instance XLazyPat (GhcPass _) = NoExt type instance XAsPat (GhcPass _) = NoExt type instance XParPat (GhcPass _) = NoExt type instance XBangPat (GhcPass _) = NoExt ... and replace them with type instance XVarPat (GhcPass _) = SrcSpan type instance XLazyPat (GhcPass _) = SrcSpan type instance XAsPat (GhcPass _) = SrcSpan type instance XParPat (GhcPass _) = SrcSpan type instance XBangPat (GhcPass _) = SrcSpan ... And don't bother with the HasSrcSpan class, don't define composeSrcSpan and decomposeSrcSpan. Very straightforward and beneficial for both producers and consumers of an AST. All the best, Vladislav

Hi Richard,
[Richard:] It seems to me, though, that this move makes us *less typed*. [and] However, worse, we might forget to *add* a location when downstream functions expect one.
We had a more sophisticated version of TTG that could support the ping-pong style (and hence typed tagging of locations), but it came at the price of more complicated encoding [0]. We have decided to abandon the more typed variant since tracking whether a node is located or not is inherently a dynamic/run-time process, not a static/compile-time process: there are some nodes that are generated in the process by the compiler by an arbitrary logic (hard to encode by types), hence have no location (in the source code). The types `LHsExpr`, `LPat`, and the like will be deleted! It will be all `HsExpr`, `Pat` and the like. Baking-in, e.g. `LHsExpr` into `HsExpr`, was a mistake in the first place: we were cheating using `Maybe` type anyway, when for example an `LHsExpr` was forcibly required but we had only `HsExpr` and used `noLoc`.
Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
We were really careful about the refactoring. The new code aside, I
don't see how we can introduce any bugs by the refactoring of the old
code explained in the wiki.
About the new code, the convention is straightforward: anytime you
destruct an AST node, assume a wrapper node inside (add an extra
case), or use the smart constructors/pattern synonyms.
I'd be happy to rediscuss the design space here. It would be great to
have everyone fully on board as it is not a trivial change.
/Shayan
[0] https://github.com/shayan-najd/HsAST/blob/master/Paper.pdf
On Sat, 9 Feb 2019 at 17:19, Richard Eisenberg
Hi devs,
I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.
General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSour... Phab diff: https://phabricator.haskell.org/D5036 Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495 Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59...
I see why this change is wanted and how the new version works.
It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
foo :: LPat p -> ... foo (VarPat ...) = ...
Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.
However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons: 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard. 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.
So I'm uncomfortable with this direction of travel.
Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.
Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
Thanks, Richard _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On Feb 12, 2019, at 5:19 AM, Shayan Najd
wrote: About the new code, the convention is straightforward: anytime you destruct an AST node, assume a wrapper node inside (add an extra case), or use the smart constructors/pattern synonyms.
Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs
wrote: One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
We can still say (Located t) in places where we want to guarantee a SrcSpan.
This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
Yes, this lets us add more than one; that's redundant but not harmful.
I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov
wrote: I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere
I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
My concrete proposal: let's just put SrcSpan in the extension fields of each node
I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call. One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
data Phase = Parsed | Renamed
data Exp p = Node (XNode p) Int
type family XNode (p :: Phase) type instance XNode p = NodeExt p
data NodeExt p where NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
type family RenamedOnly p t where RenamedOnly Parsed _ = () RenamedOnly Renamed t = t
example :: Exp Parsed example = Node (NodeExt { flag = True, fvs = () }) 5
rename :: Exp Parsed -> Exp Renamed rename (Node ext n) = Node (ext { fvs = "xyz" }) n
Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style. Thanks, Richard

My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
I am not sure if I understand: shouldn't the totality checker warn if
there is no pattern for the wrapper constructor (hence enforce the
convention)?
On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg
On Feb 12, 2019, at 5:19 AM, Shayan Najd
wrote: About the new code, the convention is straightforward: anytime you destruct an AST node, assume a wrapper node inside (add an extra case), or use the smart constructors/pattern synonyms.
Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs
wrote: One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
We can still say (Located t) in places where we want to guarantee a SrcSpan.
This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
Yes, this lets us add more than one; that's redundant but not harmful.
I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov
wrote: I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere
I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
My concrete proposal: let's just put SrcSpan in the extension fields of each node
I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
data Phase = Parsed | Renamed
data Exp p = Node (XNode p) Int
type family XNode (p :: Phase) type instance XNode p = NodeExt p
data NodeExt p where NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
type family RenamedOnly p t where RenamedOnly Parsed _ = () RenamedOnly Renamed t = t
example :: Exp Parsed example = Node (NodeExt { flag = True, fvs = () }) 5
rename :: Exp Parsed -> Exp Renamed rename (Node ext n) = Node (ext { fvs = "xyz" }) n
Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
Thanks, Richard

[Richard:] I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
Depends on the semantics of `dL`: currently (for `Pat`) it returns the top-level `SrcSpan` and then the underlying node with all the inner wrappers stripped away. So one use of `dL` is enough in this semantic. (see https://github.com/ghc/ghc/blob/master/compiler/hsSyn/HsPat.hs#L341)
[Vlad:] As to the better solution, I think we should just go with Solution B from the Wiki page. [Richard:] I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favour.
It may help to identify at least three sorts of functions commonly
used *currently* in GHC when interacting with AST nodes (please add,
if I am missing some):
(a) those that ignore source locations;
(b) those that generically handle source locations regardless of the
constructor of the underlying node; and
(c) those that handle source locations case-by-case (often by nested
pattern matching).
The key issue with Solution B, as listed in the wiki, is that it ruins
the separation of two concerns in functions working on AST nodes:
handling source locations, and the actual logic of the function.
With the ping-pong style, handling of source locations is sometimes
refactored in a separate function, and with Solution A refactored in a
separate case/function clause.
With Solution B, however, every time we construct a node we should
have a source location ready to put into it.
That is, with Solution B, (a) and (b) are not cleanly implemented.
(I can explain more if not clear.)
/Shayan
On Tue, 12 Feb 2019 at 15:30, Shayan Najd
My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
I am not sure if I understand: shouldn't the totality checker warn if there is no pattern for the wrapper constructor (hence enforce the convention)?
On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg
wrote: On Feb 12, 2019, at 5:19 AM, Shayan Najd
wrote: About the new code, the convention is straightforward: anytime you destruct an AST node, assume a wrapper node inside (add an extra case), or use the smart constructors/pattern synonyms.
Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs
wrote: One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
We can still say (Located t) in places where we want to guarantee a SrcSpan.
This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
Yes, this lets us add more than one; that's redundant but not harmful.
I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov
wrote: I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere
I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
My concrete proposal: let's just put SrcSpan in the extension fields of each node
I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
data Phase = Parsed | Renamed
data Exp p = Node (XNode p) Int
type family XNode (p :: Phase) type instance XNode p = NodeExt p
data NodeExt p where NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
type family RenamedOnly p t where RenamedOnly Parsed _ = () RenamedOnly Renamed t = t
example :: Exp Parsed example = Node (NodeExt { flag = True, fvs = () }) 5
rename :: Exp Parsed -> Exp Renamed rename (Node ext n) = Node (ext { fvs = "xyz" }) n
Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
Thanks, Richard

That's true, but how would it play out in practice? For example, take a look at RnPat. There is a rnLPatAndThen which uses wrapSrcSpanCps to extract the location and then call rnPatAndThen. rnPatAndThen, in turn, just panics if it sees the extension point, because that's an unexpected constructor. Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic. There's also the problem that the pattern-match checker can't usefully look through view patterns. If there is a nested pattern-match (that is, we see dL->L _ (SomeOtherConstructor), then there is no way to guarantee a complete pattern-match short of a catch-all. So it doesn't seem to me that the pattern-match checker is really helping us achieve what we want here. Richard
On Feb 12, 2019, at 9:30 AM, Shayan Najd
wrote: My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
I am not sure if I understand: shouldn't the totality checker warn if there is no pattern for the wrapper constructor (hence enforce the convention)?
On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg
wrote: On Feb 12, 2019, at 5:19 AM, Shayan Najd
wrote: About the new code, the convention is straightforward: anytime you destruct an AST node, assume a wrapper node inside (add an extra case), or use the smart constructors/pattern synonyms.
Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs
wrote: One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
We can still say (Located t) in places where we want to guarantee a SrcSpan.
This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
Yes, this lets us add more than one; that's redundant but not harmful.
I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov
wrote: I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere
I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
My concrete proposal: let's just put SrcSpan in the extension fields of each node
I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
data Phase = Parsed | Renamed
data Exp p = Node (XNode p) Int
type family XNode (p :: Phase) type instance XNode p = NodeExt p
data NodeExt p where NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
type family RenamedOnly p t where RenamedOnly Parsed _ = () RenamedOnly Renamed t = t
example :: Exp Parsed example = Node (NodeExt { flag = True, fvs = () }) 5
rename :: Exp Parsed -> Exp Renamed rename (Node ext n) = Node (ext { fvs = "xyz" }) n
Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
Thanks, Richard

Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
With Solution A, there shouldn't be two functions `rnLPatAndThen` and `rnPatAndThen` anyways. There should be only `rnPatAndThen` with an extra case for the wrapper node.
There's also the problem that the pattern-match checker can't usefully look through view patterns.
Yes, I have reported it while back. I don't know of the progress in fixing this.
On Tue, 12 Feb 2019 at 16:24, Richard Eisenberg
That's true, but how would it play out in practice? For example, take a look at RnPat. There is a rnLPatAndThen which uses wrapSrcSpanCps to extract the location and then call rnPatAndThen. rnPatAndThen, in turn, just panics if it sees the extension point, because that's an unexpected constructor. Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
There's also the problem that the pattern-match checker can't usefully look through view patterns. If there is a nested pattern-match (that is, we see dL->L _ (SomeOtherConstructor), then there is no way to guarantee a complete pattern-match short of a catch-all. So it doesn't seem to me that the pattern-match checker is really helping us achieve what we want here.
Richard
On Feb 12, 2019, at 9:30 AM, Shayan Najd
wrote: My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
I am not sure if I understand: shouldn't the totality checker warn if there is no pattern for the wrapper constructor (hence enforce the convention)?
On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg
wrote: On Feb 12, 2019, at 5:19 AM, Shayan Najd
wrote: About the new code, the convention is straightforward: anytime you destruct an AST node, assume a wrapper node inside (add an extra case), or use the smart constructors/pattern synonyms.
Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs
wrote: One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
We can still say (Located t) in places where we want to guarantee a SrcSpan.
This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
Yes, this lets us add more than one; that's redundant but not harmful.
I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov
wrote: I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere
I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
My concrete proposal: let's just put SrcSpan in the extension fields of each node
I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
data Phase = Parsed | Renamed
data Exp p = Node (XNode p) Int
type family XNode (p :: Phase) type instance XNode p = NodeExt p
data NodeExt p where NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
type family RenamedOnly p t where RenamedOnly Parsed _ = () RenamedOnly Renamed t = t
example :: Exp Parsed example = Node (NodeExt { flag = True, fvs = () }) 5
rename :: Exp Parsed -> Exp Renamed rename (Node ext n) = Node (ext { fvs = "xyz" }) n
Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
Thanks, Richard

While we're on the topic, is there any plan to get rid of all those panics? AFAICS they are entirely unnecessary: we should just use an empty datatype for unused constructor extension points, then we can eliminate it to get whatever we like. See #15247. Adam On 12/02/2019 15:40, Shayan Najd wrote:
Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
With Solution A, there shouldn't be two functions `rnLPatAndThen` and `rnPatAndThen` anyways. There should be only `rnPatAndThen` with an extra case for the wrapper node.
There's also the problem that the pattern-match checker can't usefully look through view patterns.
Yes, I have reported it while back. I don't know of the progress in fixing this.
On Tue, 12 Feb 2019 at 16:24, Richard Eisenberg
wrote: That's true, but how would it play out in practice? For example, take a look at RnPat. There is a rnLPatAndThen which uses wrapSrcSpanCps to extract the location and then call rnPatAndThen. rnPatAndThen, in turn, just panics if it sees the extension point, because that's an unexpected constructor. Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
There's also the problem that the pattern-match checker can't usefully look through view patterns. If there is a nested pattern-match (that is, we see dL->L _ (SomeOtherConstructor), then there is no way to guarantee a complete pattern-match short of a catch-all. So it doesn't seem to me that the pattern-match checker is really helping us achieve what we want here.
Richard
On Feb 12, 2019, at 9:30 AM, Shayan Najd
wrote: My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
I am not sure if I understand: shouldn't the totality checker warn if there is no pattern for the wrapper constructor (hence enforce the convention)?
On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg
wrote: On Feb 12, 2019, at 5:19 AM, Shayan Najd
wrote: About the new code, the convention is straightforward: anytime you destruct an AST node, assume a wrapper node inside (add an extra case), or use the smart constructors/pattern synonyms.
Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs
wrote: One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
We can still say (Located t) in places where we want to guarantee a SrcSpan.
This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
Yes, this lets us add more than one; that's redundant but not harmful.
I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov
wrote: I claim an SrcSpan makes sense everywhere, so this is not a useful distinction. Think about it as code provenance, an AST node always comes from somewhere
I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
My concrete proposal: let's just put SrcSpan in the extension fields of each node
I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
data Phase = Parsed | Renamed
data Exp p = Node (XNode p) Int
type family XNode (p :: Phase) type instance XNode p = NodeExt p
data NodeExt p where NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
type family RenamedOnly p t where RenamedOnly Parsed _ = () RenamedOnly Renamed t = t
example :: Exp Parsed example = Node (NodeExt { flag = True, fvs = () }) 5
rename :: Exp Parsed -> Exp Renamed rename (Node ext n) = Node (ext { fvs = "xyz" }) n
Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
Thanks, Richard
-- Adam Gundry, Haskell Consultant Well-Typed LLP, https://www.well-typed.com/
participants (6)
-
Adam Gundry
-
Matthew Pickering
-
Richard Eisenberg
-
Shayan Najd
-
Simon Peyton Jones
-
Vladislav Zavialov