
Now that the landmines have hopefully been cleared from the AST via [1], I would like to propose changing the location information in the AST.
Right now the locations of syntactic markers such as do/let/where/in/of in the source are discarded from the AST, although they are retained in the rich token stream.
The haskell-src-exts package deals with this by using the SrcSpanInfo data type [2], which contains the SrcSpan as per the current GHC Located type but also has a list of SrcSpans for the syntactic markers, depending on the particular AST fragment being annotated.
In addition, the annotation type is provided as a parameter to the AST, so that it can be changed as required, see [3].
The motivation for this change is then
1. Simplify the roundtripping and modification of source by explicitly capturing the missing location information for the syntactic markers.
2. Allow the annotation to be a parameter so that it can be replaced with a different one in tools, for example HaRe would include the tokens for the AST fragment leaves.
3. Aim for some level of compatibility with haskell-src-exts so that tools developed for it could be easily ported to GHC, for example exactprint [4].
I would like feedback as to whether this would be acceptable, or if the same goals should be achieved a different way.
Regards
Alan
[1] https://phabricator.haskell.org/D157
[2] http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-H...
[3] http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-H...
[4] http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-H...
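As an illustration of the shape being proposed, the following is a minimal sketch and not the GHC or haskell-src-exts definitions. The SrcSpan here is a simplified stand-in, and the toy Exp type, the onLine helper, and the example value are hypothetical names introduced only for this sketch. It shows an annotation that carries the span of the whole fragment plus the spans of its syntactic markers, attached through a type parameter.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Simplified stand-in for a source span (not GHC's SrcSpan):
-- start and end positions as (line, column) pairs.
data SrcSpan = SrcSpan
  { spanStart :: (Int, Int)
  , spanEnd   :: (Int, Int)
  } deriving (Eq, Show)

-- Shaped after the idea described above: the span of the whole fragment
-- plus the spans of the syntactic markers inside it.
data SrcSpanInfo = SrcSpanInfo
  { srcInfoSpan   :: SrcSpan
  , srcInfoPoints :: [SrcSpan]
  } deriving (Eq, Show)

-- Hypothetical toy AST, parameterised over the annotation type l.
data Exp l
  = Var l String
  | Let l String (Exp l) (Exp l)   -- let x = e1 in e2
  deriving (Eq, Show, Functor)

-- Small helper for spans that sit on a single line.
onLine :: Int -> Int -> Int -> SrcSpan
onLine line from to = SrcSpan (line, from) (line, to)

-- "let x = y in x" on line 1; the two points locate "let" and "in".
-- Column numbers are illustrative (1-based, end exclusive).
example :: Exp SrcSpanInfo
example =
  Let (SrcSpanInfo (onLine 1 1 15) [onLine 1 1 4, onLine 1 11 13])
      "x"
      (Var (SrcSpanInfo (onLine 1 9 10) []) "y")
      (Var (SrcSpanInfo (onLine 1 14 15) []) "x")

-- A tool that does not care about locations can drop them with fmap.
stripAnnotations :: Exp l -> Exp ()
stripAnnotations = fmap (const ())
```

Because the annotation is a parameter, a tool could substitute its own type for SrcSpanInfo without touching the shape of the tree. Note that the leaf nodes in this sketch carry an empty list of points.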

In general I’m fine with this direction of travel. Some specifics:
· You’d have to be careful to document, for every data constructor in HsSyn, what the association is between the [SrcSpan] in the SrcSpanInfo and the “sub-entities”.
· Many of the sub-entities will have their own SrcSpanInfo wrapped around them, so there’s some unhelpful duplication. Maybe you only want the SrcSpanInfo to list the [SrcSpan]s for the sub-entities (like the syntactic keywords) that do not show up as children in the syntax tree?
Anyway do by all means create a GHC Trac wiki page to describe your proposed design, concretely.
Simon
From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Alan & Kim Zimmerman
Sent: 28 August 2014 15:00
To: ghc-devs@haskell.org
Subject: GHC AST Annotations

For what it's worth, my thought is not to use SrcSpanInfo (which, to me, is the wrong way to slice the abstraction) but instead to add SrcSpan fields to the relevant nodes. For example:
  | HsDo        SrcSpan              -- of the word "do"
                BlockSrcSpans
                (HsStmtContext Name) -- The parameterisation is unimportant
                                     -- because in this context we never use
                                     -- the PatGuard or ParStmt variant
                [ExprLStmt id]       -- "do":one or more stmts
                PostTcType           -- Type of the whole expression
...
data BlockSrcSpans = LayoutBlock Int  -- the parameter is the indentation level
                                 ...  -- stuff to track the appearance of any semicolons
                   | BracesBlock ...  -- stuff to track the braces and semicolons
The way I understand it, the SrcSpanInfo proposal means that we would have lots of empty SrcSpanInfos, no? Most interior nodes don't need one, I think.
Popping up a level, I do support the idea of including this info in the AST.
Richard
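To show why recording the block style helps with round-tripping, here is a small sketch under stated assumptions. BlockStyle and renderDo are hypothetical names, and the type is a deliberately reduced version of the BlockSrcSpans idea above: it leaves out the elided semicolon and brace tracking entirely and only distinguishes layout from explicit braces.

```haskell
import Data.List (intercalate)

-- Hypothetical, reduced version of the idea above: remember whether a
-- block was written with layout or with explicit braces.
data BlockStyle
  = LayoutBlock Int   -- indentation level of the statements
  | BracesBlock       -- explicit { ...; ... }
  deriving (Eq, Show)

-- Print a do block back in the style it was originally written in.
-- Statements are plain strings here to keep the sketch self-contained.
renderDo :: BlockStyle -> [String] -> String
renderDo (LayoutBlock n) stmts =
  unlines ("do" : map (replicate n ' ' ++) stmts)
renderDo BracesBlock stmts =
  "do { " ++ intercalate "; " stmts ++ " }\n"
```

Without the style information a pretty-printer has to pick one form for every block, which is exactly the kind of layout change a refactoring tool wants to avoid.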
On Aug 28, 2014, at 11:54 AM, Simon Peyton Jones
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

This does have the advantage of being explicit. I modelled the initial
proposal on HSE as a proven solution, and I think that they were trying to
keep it non-invasive, to allow both an annotated and non-annotated AST.
I think the key question is whether it is acceptable to sprinkle this kind
of information throughout the AST. For someone interested in
source-to-source conversions (like me) this is great, others may find it
intrusive.
The other question, which is probably orthogonal to this, is whether we
want the annotation to be a parameter to the AST, which allows it to be
overridden by various tools for various purposes, or fixed as in Richard's
suggestion.
A parameterised annotation allows the annotations to be manipulated via
something like the following, from HSE:
-- | AST nodes are annotated, and this class allows manipulation of the
--   annotations.
class Functor ast => Annotated ast where
  -- | Retrieve the annotation of an AST node.
  ann :: ast l -> l
  -- | Change the annotation of an AST node. Note that only the annotation
  --   of the node itself is affected, and not the annotations of any child
  --   nodes. If all nodes in the AST tree are to be affected, use fmap.
  amap :: (l -> l) -> ast l -> ast l
Alan
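The difference between amap and fmap in the class above can be seen on a small example. This is a sketch only: the class is restated so the block compiles on its own, and the toy Exp type and its instance are hypothetical, not taken from HSE or GHC.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Hypothetical toy AST with an annotation parameter l.
data Exp l
  = Var l String
  | App l (Exp l) (Exp l)
  deriving (Eq, Show, Functor)

-- The class quoted above, restated for a self-contained example.
class Functor ast => Annotated ast where
  ann  :: ast l -> l
  amap :: (l -> l) -> ast l -> ast l

instance Annotated Exp where
  ann (Var l _)   = l
  ann (App l _ _) = l
  -- Only the annotation of the top node changes; children are untouched.
  amap f (Var l x)   = Var (f l) x
  amap f (App l a b) = App (f l) a b

-- Every node starts with annotation 0.
sample :: Exp Int
sample = App 0 (Var 0 "f") (Var 0 "x")

topOnly :: Exp Int
topOnly = amap (+ 1) sample   -- App 1 (Var 0 "f") (Var 0 "x")

everywhere :: Exp Int
everywhere = fmap (+ 1) sample   -- App 1 (Var 1 "f") (Var 1 "x")
```

With a fixed, non-parameterised location field there is no Functor instance over the annotation, so this style of bulk rewriting would need a separate traversal.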
On Thu, Aug 28, 2014 at 7:11 PM, Richard Eisenberg

I have started capturing the discussion here
https://ghc.haskell.org/trac/ghc/wiki/GhcAstAnnotations.
On Thu, Aug 28, 2014 at 8:34 PM, Alan & Kim Zimmerman

I think the key question is whether it is acceptable to sprinkle this kind of information throughout the AST. For someone interested in source-to-source conversions (like me) this is great, others may find it intrusive.
It’s probably not too bad if you use record syntax; thus
  | HsDo { hsdo_do_loc :: SrcSpan             -- of the word "do"
         , hsdo_blocks :: BlockSrcSpans
         , hsdo_ctxt   :: HsStmtContext Name
         , hsdo_stmts  :: [ExprLStmt id]
         , hsdo_type   :: PostTcType }
Simon
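One way to read the record-syntax suggestion is that code which ignores locations can match on just the fields it needs. The sketch below assumes a toy Expr type with hypothetical field names do_loc and do_stmts and a simplified SrcSpan; it is not the HsSyn definition.

```haskell
-- Simplified stand-in for a source span: line, start column, end column.
data SrcSpan = SrcSpan Int Int Int
  deriving (Eq, Show)

-- Hypothetical toy expression type; only the "do" node carries the
-- location of its keyword, as a named field.
data Expr
  = Var String
  | Do { do_loc   :: SrcSpan   -- of the word "do"
       , do_stmts :: [Expr] }
  deriving (Eq, Show)

-- Code that does not care about locations names only the field it needs,
-- so adding further location fields to Do would not break it.
stmtCount :: Expr -> Int
stmtCount (Do { do_stmts = ss }) = length ss
stmtCount _                      = 0

-- A source-to-source tool can move the keyword with a record update.
-- The Do{} pattern guards the update, because do_loc is a partial field.
moveDo :: SrcSpan -> Expr -> Expr
moveDo new e@Do{} = e { do_loc = new }
moveDo _   e      = e
```

With positional constructor arguments, every pattern match on the constructor would have to be edited whenever a location field is added.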
From: Alan & Kim Zimmerman [mailto:alan.zimm@gmail.com]
Sent: 28 August 2014 19:35
To: Richard Eisenberg
Cc: Simon Peyton Jones; ghc-devs@haskell.org
Subject: Re: GHC AST Annotations

A further use case would be to be able to convert all the locations to be
relative, or include a relative portion, so that as tools manipulate the
AST by adding or removing parts the layout can be preserved.
I think I may need to make a wip branch for this and experiment; it is
always easier to comment on concrete things.
Alan
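The relative-location idea can be sketched as follows. This is one possible encoding chosen for illustration, not a design taken from the thread: Pos, DeltaPos and the four functions are hypothetical names, positions are assumed to be in source order, and a token on a new line keeps its absolute column.

```haskell
type Pos      = (Int, Int)   -- absolute (line, column)
type DeltaPos = (Int, Int)   -- (lines down, columns) relative to the previous token

-- Offset of a position relative to the one before it. On the same line the
-- column is a difference; on a later line the column is kept as-is.
toDelta :: Pos -> Pos -> DeltaPos
toDelta (l0, c0) (l, c)
  | l == l0   = (0, c - c0)
  | otherwise = (l - l0, c)

-- Inverse of toDelta for a known previous position.
fromDelta :: Pos -> DeltaPos -> Pos
fromDelta (l0, c0) (dl, dc)
  | dl == 0   = (l0, c0 + dc)
  | otherwise = (l0 + dl, dc)

-- Convert a run of absolute positions into offsets from a starting point.
relativise :: Pos -> [Pos] -> [DeltaPos]
relativise start ps = zipWith toDelta (start : ps) ps

-- Rebuild absolute positions from offsets, possibly from a new start.
absolutise :: Pos -> [DeltaPos] -> [Pos]
absolutise start = drop 1 . scanl fromDelta start
```

For example, relativise (1,1) [(1,1),(1,4),(2,3)] gives [(0,0),(0,3),(1,3)], and absolutise (5,1) on that result gives [(5,1),(5,4),(6,3)]: the fragment has moved four lines down with its internal layout intact.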
On Thu, Aug 28, 2014 at 10:38 PM, Simon Peyton Jones

Since Alan is trying to do something for HaRe that I want for HLint on
top of haskell-src-exts, he asked me for my opinions on the proposal.
There seem to be two approaches to take:
* Add SrcSpans throughout. The HSE approach of having a list of inner
source spans is nasty - the details of which source span goes where
are entirely undocumented and hard to discover. Even worse, for things
like instance, which may or may not have a where after, the number of
inner SrcSpans changes. Simon's idea of hsdo_do_loc is much cleaner,
and easily extends to Maybe SrcSpan if the keyword is optional.
* Having the annotation be a type parameter gives much greater
flexibility. In particular, it would let you mark certain nodes as
being added/deleted. However, since SrcSpan has an Int in it, you can
always pass around a separate IntMap and make the SrcSpan really be an
index into more detailed information. It's nasty, but only the people
who use it pay for it.
Both approaches have disadvantages. You could always combine both
ideas, and have a SrcSpan and entirely separately an annotation (which
defaults to (), rather than SrcSpanInfo), but maybe that's too much
extra baggage on the AST.
Thanks, Neil
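The side-table workaround mentioned above might look roughly like this. It is a sketch under assumptions: SpanKey reduces the span to the single Int used as a key, and Status, SideTable, keyOf and statusOf are hypothetical names. It uses Data.IntMap.Strict from the containers package.

```haskell
import qualified Data.IntMap.Strict as IM

-- Hypothetical stand-in: a span reduced to the single Int used as a key.
newtype SpanKey = SpanKey Int
  deriving (Eq, Show)

-- Extra, tool-specific information kept outside the AST.
data Status = Original | Added | Deleted
  deriving (Eq, Show)

-- Toy AST whose nodes carry only the key.
data Expr
  = Var SpanKey String
  | App SpanKey Expr Expr
  deriving (Eq, Show)

type SideTable = IM.IntMap Status

-- Extract the key stored on a node.
keyOf :: Expr -> Int
keyOf (Var (SpanKey k) _)   = k
keyOf (App (SpanKey k) _ _) = k

-- Nodes without an entry are treated as unchanged.
statusOf :: SideTable -> Expr -> Status
statusOf table e = IM.findWithDefault Original (keyOf e) table
```

The AST type stays fixed and only tools that need the extra detail build and thread the table, which matches the "only the people who use it pay for it" trade-off. The cost is that the table and the tree can drift out of sync.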
On Sat, Aug 30, 2014 at 3:32 PM, Alan & Kim Zimmerman
A further use case would be to be able to convert all the locations to be relative, or include a relative portion, so that as tools manipulate the AST by adding or removing parts the layout can be preserved.
I think I may need to make a wip branch for this and experiment, it is always easier to comment on concrete things.
Alan
On Thu, Aug 28, 2014 at 10:38 PM, Simon Peyton Jones
wrote: I thiink the key question is whether it is acceptable to sprinkle this kind of information throughout the AST. For someone interested in source-to-source conversions (like me) this is great, others may find it intrusive.
It’s probably not too bad if you use record syntax; thus
| HsDo { hsdo_do_loc :: SrcSpan -- of the word "do"
, hsdo_blocks :: BlockSrcSpans
, hsdo_ctxt :: HsStmtContext Name
, hsdo_stmts :: [ExprLStmt id]
, hsdo_type :: PostTcType }
Simon
From: Alan & Kim Zimmerman [mailto:alan.zimm@gmail.com] Sent: 28 August 2014 19:35 To: Richard Eisenberg Cc: Simon Peyton Jones; ghc-devs@haskell.org Subject: Re: GHC AST Annotations
This does have the advantage of being explicit. I modelled the initial proposal on HSE as a proven solution, and I think that they were trying to keep it non-invasive, to allow both an annotated and non-annoted AST.
I thiink the key question is whether it is acceptable to sprinkle this kind of information throughout the AST. For someone interested in source-to-source conversions (like me) this is great, others may find it intrusive.
The other question, which is probably orthogonal to this, is whether we want the annotation to be a parameter to the AST, which allows it to be overridden by various tools for various purposes, or fixed as in Richard's suggestion.
A parameterised annotation allows the annotations to be manipulated via something like for HSE:
-- |AST nodes are annotated, and this class allows manipulation of the annotations. class Functor ast => Annotated ast where
-- |Retrieve the annotation of an AST node. ann :: ast l -> l
-- |Change the annotation of an AST node. Note that only the annotation of the node itself is affected, and not -- the annotations of any child nodes. if all nodes in the AST tree are to be affected, use fmap.
amap :: (l -> l) -> ast l -> ast l
Alan
On Thu, Aug 28, 2014 at 7:11 PM, Richard Eisenberg
wrote: For what it's worth, my thought is not to use SrcSpanInfo (which, to me, is the wrong way to slice the abstraction) but instead to add SrcSpan fields to the relevant nodes. For example:
| HsDo SrcSpan -- of the word "do" BlockSrcSpans (HsStmtContext Name) -- The parameterisation is unimportant -- because in this context we never use -- the PatGuard or ParStmt variant [ExprLStmt id] -- "do":one or more stmts PostTcType -- Type of the whole expression
...
data BlockSrcSpans = LayoutBlock Int -- the parameter is the indentation level ... -- stuff to track the appearance of any semicolons | BracesBlock ... -- stuff to track the braces and semicolons
The way I understand it, the SrcSpanInfo proposal means that we would have lots of empty SrcSpanInfos, no? Most interior nodes don't need one, I think.
Popping up a level, I do support the idea of including this info in the AST.
Richard
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

I have created https://ghc.haskell.org/trac/ghc/ticket/9628 for this, and
have decided to first tackle adding a type parameter to the entire AST, so
that tool writers can add custom information as required.
My first stab at this is as follows:
```
data HsModule r name
  = HsModule {
      ann :: r, -- ^ Annotation for external tool writers
      hsmodName :: Maybe (Located ModuleName),
        -- ^ @Nothing@: \"module X where\" is omitted (in which case the next
        --   field is Nothing too)
      hsmodExports :: Maybe [LIE name],
      ....
```
Salient points
1. It comes as the first type parameter, and is called r
2. It gets added as the first field of the syntax element
3. It is always called ann
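A minimal sketch of how a tool might use such a parameter, assuming a cut-down module record. HsModuleSketch, parsed and withTokens are hypothetical names, and the real HsModule has more fields and uses Located and LIE rather than plain strings.

```haskell
-- Cut-down stand-in for the proposed HsModule (hypothetical): the
-- annotation type 'r' is the first type parameter and 'ann' the first field.
data HsModuleSketch r name = HsModuleSketch
  { ann          :: r               -- annotation slot
  , hsmodName    :: Maybe String    -- simplified: no Located wrapper
  , hsmodExports :: Maybe [name]    -- simplified: no LIE wrapper
  } deriving (Show, Eq)

-- Code that needs no annotation can instantiate 'r' to ().
parsed :: HsModuleSketch () String
parsed = HsModuleSketch
  { ann          = ()
  , hsmodName    = Just "Main"
  , hsmodExports = Just ["main"]
  }

-- A tool can swap in its own annotation type, here a list of token
-- strings, while leaving the rest of the node untouched.
withTokens :: [String] -> HsModuleSketch r name -> HsModuleSketch [String] name
withTokens toks m = HsModuleSketch
  { ann          = toks
  , hsmodName    = hsmodName m
  , hsmodExports = hsmodExports m
  }
```

For example, withTokens ["module", "Main", "where"] parsed has type HsModuleSketch [String] String and keeps the original name and export list.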
Before undertaking this particular change, I would appreciate some feedback.
Regards
Alan
On Thu, Aug 28, 2014 at 8:34 PM, Alan & Kim Zimmerman wrote:
This does have the advantage of being explicit. I modelled the initial proposal on HSE as a proven solution, and I think that they were trying to keep it non-invasive, to allow both an annotated and non-annotated AST.

Dear Alan,
Nice going and thanks for undertaking yet another useful AST transformation!
A few thoughts (do with them as you see fit):
- Always called "ann"; doesn't this require OverloadedRecordFields? You're in danger of delaying your modification (scheduled to land in 7.10). Other than that, as before, from a design perspective: yes please.
- In terms of presentation/comments; when I first started looking at (i.e. traversing, selectively printing etc.) the AST, I was always really annoyed that every child in the tree has one extra step of indirection, due to the location annotations being "L loc thing", as opposed to a loc-field as part of the thing. I would simply call it annotation (no talk of external tool writers). In time, I hope GHC-annotations also move to that field.
Regards,
Philip

- In terms of presentation/comments; when I first started looking at (i.e. traversing, selectively printing etc.) the AST, I was always really annoyed that every child in the tree has one extra step of indirection, due to the location annotations being "L loc thing", as opposed to a loc-field as part of the thing. I would simply call it annotation (no talk of external tool writers). In time, I hope GHC-annotations also move to that field.
Replacing the (L loc thing) story by adding a location field to every single data constructor of HsSyn would be entirely possible. But it would mean adding a lot of extra fields. I don’t have a strong opinion either way, but other clients of the GHC API would be affected.
What we can’t do is have both the (L loc thing) and an extra field!
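The two styles can be compared on a toy syntax tree. This is only a sketch: Span is a simplified stand-in for SrcSpan, and Expr, ExprF, varsW and varsF are hypothetical names, not GHC definitions.

```haskell
-- Simplified stand-in for a source location (hypothetical): line and column.
data Span = Span Int Int deriving (Show, Eq)

-- Style 1: the "L loc thing" wrapper. Every child is wrapped, so each
-- traversal step first has to look through the L constructor.
data Located a = L Span a deriving (Show, Eq)

data Expr
  = Var String
  | App (Located Expr) (Located Expr)
  deriving (Show, Eq)

varsW :: Located Expr -> [String]
varsW (L _ (Var x))   = [x]
varsW (L _ (App f a)) = varsW f ++ varsW a

-- Style 2: a location field in every data constructor. No wrapper, but
-- each constructor gains an extra field.
data ExprF
  = VarF Span String
  | AppF Span ExprF ExprF
  deriving (Show, Eq)

varsF :: ExprF -> [String]
varsF (VarF _ x)   = [x]
varsF (AppF _ f a) = varsF f ++ varsF a

-- The same expression "f x" in both styles.
wrapped :: Located Expr
wrapped = L (Span 1 1) (App (L (Span 1 1) (Var "f")) (L (Span 1 3) (Var "x")))

inlined :: ExprF
inlined = AppF (Span 1 1) (VarF (Span 1 1) "f") (VarF (Span 1 3) "x")
```

Both traversals return the same variable names; the difference is only in where the location lives and how many constructors a client has to pattern-match through.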
- Always called "ann"; doesn't this require OverloadedRecordFields? You're in danger of delaying your modification (scheduled to land in 7.10). Other than that, as before, from a design perspective: yes please.
Tiresomely it is indeed the case that (for now anyway) the field would need a different name in each data type.
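One possible way to live with per-type field names, sketched under assumptions: give each record its own selector and recover a uniform accessor through a type class. The types Module and Decl and the class HasAnn are hypothetical and not part of GHC; declaring the field ann in both records in the same module would be rejected as a multiple declaration in standard Haskell.

```haskell
-- Two record types in one module cannot both have a field named 'ann'
-- in standard Haskell, so each gets its own selector name.
data Module r = Module
  { modAnn  :: r
  , modName :: String
  } deriving (Show, Eq)

data Decl r = Decl
  { declAnn  :: r
  , declName :: String
  } deriving (Show, Eq)

-- Hypothetical class giving one name for "the annotation of this node".
class HasAnn f where
  getAnn :: f r -> r

instance HasAnn Module where
  getAnn = modAnn

instance HasAnn Decl where
  getAnn = declAnn
```

With this, getAnn works on both Module and Decl values, at the cost of one instance per annotated type.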
Simon
From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of p.k.f.holzenspies@utwente.nl
Sent: 26 September 2014 09:08
To: alan.zimm@gmail.com; eir@cis.upenn.edu
Cc: ghc-devs@haskell.org
Subject: RE: GHC AST Annotations
participants (5)
- Alan & Kim Zimmerman
- Neil Mitchell
- p.k.f.holzenspies@utwente.nl
- Richard Eisenberg
- Simon Peyton Jones