HIE Files

Zubin Duggal

14 May 2018

12:31 p.m.

Hello,

I will be working on a GSoC project that will allow GHC to output a new .hie file to be written next to .hi files. It will contain information about the typechecked Haskell AST, allowing tooling (like haddock's --hyperlinked-source and haskell-ide-engine) to work without having to parse, rename and typecheck files all over again.

I have made a GHC wiki page containing more details here:

https://ghc.haskell.org/trac/ghc/wiki/HIEFiles

Looking forward to any comments and suggestions.

Thanks,

Zubin.

Simon Peyton Jones

14 May

1:30 p.m.

Interesting.

Please do keep the wiki page up to date so that it accurately describes the current design. For example, I hope you’ll flesh out what a “simplified, source aware, annotated AST derived from the Renamed/Typechecked Source” really is.

Why not put the .hie-file info into the .hi file? (Optionally, of course.)

What tools/libraries do you plan to produce to allow clients to read a .hie file and make sense of the contents?

Simon

Peter Podlovics

8:56 p.m.

Hi,

Sometimes, when working with a type-checked AST, it can be useful to know the types of subexpressions as well. A simple use case would be any kind of static analysis of the code.

Currently, the type-checker discards all intermediate results after the type checking ends, so the AST only has info about nodes with names. Storing the types of subexpressions somewhere would be a great benefit for many tools.

Do you plan on including these intermediate results in the HIE file?

Regards,

Peter
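To make the distinction between "nodes with names" and "all subexpressions" concrete, here is a minimal sketch on a toy expression language. It is not GHC's HsSyn, and every name in it (`Ty`, `Expr`, `allAnns`, `namedAnns`, `succOne`) is hypothetical and chosen only for illustration.

```haskell
-- A toy expression language for illustration only; this is NOT GHC's HsSyn.
data Ty = TInt | TFun Ty Ty
  deriving (Eq, Show)

-- | Expressions carrying an annotation of type 'ann' on every node.
data Expr ann
  = Lit ann Int
  | Var ann String
  | App ann (Expr ann) (Expr ann)
  deriving (Eq, Show)

-- | Annotations of every subexpression, in pre-order.
allAnns :: Expr ann -> [ann]
allAnns (Lit a _)   = [a]
allAnns (Var a _)   = [a]
allAnns (App a f x) = a : allAnns f ++ allAnns x

-- | Annotations of named nodes only (variables), paired with their names.
namedAnns :: Expr ann -> [(String, ann)]
namedAnns (Lit _ _)   = []
namedAnns (Var a n)   = [(n, a)]
namedAnns (App _ f x) = namedAnns f ++ namedAnns x

-- | The expression @succ 1@ with a type recorded on every node.
succOne :: Expr Ty
succOne = App TInt (Var (TFun TInt TInt) "succ") (Lit TInt 1)
```

Loaded into GHCi, `namedAnns succOne` yields only the entry for `succ`, whereas `allAnns succOne` also includes the types of the application and the literal. The latter is the kind of intermediate result the question above is about.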

Gershom B

15 May

6:27 a.m.

On Mon, May 14, 2018 at 9:30 AM, Simon Peyton Jones wrote:

> Why not put the .hie-file info into the .hi file? (Optionally, of course.)

Simon, I'm curious what benefits you think we might get from this? (I'm one of the mentors on this GSoC project btw).

> What tools/libraries do you plan to produce to allow clients to read a .hie file and make sense of the contents?

For GSoC as a proof of concept the idea is to teach haddock's hyperlinked-source backend to use this information to add type-annotation-on-hover to the colorized, hyperlinked, html source.

I think what is anticipated more broadly is that other tools like the Haskell IDE Engine (which Zubin has contributed to in the past) will also be able to make use of these files to provide ide and tooling features in a more lightweight way than needing to directly interface with the GHC API. (This by the way is one of the key benefits of keeping the file separate from standard hi files -- it should be parseable and consumable without needing to link in GHC).

-g

Simon Peyton Jones

8:42 a.m.

| > Why not put the .hie-file info into the .hi file? (Optionally, of
| > course.)
|
| Simon, I'm curious what benefits you think we might get from this?
| (I'm one of the mentors on this GSoC project btw).

Well, I've always thought that we should really put the .hi file into the .o file! Having two files risks getting things out of sync, and three makes that worse. The file is just a place to keep a blob of info. What's the motivation for having two files, .hie as well as .hi?

| > What tools/libraries do you plan to produce to allow clients to read
| > a .hie file and make sense of the contents?
|
| For GSoC as a proof of concept the idea is to teach haddock's
| hyperlinked-source backend to use this information to add
| type-annotation-on-hover to the colorized, hyperlinked, html source.

That's great. But would it not be good to offer a library, with a well-defined API, that allows a client (including Haddock) to parse those .hie files into syntax trees or whatever? You'll need to do that to allow the haddock thing you describe -- and it'd be much better to make the parser (and doubtless lots of utility functions like finding things in the tree) available to any client, not just haddock.

And that in turn raises the question of WHAT syntax tree. HsSyn? Template Haskell? Haskell-src-exts? Or something new? Shayan and Alan are busy parameterising HsSyn to make it non-GHC-specific, and directly usable for this kind of endeavour ("Trees that grow"). It'd be great to build on their work.

| with the GHC API. (This by the way is one of the key benefits of
| keeping the file separate from standard hi files -- it should be
| parseable and consumable without needing to link in GHC).

Yes, not linking in GHC is a reasonable goal; but having two files and file formats is not a necessary consequence of that goal. Nothing stops us making a library to parse .hi files -- indeed the entire iface/ directory in GHC is quite well separated for that precise purpose.

None of this is to criticise the plan. I think it's a great idea to make more info more readily available to more tools. I'm just poking at it a bit 😊.

Simon
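As a rough illustration of the "library with a well-defined API" idea, the sketch below shows the shape such a client-facing module might take: a tree type, an encoder, a decoder, and one utility for finding things in the tree. Everything here is an assumption made for illustration. The names `HieNode`, `encodeHie`, `decodeHie` and `findNodes` are hypothetical, and the `show`/`read` text encoding merely stands in for whatever on-disk format is eventually chosen.

```haskell
import Text.Read (readMaybe)

-- | A deliberately tiny stand-in for whatever tree a .hie file would hold.
data HieNode = HieNode
  { nodeKind     :: String     -- ^ e.g. a constructor name such as "HsVar"
  , nodeChildren :: [HieNode]  -- ^ sub-nodes, in source order
  } deriving (Eq, Show, Read)

-- | Serialise a tree. A real format would likely be binary; 'show' keeps
-- the sketch short and needs nothing beyond base.
encodeHie :: HieNode -> String
encodeHie = show

-- | Parse a serialised tree, returning Nothing on malformed input.
decodeHie :: String -> Maybe HieNode
decodeHie = readMaybe

-- | All nodes satisfying a predicate, in pre-order.
findNodes :: (HieNode -> Bool) -> HieNode -> [HieNode]
findNodes p n = [n | p n] ++ concatMap (findNodes p) (nodeChildren n)
```

Because the module depends only on base, a client such as Haddock could use it without linking in GHC, whichever file the data ends up living in.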

Zubin Duggal

9:13 a.m.

> And that in turn raises the question of WHAT syntax tree. HsSyn? Template Haskell? Haskell-src-exts? Or something new? Shayan and Alan are busy parameterising HsSyn to make it non-GHC-specific, and directly usable for this kind of endeavour ("Trees that grow"). It'd be great to build on their work.

Mainly, we need information on every Token that appears in the original source. My plan is to further group Tokens into a simple rose-tree based on how they occur in HsSyn. We intentionally want to avoid capturing too much information so the format doesn't change much with changes to the GHC AST.

I've made a file describing roughly what the data structures involved should look like:

https://gist.github.com/wz1000/edf14747bd890b08c01c226d5bc6a1d6

The plan is to group the Tokens together in a tree in a way similar to what structured-haskell-mode does. (The gifs in the following link might provide some idea.)

https://github.com/chrisdone/structured-haskell-mode/

For example, here is what structured-haskell-mode outputs for a small snippet of code:

https://gist.github.com/wz1000/db42d4f533ba7d2345934906b312f743

We want something similar for the HIE AST, but grouped into a tree, where each node (roughly corresponding to HsSyn constructors) points to all the subnodes and tokens it spans over.

> That's great. But would it not be good to offer a library, with a well-defined API, that allows a client (including Haddock) to parse those .hie files into syntax trees or whatever? You'll need to do that to allow the haddock thing you describe -- and it'd be much better to make the parser (and doubtless lots of utility functions like finding things in the tree) available to any client, not just haddock.

Yes, a library to consume these files is definitely something we need, and I believe it will grow out naturally as we work out the integration with haddock and haskell-ide-engine.
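The rose-tree grouping described in this message can be sketched roughly as follows. This is an illustrative guess at the general shape only and does not reproduce the data structures in the linked gist. All names (`Pos`, `Span`, `Token`, `Node`, `enclosing`, `tokenAt`, `exampleTree`) are hypothetical, and types are stored as plain strings purely to keep the example small.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- | A source position as (line, column), both 1-based.
type Pos = (Int, Int)

-- | A half-open source span: start inclusive, end exclusive.
data Span = Span { spanStart :: Pos, spanEnd :: Pos }
  deriving (Eq, Show)

-- | A lexical token, optionally annotated with a pretty-printed type.
data Token = Token
  { tokText :: String
  , tokSpan :: Span
  , tokType :: Maybe String
  } deriving (Eq, Show)

-- | A rose tree: each node has a label (roughly an HsSyn constructor name),
-- the span it covers, the tokens directly under it, and its sub-nodes.
data Node = Node
  { nodeLabel    :: String
  , nodeSpan     :: Span
  , nodeTokens   :: [Token]
  , nodeChildren :: [Node]
  } deriving (Eq, Show)

contains :: Span -> Pos -> Bool
contains (Span s e) p = s <= p && p < e

-- | Labels of the nodes enclosing a position, outermost first.
enclosing :: Pos -> Node -> [String]
enclosing p n
  | nodeSpan n `contains` p = nodeLabel n : concatMap (enclosing p) (nodeChildren n)
  | otherwise               = []

-- | The token covering a position, if any.
tokenAt :: Pos -> Node -> Maybe Token
tokenAt p n
  | not (nodeSpan n `contains` p) = Nothing
  | otherwise =
      listToMaybe $
        filter ((`contains` p) . tokSpan) (nodeTokens n)
          ++ mapMaybe (tokenAt p) (nodeChildren n)

-- | A hand-built tree for the one-line definition @f x = x@.
exampleTree :: Node
exampleTree =
  Node "FunBind" (Span (1, 1) (1, 8))
    [ Token "f" (Span (1, 1) (1, 2)) (Just "a -> a")
    , Token "=" (Span (1, 5) (1, 6)) Nothing
    ]
    [ Node "VarPat" (Span (1, 3) (1, 4)) [Token "x" (Span (1, 3) (1, 4)) (Just "a")] []
    , Node "HsVar"  (Span (1, 7) (1, 8)) [Token "x" (Span (1, 7) (1, 8)) (Just "a")] []
    ]
```

A hover feature of the kind planned for the hyperlinked source would then amount to a lookup such as `tokenAt (1, 7) exampleTree`, followed by reading the `tokType` field of the result.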

Simon Peyton Jones

9:19 a.m.

> Mainly, we need information on every Token that appears in the original source.

Good idea. Alan Zimmerman’s exact-print stuff has precisely that goal, I believe. So it’d be worth talking to him; perhaps by working together you can make much more rapid progress. Or not – but a conversation would be helpful in any case.

I’m very happy to see more attention and effort being devoted to this space. Thank you!

Simon
