Tracking down instances from use-sites

Christopher Done

26 Jun 2018

3:40 p.m.

Hi all,

Given a TypecheckedModule, what's the most direct way given a Var expression retrieved from the AST, to determine:

1) that it's a class method e.g. `read`
2) that it's a generic call (no instance chosen) e.g. `Read a => a -> String`
3) or if it's a resolved instance, then which instance is it and which package, module and declaration is that defined in?

Starting with this file that has a TypecheckedModule in it: https://gist.github.com/chrisdone/6fcb9f1cba6324148d481fcd4eab6af6#file-ghc-...

I presume at this point that instance resolution has taken place. I'm not sure that dictionaries or chosen instances are inserted into the AST, or whether just the resolved types are inserted e.g. `Int -> String`, where I want e.g. `Read Int`, which might lead me to finding the matching instance from an InstEnv or so.

I'd like to do some analyses of Haskell codebases, and the fact that calls to class methods are opaque is a bit of a road-blocker. Any handy tips? Prior work?

It'd be neat in tooling to just hit a goto-definition key on `read` and be taken to the instance implementation rather than the class definition.

Also, listing all functions that use throw# or functions defined in terms of throw# or FFI calls would be helpful, especially for doing audits. If I could immediately list all partial functions in a project, then list all call-sites, it would be a very convenient way when doing an audit to see whether partial functions (such as head) are used with the proper preconditions or not.

Any tips appreciated,

Chris


Matthew Pickering

26 Jun 2018

4:18 p.m.

Chris,

I have also considered this question.

1. Look at the `idDetails` of the `Id`. A class selector is a `ClassOpId`.

2, 3. When a class selector `foo` is typechecked, the instance information is of course resolved. The selector `foo` is then wrapped in a `HsWrapper` which, when desugared, will apply the type arguments and dictionary arguments. Thus, in order to understand what instance has been selected, we need to look into the `HsWrapper`. In particular, one of the constructors is the `WpEvApp` constructor, which is what will apply the dictionary argument. In case 2, this will be a type variable. In case 3, this will be the dictionary variable. I'm not sure how to distinguish these two cases easily. Then, once you have the dictionary id, you can use `idType` to get the type of the dictionary, which will be something like `Show ()`, in order to tell you which instance was selected.

You can inspect the AST of a typechecked program using the `-ddump-tc-ast` flag.

Finally, you should consider writing this as a source plugin rather than using the GHC API, as it will be easier to run in a variety of different scenarios.

Cheers,

Matt
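As a rough illustration of the traversal described in this message, here is a toy model in plain Haskell. The types are simplified stand-ins that borrow GHC's constructor names (`ClassOpId`, `DFunId`, `WpHole`, `WpCompose`, `WpTyApp`, `WpEvApp`); they are not the real GHC API definitions, which carry more information and differ between GHC versions. The helpers `isClassMethod`, `dictArgs` and `selectedInstances` are hypothetical names introduced only for this sketch.

```haskell
-- Toy model only: these types mimic a small subset of GHC's IdDetails and
-- HsWrapper, with strings standing in for real names and types.

data IdDetails
  = VanillaId            -- an ordinary identifier
  | ClassOpId String     -- a class method, tagged with its class name
  | DFunId               -- a dictionary function (an instance)
  deriving (Eq, Show)

data Id = Id
  { idName    :: String
  , idDetails :: IdDetails
  , idType    :: String  -- pretty-printed type, e.g. "Show ()"
  } deriving (Eq, Show)

data HsWrapper
  = WpHole                         -- identity wrapper
  | WpCompose HsWrapper HsWrapper  -- combination of two wrappers
  | WpTyApp String                 -- applies a type argument
  | WpEvApp Id                     -- applies a dictionary (evidence) argument
  deriving (Eq, Show)

-- Step 1: a class method is recognised by its IdDetails.
isClassMethod :: Id -> Bool
isClassMethod i = case idDetails i of
  ClassOpId _ -> True
  _           -> False

-- Steps 2 and 3: collect every dictionary that the wrapper applies.
dictArgs :: HsWrapper -> [Id]
dictArgs WpHole          = []
dictArgs (WpCompose a b) = dictArgs a ++ dictArgs b
dictArgs (WpTyApp _)     = []
dictArgs (WpEvApp d)     = [d]

-- For a class method, read the selected instance(s) off the dictionary types.
-- Anything that is not a class method yields no instances.
selectedInstances :: Id -> HsWrapper -> [String]
selectedInstances f w
  | isClassMethod f = map idType (dictArgs w)
  | otherwise       = []
```

In this model, a `show` selector wrapped as `WpCompose (WpEvApp d) (WpTyApp "()")`, where `d` has type `"Show ()"`, yields `["Show ()"]`. Telling a generic call from a resolved one needs more than the wrapper itself, which is the open question raised in the message.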

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Christopher Done

5:04 p.m.

...

The selector `foo` is then wrapped in a `HsWrapper` which when desugared will apply the type arguments and dictionary arguments.

Nice! I'll give this a try and report back. Thanks.

...

Finally, you should consider writing this as a source plugin rather than using the GHC API as it will be easier to run in a variety of different scenarios.

It took me a few minutes to find what you meant. For posterity, I think "frontend plugins" is the name: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/extending_gh...

That sounds like a good idea. This is the first time I've seen this feature of GHC.

Cheers!


Matthew Pickering

5:17 p.m.

Sorry, they are not "frontend plugins" but a new feature that will be in GHC 8.6. They are an implementation of this GHC proposal:

https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0017-so...

There is also this thread last year about the same topic, which Simon answered in the same way that I did, but you may find either explanation more useful:

https://mail.haskell.org/pipermail/ghc-devs/2017-October/014826.html

Cheers,

Matt


Ben Gamari

5:21 p.m.

Christopher Done writes:

...

Hi all,

Given a TypecheckedModule, what's the most direct way given a Var expression retrieved from the AST, to determine:

1) that it's a class method e.g. `read`
2) that it's a generic call (no instance chosen) e.g. `Read a => a -> String`
3) or if it's a resolved instance, then which instance is it and which package, module and declaration is that defined in?

Starting with this file that has a TypecheckedModule in it: https://gist.github.com/chrisdone/6fcb9f1cba6324148d481fcd4eab6af6#file-ghc-...

I presume at this point that instance resolution has taken place. I'm not sure that dictionaries or chosen instances are inserted into the AST, or whether just the resolved types are inserted e.g. `Int -> String`, where I want e.g. `Read Int`, which might lead me to finding the matching instance from an InstEnv or so.

I'd like to do some analyses of Haskell codebases, and the fact that calls to class methods are opaque is a bit of a road-blocker. Any handy tips? Prior work?

It'd be neat in tooling to just hit a goto-definition key on `read` and be taken to the instance implementation rather than the class definition.

Indeed that would be great.

I believe (1) is quite straightforward: you can recognize a class operation by looking at the function's IdDetails (specifically looking for ClassOpId). This contains the Class to which the method belongs.

Getting back to the instance is a bit trickier. I'll admit I don't know whether there is a convenient way to do this. However, I can try to fill in some background and give a few ideas.

First let's review how typeclass evidence is represented in HsSyn (apologies if this is already known). For concreteness, let's consider the program,

    showList :: Show a => [a] -> String
    showList x = show x

After typechecking this will likely turn into something like (taken from the output of -ddump-tc -fprint-typechecker-elaboration):

    AbsBindsSig [a_a1hj] [$dShow_a1hl]
      {Exported type: Hi.showList :: forall a. Show a => [a] -> String
                      [LclId]
       Bind: showList_a1hk x_azo = show @ [a_a1hj] $dShow_a1hn x_azo
       Evidence: EvBinds{[W] $dShow_a1hn
                           = GHC.Show.$fShow[] @[a_a1hj] [$dShow_a1hl]}}

This AbsBind represents a binding abstracted over a dictionary argument ($dShow_a1hl :: Show a_a1hj). The "Evidence" section gives a list of evidence bindings which the desugarer will wrap the RHS in; in this case the typechecker has built a `Show [a_a1hj]` instance from the `Show a => Show [a]` instance defined in GHC.Show and the abstracted `$dShow_a1hl` dictionary.

The `show` call site will then look something like this in HsSyn:

    HsApp
      (HsWrap
        (WpEvApp $dShow_a1hn)
        (HsWrap
          (WpTyApp a_a1hj)
          (HsVar GHC.Show.show)))
      (HsVar x_azo)

Here the typechecker has wrapped the (show x_azo) expression in a pair of HsWrappers which apply its type and dictionary arguments.

This suggests an approach to identify "generic" call sites (item (2) above): look at whether the RHS of the call site's dictionary is lambda-bound or not. In the above case we see that it is not lambda-bound but rather a concrete dictionary: `GHC.Show.$fShow[]`. You can tell that this is a dictionary by looking at its IdDetails (specifically, it is of the DFunId variety).

By contrast, if we have a generic call-site:

    printIt :: Show a => a -> IO ()
    printIt x = putStrLn $ show x

We see that the evidence binding is headed by a lambda-bound dictionary:

    AbsBindsSig [a_a1AP] [$dShow_a1AR]
      {Exported type: printIt :: forall a. Show a => a -> IO ()
                      [LclId]
       Bind: printIt_a1AQ x_a12W = putStrLn $ show @ a_a1AP $dShow_a1AV x_a12W
       Evidence: EvBinds{[W] $dShow_a1AV = $dShow_a1AR}}

Of course, in the case that you have a concrete dictionary you *also* want to know the source location of the instance declaration from which it arose. I'm afraid this may be quite challenging as this isn't information we currently keep. Currently interface files don't really keep any information that might be useful to IDE tooling users. It's possible that we could add such information, although it's unclear exactly what this would look like. It would be great to hear more from tooling users regarding what information they would like to see.

Also relevant here is the HIE file GSoC project [1] being worked on this summer by Zubin Duggal (CC'd).
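The distinction drawn above, between a dictionary headed by a dictionary function and one headed by a lambda-bound dictionary, can be sketched with a small toy model. This is not the GHC API: `EvTerm` here is a simplified stand-in with two made-up constructors, evidence variables are plain strings, and `resolve` and `Resolution` are hypothetical names. The sketch assumes the lambda-bound dictionaries and the evidence bindings of the enclosing binding have already been extracted.

```haskell
-- Toy model of evidence bindings: each evidence variable is bound either to
-- another evidence variable or to a dictionary function applied to arguments.
data EvTerm
  = EvAlias String             -- e.g. $dShow_a1AV = $dShow_a1AR
  | EvDFunApp String [String]  -- e.g. $dShow_a1hn = GHC.Show.$fShow[] [$dShow_a1hl]
  deriving (Eq, Show)

data Resolution
  = Generic String   -- headed by a lambda-bound dictionary (no instance chosen)
  | Concrete String  -- headed by a dictionary function, i.e. a chosen instance
  | Unknown String   -- no binding found, or the alias chain did not terminate
  deriving (Eq, Show)

-- Follow the evidence bindings from the dictionary used at a call site until
-- we reach either a lambda-bound dictionary or a dictionary function.
resolve
  :: [String]            -- dictionaries the enclosing binding abstracts over
  -> [(String, EvTerm)]  -- evidence bindings of the enclosing binding
  -> String              -- dictionary applied at the call site
  -> Resolution
resolve lamBound binds = go (length binds + 1)
  where
    go :: Int -> String -> Resolution
    go fuel v
      | v `elem` lamBound = Generic v
      | fuel <= 0         = Unknown v  -- guards against cyclic aliases
      | otherwise = case lookup v binds of
          Just (EvAlias v')     -> go (fuel - 1) v'
          Just (EvDFunApp df _) -> Concrete df
          Nothing               -> Unknown v
```

With the two dumps above as input, `$dShow_a1hn` in `showList` resolves to `Concrete "GHC.Show.$fShow[]"`, while `$dShow_a1AV` in `printIt` resolves to `Generic "$dShow_a1AR"`. Note that the `showList` case is concrete only at its head: the dictionary function is itself applied to the lambda-bound `$dShow_a1hl`, which a fuller analysis would probably want to report as well.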

...

Also, listing all functions that use throw# or functions defined in terms of throw# or FFI calls would be helpful, especially for doing audits. If I could immediately list all partial functions in a project, then list all call-sites, it would be a very convenient way when doing an audit to see whether partial functions (such as head) are used with the proper preconditions or not.

This may be non-trivial; you may be able to get something along these lines out of the strictness signature present in IdInfo. However, I suspect this will be a bit fragile (e.g. we don't even run demand analysis with -O0 IIRC).

Cheers,

- Ben

[1] https://ghc.haskell.org/trac/ghc/wiki/HIEFiles

Christopher Done

6:07 p.m.

Ben,

Thanks for the in-depth elaboration of what Matthew/Simon were describing! It seems within reach!

...

Of course, in the case that you have a concrete dictionary you *also* want to know the source location of the instance declaration from which it arose. I'm afraid this may be quite challenging as this isn't information we currently keep. Currently interface files don't really keep any information that might be useful to IDE tooling users. It's possible that we could add such information, although it's unclear exactly what this would look like. It would be great to hear more from tooling users regarding what information they would like to see.

Indeed, getting the exact source location was a stretch; I didn't have high hopes for that. However, the package and module are actually useful. Regarding that, I did find the following field:

    -- | @is_dfun_name = idName . is_dfun@.
    --
    -- We use 'is_dfun_name' for the visibility check,
    -- 'instIsVisible', which needs to know the 'Module' which the
    -- dictionary is defined in. However, we cannot use the 'Module'
    -- attached to 'is_dfun' since doing so would mean we would
    -- potentially pull in an entire interface file unnecessarily.
    -- This was the cause of #12367.
    , is_dfun_name :: Name

So it seems like I could use the Name to get a Module, which contains a UnitId (package and version) and ModuleName. If I've already generated the right metadata for that package and module, then I can do the mapping.
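The mapping described here (Name to Module, Module to package and module name, then a lookup in previously generated metadata) might look roughly like the following toy sketch. The `Name` and `Module` records are simplified stand-ins rather than GHC's own types, and `Metadata` and `instanceOrigin` are hypothetical names; the shape of the metadata is an assumption, since the message does not say what it would contain.

```haskell
-- Toy stand-ins for GHC's Module and Name; strings replace UnitId,
-- ModuleName and OccName.
data Module = Module
  { moduleUnitId :: String  -- package (and version)
  , moduleName   :: String  -- e.g. "GHC.Show"
  } deriving (Eq, Show)

data Name = Name
  { nameOcc    :: String        -- e.g. "$fShow[]"
  , nameModule :: Maybe Module  -- Nothing for purely local names
  } deriving (Eq, Show)

-- Hypothetical metadata, generated ahead of time per (package, module):
-- maps a dictionary-function name to a description of its instance.
type Metadata = [((String, String), [(String, String)])]

-- Look up where the instance behind a dictionary function comes from.
-- Fails with Nothing if the name is local, the module has no metadata,
-- or the module's metadata does not mention the name.
instanceOrigin :: Metadata -> Name -> Maybe String
instanceOrigin meta n = do
  m     <- nameModule n
  decls <- lookup (moduleUnitId m, moduleName m) meta
  lookup (nameOcc n) decls
```

Association lists keep the sketch dependency-free; a real tool would more likely use a map keyed on package and module.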

...

Also relevant here is the HIE file GSoC project [1] being worked on this summer by Zubin Duggal (CC'd).

I think this would be a good use-case for that.

...

...
Also, listing all functions that use throw# or functions defined in terms of throw# or FFI calls would be helpful, especially for doing audits. If I could immediately list all partial functions in a project, then list all call-sites, it would be a very convenient way when doing an audit to see whether partial functions (such as head) are used with the proper preconditions or not.

This may be non-trivial; you may be able to get something along these lines out of the strictness signature present in IdInfo. However, I suspect this will be a bit fragile (e.g. we don't even run demand analysis with -O0 IIRC).

I was going to start with a very naive approach of creating a dependency graph merely based on presence in a declaration, not on use. E.g.

    foo = if False then head [] else 123

would still be flagged up as partial, even though upon inspection it isn't. But it uses `head`, so it should arouse suspicion. I'd want to review it myself and determine that it's safe and then mark it safe. At the least, I might mark such code as having potential for bugs.

Cheers!
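A minimal sketch of that naive, mention-based approach, assuming each declaration has already been reduced to its name plus the list of names appearing anywhere in its body. The `Decl` type and the `suspects` function are hypothetical, and extracting the mentioned names from a real module is left out entirely.

```haskell
import Data.List (nub, sort)

-- A declaration reduced to its name and the names mentioned in its body,
-- regardless of whether those mentions are reachable at run time.
type Decl = (String, [String])

-- All declarations that mention a seed (a known partial function or
-- primitive) directly, or mention another suspect declaration.
-- Computed as a fixed point: each round can only add names, and it stops
-- when a round adds none.
suspects :: [String] -> [Decl] -> [String]
suspects seeds decls = sort (go [])
  where
    go found =
      let tainted = seeds ++ found
          found'  = nub [ n | (n, uses) <- decls, any (`elem` tainted) uses ]
      in if length found' == length found then found else go found'
```

For example, with seeds `["head", "throw#"]`, a declaration `("foo", ["head"])` is flagged even when the `head` call sits in a dead branch, and so is any declaration that mentions `foo`. Marking a reviewed declaration as safe could be modelled by removing it from the input before running the analysis.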
