
Dear GHC devs,

I think that having automated security advisory warnings from build tools is important for Haskell adoption in certain industries. This can be done based on build plans, but a package is really the wrong granularity: a large, widely-used package might export a little-used definition that is the subject of an advisory, and it would be good to warn only the users of said definition (cf. base and readFloat).

Tristan is exploring using HIE files to do this check, but I don't know if you read Discourse, where he posted the question: https://discourse.haskell.org/t/rfc-using-hie-files-to-list-external-declarations-for-cabal-audit/7147

Thanks!
David

On Mon, Jul 31, 2023 at 11:05 David Christiansen via ghc-devs wrote:
Thank you David for bringing this up here. One thing to note is that we would need HIE files for the GHC libraries, as proposed in: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1337

Cheers,
-Tristan

On Mon, Jul 31, 2023 at 16:26 Tristan Cacqueray wrote:
Dear GHC devs,

To recap, the goal of this project is to check if a given declaration is used by a package. For example, I would like to check if a definition such as "package:Module.name" is reachable from another module.

In this post I list the considered options, and raise some questions about using the simplified core from .hi files.

I would appreciate it if you could have a look and help me figure out the remaining blockers. Note that I'm not very familiar with the GHC internals and how to properly read Core expressions, so any feedback would be appreciated.


# Context and Problem Statement

We would like to check if a package is affected by a known vulnerability. Instead of looking at the names and versions of the build dependencies, we would like to search for individual functions. This is particularly important to avoid false alarms when a given vulnerability only appears in a rarely used declaration of a popular package.

Therefore, we need a way to search the whole call graph to assert with confidence that a given declaration is not used (i.e. not reachable).


# Considered Options

To obtain the call graph data, the following options are considered:

* .hie files produced when using the `-fwrite-ide-info` flag.
* .modpack files produced by the [wpc-plugin][grin].
* custom GHC plugin.
* .hi files containing the simplified core when using the `-fwrite-if-simplified-core` flag.


# Pros and Cons of the Options

### Hie files

This option is similar to what [weeder][weeder] already implements. However, this file format is designed for IDEs, and it may not be suitable for our problem. For example, RULES, deriving, RebindableSyntax and Template Haskell are not well captured.

[weeder]: https://github.com/ocharles/weeder/

### Modpack

This option appears to work, but it seems overkill. I don't think we need to reach for the STG representation.

[grin]: https://github.com/grin-compiler/ghc-whole-program-compiler-project

### Custom GHC plugin

This option enables extra metadata to be collected, but if using the simplified core is enough, then it is just an extra step compared to using .hi files.

### Hi files

Using .hi files is the only option that doesn't require extra compilation artifacts: the necessary files are already part of the packages.

To collect hie files or files generated by a GHC plugin, ghc/cabal/stack all need some extra work:

- GHC libraries don't ship hie files ([issue#16901](https://gitlab.haskell.org/ghc/ghc/-/issues/16901)).
- cabal needs recent changes for hie files ([PR#9019](https://github.com/haskell/cabal/pull/9019)) and plugin artifacts ([PR#8662](https://github.com/haskell/cabal/pull/8662)).
- stack doesn't seem to install hie files for global libraries.

Moreover, creating artifacts with a plugin for GHC libraries may require manual steps because these libraries are not built by the end user.

Therefore, using .hi files is the most straightforward solution.


# Questions

In this section I present the current implementation of [cabal-audit](https://github.com/TristanCacqueray/cabal-audit/).


## Collecting dependencies from core

In the [cabal-audit-core:CabalAudit.Core](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-core/src/CabalAudit/Core.hs) module I implemented the logic to extract the call graph from core expressions into a list of declarations composed of `UnitId:ModuleName.OccName` and their dependencies.

Here is an example output for the [cabal-audit-test:CabalAudit.Test.User](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-test/src/CabalAudit/Test/User.hs)
module:

```ShellSession
$ cabal run -O0 --write-ghc-environment=always cabal-audit-hi -- CabalAudit.Test.User
cabal-audit-test:CabalAudit.Test.Inline.fonctionInlined: base:GHC.Num.$fNumInt, base:GHC.Num.-, ghc-prim:GHC.Types.I#
cabal-audit-test:CabalAudit.Test.Instance.$fTestClassTea: cabal-audit-test:CabalAudit.Test.Instance.$ctasty1
cabal-audit-test:CabalAudit.Test.Instance.$fTestClassCofee: cabal-audit-test:CabalAudit.Test.Instance.$ctasty
cabal-audit-test:CabalAudit.Test.Instance.$ctasty: ghc-prim:GHC.Classes.&&, ghc-prim:GHC.Types.True
cabal-audit-test:CabalAudit.Test.Instance.$ctasty1: base:GHC.Base.., cabal-audit-test:CabalAudit.Test.Instance.alwaysTrue, ghc-prim:GHC.Classes.not
cabal-audit-test:CabalAudit.Test.Instance.alwaysTrue: base:GHC.Base.const, ghc-prim:GHC.Types.True
cabal-audit-test:CabalAudit.Test.User.monDoubleDecr: base:GHC.Num.$fNumInt, base:GHC.Num.-, cabal-audit-test:CabalAudit.Test.Inline.fonctionInlined, ghc-prim:GHC.Types.I#
cabal-audit-test:CabalAudit.Test.User.useAlwaysTrue: cabal-audit-test:CabalAudit.Test.Instance.Tea, cabal-audit-test:CabalAudit.Test.Instance.$fTestClassTea
cabal-audit-test:CabalAudit.Test.User.useCofeeInstance: cabal-audit-test:CabalAudit.Test.Instance.Cofee, cabal-audit-test:CabalAudit.Test.Instance.$fTestClassCofee
```

This appears correct, in particular:

- Type class instances are uniquely identified (that was not working well when using a custom plugin).
- Inlined declarations are not inlined in the simplified core when built with `-O0`.

However, this is collecting extra definitions that are not part of the source file. I understand that '$fTestClassTea' means the 'TestClass' instance of 'Tea'. But it seems like the actual implementation is behind the extra '$ctasty' declaration. Moreover, when analyzing the other test modules, I see many declarations named 'lvlXX', which I guess are local names that have been floated out.

This is not ideal because the resulting graph contains extra edges that are not relevant for the end user. I tried to tidy this using 'isExportedId' and 'idDetails' from 'GHC.Types.Var' but I worry that this is not a good strategy. So my question is: how to recover the original declaration context of core expressions, so that the resulting dependency graph only contains edges that are part of the source declarations? I assume this can be done by dissolving the declarations starting with '$' or 'lvl', but it would be good to know how to do that reliably.


## Handling inlined declaration

When compiling with `-O1`, declarations seem to be inlined in the simplified core. In that case, is it possible to recover the original inlined OccName?

If not, I guess we have to use a GHC plugin. I investigated this strategy in [cabal-audit-plugin:CabalAudit.Plugin](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-plugin/src/CabalAudit/Plugin.hs).

However, I am not sure this is done correctly and I could use some guidance on how to proceed.


## Loading hidden module

If I understand correctly, accessing the ModIface mi_extra_decls to get the simplified core requires an HscEnv. In the [cabal-audit-hi:GhcExtras](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-hi/src/GhcExtras.hs) module, I put together the following helpers using GHC as a library:

```haskell
-- | Setup a Ghc session using the packages found in the local environment file
runGhcWithEnv :: Ghc a -> IO a

-- | Lookup a module and extract the simplified core.
getCoreBind :: ModuleName -> Maybe FastString -> Ghc (Maybe (Module, [CoreBind]))
```

However, this doesn't work for hidden modules: trying to load them with 'GHC.lookupModule' fails with this error:

```ShellSession
  Could not load module `GHC.Event.Thread'
  it is a hidden module in the package `base-4.18.0.0'
```

I tried to reset the hsc_env.hsc_dflags.hiddenModules but without luck. Is there a trick to access the ModIface of hidden modules?

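On the earlier question of dissolving declarations starting with '$' or 'lvl', here is a minimal sketch of one possible approach, under stated assumptions: such names are treated as transparent nodes, and their dependencies are attached to whichever source-level declaration uses them. The `Decl` record, the `isSynthetic` prefix heuristic and the `collapse` function are hypothetical names for illustration only. They are not part of cabal-audit or GHC, and matching on name prefixes is a guess at what the compiler generates, not something GHC guarantees.

```haskell
import Data.Char (isAlpha, isDigit)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical key for a declaration: "unit:Module" plus occurrence name.
data Decl = Decl { declModule :: String, declOcc :: String }
  deriving (Eq, Ord, Show)

-- | Direct dependencies of each declaration.
type DepGraph = Map.Map Decl (Set.Set Decl)

-- | Name-based heuristic (an assumption, not a GHC guarantee): names such as
-- "$fFoo", "$cbar", "$wbaz", "lvl" or "lvl12" are taken to be
-- compiler-generated. Operators such as "$" or "$!" are not matched.
isSynthetic :: String -> Bool
isSynthetic ('$' : c : _) = isAlpha c
isSynthetic occ = case splitAt 3 occ of
  ("lvl", rest) -> all isDigit rest
  _ -> False

-- | Drop synthetic declarations and give their dependencies to their users.
collapse :: (Decl -> Bool) -> DepGraph -> DepGraph
collapse synthetic graph =
  Map.fromList
    [ (d, expand (Set.singleton d) (Set.toList deps))
    | (d, deps) <- Map.toList graph
    , not (synthetic d)
    ]
  where
    -- 'seen' holds the declaration itself and everything already visited,
    -- so cycles terminate and self-references are dropped.
    expand _ [] = Set.empty
    expand seen (x : xs)
      | x `Set.member` seen = expand seen xs
      | synthetic x, Just ys <- Map.lookup x graph =
          expand (Set.insert x seen) (Set.toList ys ++ xs)
      | otherwise = Set.insert x (expand (Set.insert x seen) xs)

inst, usr :: String -> Decl
inst = Decl "cabal-audit-test:CabalAudit.Test.Instance"
usr = Decl "cabal-audit-test:CabalAudit.Test.User"

-- | A few edges taken from the example output earlier in this message
-- (deliberately incomplete).
example :: DepGraph
example = Map.fromList
  [ (usr "useAlwaysTrue", Set.fromList [inst "Tea", inst "$fTestClassTea"])
  , (inst "$fTestClassTea", Set.fromList [inst "$ctasty1"])
  , (inst "$ctasty1", Set.fromList [inst "alwaysTrue", Decl "ghc-prim:GHC.Classes" "not"])
  , (inst "alwaysTrue", Set.fromList [Decl "base:GHC.Base" "const"])
  ]
```

In GHCi, `collapse (isSynthetic . declOcc) example` keeps only `alwaysTrue` and `useAlwaysTrue` as keys, with `useAlwaysTrue` now depending directly on `Tea`, `alwaysTrue` and `not`. Synthetic names without an entry in the graph (for example dictionaries from other packages) are left in place as leaves. Note that dissolving `$f...` names also discards the instance identity, so a narrower predicate may be preferable when instances matter.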

## Including simplified core in .hi files by default

In the cabal-audit flake, I am using a nix override to set the `-fwrite-if-simplified-core` ghc-options by default and to patch the ghc build phase to use the `+hi_core` hadrian transformers.

To avoid rebuilding the dependencies, it would be great to have the simplified core in the hi file by default. Is there an issue or a downside when enabling the flag by default? Could the libraries shipped with GHC contain the simplified core in the future?


## Declaration identifications

In the [cabal-audit-command:CabalAudit.Command](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-command/src/CabalAudit/Command.hs) module, I implemented a proof of concept reverse lookup to find reachable declarations. For example using this command:

```ShellSession
$ cabal-audit-hi --target GHC.Exception.throw CabalAudit.Test.Simple
base:GHC.Exception.throw
|
`- base:GHC.IO.Handle.Internals.ioe_finalizedHandle
   |
   `- base:GHC.IO.Handle.FD.$wstdHandleFinalizer
      |
      `- base:GHC.IO.Handle.FD.stdout
         |
         +- base:System.IO.putStrLn1
         |  |
         |  `- base:System.IO.putStrLn
         |     |
         |     `- cabal-audit-test:CabalAudit.Test.Simple.afficheNombre
         |
         `- base:System.IO.putStr1
            |
            `- base:System.IO.putStr
               |
               `- cabal-audit-test:CabalAudit.Test.Simple.maFonction
```

In the event a vulnerability happens in a type class instance, how should the affected instance be identified? Instead of using 'package:Module.$fClassNameDataName', is there an established format we could use (for example "Typeclass X instance of T")?

What about data types or type families, would it make sense to include them in the graph? If so, how to identify them in the advisory database?


Please let me know if I missed something.
Thanks for your time!
-Tristan
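
The reverse lookup described above can be illustrated with a minimal sketch. It assumes the call graph is already available as a map from each declaration to its direct dependencies, with declarations encoded as plain strings. The names `reverseGraph`, `affected` and `usersTree` are hypothetical and are not taken from the cabal-audit sources, and the example edges are read off the tree shown above.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Tree (Tree (..), drawTree)

-- | Hypothetical plain-string encoding of "unit:Module.name".
type Decl = String

-- | Each declaration mapped to its direct dependencies.
type DepGraph = Map.Map Decl [Decl]

-- | Invert the edges: each dependency mapped to its direct users.
reverseGraph :: DepGraph -> Map.Map Decl (Set.Set Decl)
reverseGraph g =
  Map.fromListWith Set.union
    [ (dep, Set.singleton user) | (user, deps) <- Map.toList g, dep <- deps ]

-- | All declarations that can reach the target, including the target itself.
affected :: Map.Map Decl (Set.Set Decl) -> Decl -> Set.Set Decl
affected rev target = go Set.empty [target]
  where
    go seen [] = seen
    go seen (d : ds)
      | d `Set.member` seen = go seen ds
      | otherwise =
          go (Set.insert d seen)
             (Set.toList (Map.findWithDefault Set.empty d rev) ++ ds)

-- | Tree of users below the target. A branch stops when a declaration would
-- repeat on its own path, so cyclic graphs still yield a finite tree.
usersTree :: Map.Map Decl (Set.Set Decl) -> Decl -> Tree Decl
usersTree rev = go Set.empty
  where
    go path d =
      let path' = Set.insert d path
          users = [ u | u <- Set.toList (Map.findWithDefault Set.empty d rev)
                      , u `Set.notMember` path' ]
      in Node d (map (go path') users)

-- | Edges transcribed from the tree in the message above.
example :: DepGraph
example = Map.fromList
  [ ("base:GHC.IO.Handle.Internals.ioe_finalizedHandle", ["base:GHC.Exception.throw"])
  , ("base:GHC.IO.Handle.FD.$wstdHandleFinalizer", ["base:GHC.IO.Handle.Internals.ioe_finalizedHandle"])
  , ("base:GHC.IO.Handle.FD.stdout", ["base:GHC.IO.Handle.FD.$wstdHandleFinalizer"])
  , ("base:System.IO.putStrLn1", ["base:GHC.IO.Handle.FD.stdout"])
  , ("base:System.IO.putStrLn", ["base:System.IO.putStrLn1"])
  , ("cabal-audit-test:CabalAudit.Test.Simple.afficheNombre", ["base:System.IO.putStrLn"])
  , ("base:System.IO.putStr1", ["base:GHC.IO.Handle.FD.stdout"])
  , ("base:System.IO.putStr", ["base:System.IO.putStr1"])
  , ("cabal-audit-test:CabalAudit.Test.Simple.maFonction", ["base:System.IO.putStr"])
  ]
```

In GHCi, `putStr (drawTree (usersTree (reverseGraph example) "base:GHC.Exception.throw"))` renders the users as a tree, while `affected` returns the flat set, which is cheaper when only a yes/no answer per declaration is needed. The tree form can grow quickly on graphs with many shared nodes, since shared users are repeated on every path.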

Hi Tristan,

I wouldn't do this with Core (cf. the inlining issue and the issue of associating what you find with source syntax).

I think you should use the output of the renamer instead. Either with a GHC plugin using `renamedResultAction` or just by dumping the renamed AST (fully qualified) with -ddump-rn-ast -ddump-to-file and grepping for the names you want.

Cheers,
Sylvain

On 9 August 2023 at 21:07, Tristan Cacqueray wrote:

Hi Sylvain,

Using the output of the renamer looks good. However, it doesn't seem to contain the typeclass instances: when calling a typeclass method for a given type, the AST contains the name of the method and the type, but not the instance.

Is it complicated to look up the relevant instance from the renamer output?

Thanks,
-Tristan

On Wed, Aug 09, 2023 at 21:36 Sylvain Henry wrote:
participants (3)
- David Christiansen
- Sylvain Henry
- Tristan Cacqueray