
9 Aug 2023, 3:36 p.m.
Hi Tristan,

I wouldn't do this with Core (cf. the inlining issue and the issue of associating what you find with source syntax).

I think you should use the output of the renamer instead, either with a GHC plugin using `renamedResultAction` or just by dumping the renamed AST (fully qualified) with -ddump-rn-ast -ddump-to-file and grepping for the names you want.

Cheers,
Sylvain

On 9 Aug 2023 at 21:07, Tristan Cacqueray wrote:
>
> On Mon, Jul 31, 2023 at 16:26 Tristan Cacqueray wrote:
>> On Mon, Jul 31, 2023 at 11:05 David Christiansen via ghc-devs wrote:
>>> Dear GHC devs,
>>>
>>> I think that having automated security advisory warnings from build tools
>>> is important for Haskell adoption in certain industries. This can be done
>>> based on build plans, but a package is really the wrong granularity - a
>>> large, widely-used package might export a little-used definition that is
>>> the subject of an advisory, and it would be good to warn only the users of
>>> said definition (cf base and readFloat).
>>>
>>> Tristan is exploring using HIE files to do this check, but I don't know if
>>> you read Discourse, where he posted the question:
>>> https://discourse.haskell.org/t/rfc-using-hie-files-to-list-external-declarations-for-cabal-audit/7147
>>>
>>
>> Thank you David for bringing this up here. One thing to note is that we
>> would need hie files for ghc libraries, as proposed in:
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1337
>>
>> Cheers,
>> -Tristan
>
> Dear GHC devs,
>
> To recap, the goal of this project is to check if a given declaration is
> used by a package. For example, I would like to check if a definition such
> as "package:Module.name" is reachable from another module.
>
> In this post I list the considered options, and raise some questions
> about using the simplified core from .hi files.
>
> I would appreciate it if you could have a look and help me figure out the
> remaining blockers.
> Note that I'm not very familiar with the GHC internals and how to properly
> read Core expressions, so any feedback would be appreciated.
>
>
> # Context and Problem Statement
>
> We would like to check if a package is affected by a known
> vulnerability. Instead of looking at the build dependencies names and
> versions, we would like to search for individual functions. This is
> particularly important to avoid false alarms when a given vulnerability
> only appears in a rarely used declaration of a popular package.
>
> Therefore, we need a way to search the whole call graph to assert with
> confidence that a given declaration is not used (i.e. not reachable).
>
>
> # Considered Options
>
> To obtain the call graph data, the following options are considered:
>
> * .hie files produced when using the `-fwrite-ide-info` flag.
> * .modpack files produced by the [wpc-plugin][grin].
> * custom GHC plugin.
> * .hi files containing the simplified core when using the
>   `-fwrite-if-simplified-core` flag.
>
>
> # Pros and Cons of the Options
>
> ### Hie files
>
> This option is similar to what [weeder][weeder] already implements.
> However this file format is designed for IDEs, and it may not be suitable
> for our problem. For example, RULES, deriving, RebindableSyntax and
> Template Haskell are not well captured.
>
> [weeder]: https://github.com/ocharles/weeder/
>
> ### Modpack
>
> This option appears to work, but it seems overkill. I don't think we
> need to reach for the STG representation.
>
> [grin]: https://github.com/grin-compiler/ghc-whole-program-compiler-project
>
> ### Custom GHC plugin
>
> This option enables extra metadata to be collected, but if using the
> simplified core is enough, then it is just an extra step compared to
> using .hi files.
>
> ### Hi files
>
> Using .hi files is the only option that doesn't require extra
> compilation artifacts: the necessary files are already part of the
> packages.
>
> To collect hie files or files generated by a GHC plugin, ghc/cabal/stack
> all need some extra work:
>
> - ghc libraries don't ship hie files
>   ([issue#16901](https://gitlab.haskell.org/ghc/ghc/-/issues/16901)).
> - cabal needs recent changes for hie files
>   ([PR#9019](https://github.com/haskell/cabal/pull/9019)) and plugin
>   artifacts ([PR#8662](https://github.com/haskell/cabal/pull/8662)).
> - stack doesn't seem to install hie files for global libraries.
>
> Moreover, creating artifacts with a plugin for ghc libraries may
> require manual steps because these libraries are not built by the
> end user.
>
> Therefore, using .hi files is the most straightforward solution.
>
>
> # Questions
>
> In this section I present the current implementation of
> [cabal-audit](https://github.com/TristanCacqueray/cabal-audit/).
>
>
> ## Collecting dependencies from core
>
> In the
> [cabal-audit-core:CabalAudit.Core](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-core/src/CabalAudit/Core.hs)
> module I implemented the logic to extract the call graph from core
> expressions into a list of declarations composed of
> `UnitId:ModuleName.OccName` and their dependencies.
>
> Here is an example output for the
> [cabal-audit-test:CabalAudit.Test.User](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-test/src/CabalAudit/Test/User.hs)
> module:
>
> ```ShellSession
> $ cabal run -O0 --write-ghc-environment=always cabal-audit-hi -- CabalAudit.Test.User
> cabal-audit-test:CabalAudit.Test.Inline.fonctionInlined: base:GHC.Num.$fNumInt, base:GHC.Num.-, ghc-prim:GHC.Types.I#
> cabal-audit-test:CabalAudit.Test.Instance.$fTestClassTea: cabal-audit-test:CabalAudit.Test.Instance.$ctasty1
> cabal-audit-test:CabalAudit.Test.Instance.$fTestClassCofee: cabal-audit-test:CabalAudit.Test.Instance.$ctasty
> cabal-audit-test:CabalAudit.Test.Instance.$ctasty: ghc-prim:GHC.Classes.&&, ghc-prim:GHC.Types.True
> cabal-audit-test:CabalAudit.Test.Instance.$ctasty1: base:GHC.Base.., cabal-audit-test:CabalAudit.Test.Instance.alwaysTrue, ghc-prim:GHC.Classes.not
> cabal-audit-test:CabalAudit.Test.Instance.alwaysTrue: base:GHC.Base.const, ghc-prim:GHC.Types.True
> cabal-audit-test:CabalAudit.Test.User.monDoubleDecr: base:GHC.Num.$fNumInt, base:GHC.Num.-, cabal-audit-test:CabalAudit.Test.Inline.fonctionInlined, ghc-prim:GHC.Types.I#
> cabal-audit-test:CabalAudit.Test.User.useAlwaysTrue: cabal-audit-test:CabalAudit.Test.Instance.Tea, cabal-audit-test:CabalAudit.Test.Instance.$fTestClassTea
> cabal-audit-test:CabalAudit.Test.User.useCofeeInstance: cabal-audit-test:CabalAudit.Test.Instance.Cofee, cabal-audit-test:CabalAudit.Test.Instance.$fTestClassCofee
> ```
>
> This appears correct, in particular:
>
> - Type class instances are uniquely identified (that was not working
>   well when using a custom plugin).
> - Inlined declarations are not inlined in the simplified core when built
>   with `-O0`.
>
> However this is collecting extra definitions that are not part of the
> source file. I understand that '$fTestClassTea' means the 'TestClass'
> instance of 'Tea'. But it seems like the actual implementation is behind
> the extra '$ctasty' declaration.
> Moreover, when analyzing the other test
> modules, I see many declarations named 'lvlXX', which I guess are local
> names that have been floated out.
>
> This is not ideal because the resulting graph contains extra edges that
> are not relevant for the end user. I tried to tidy this using
> 'isExportedId' and 'idDetails' from 'GHC.Types.Var' but I worry that
> this is not a good strategy. So my question is: how to recover the original
> declaration context of core expressions, so that the resulting
> dependency graph only contains edges that are part of the source
> declarations? I assume this can be done by dissolving the declarations
> starting with '$' or 'lvl', but it would be good to know how to do that
> reliably.
>
>
> ## Handling inlined declaration
>
> When compiling with `-O1`, declarations seem to be inlined in the
> simplified core. In that case, is it possible to recover the original
> inlined OccName?
>
> If not, I guess we have to use a GHC plugin.
> I investigated this strategy in
> [cabal-audit-plugin:CabalAudit.Plugin](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-plugin/src/CabalAudit/Plugin.hs).
>
> However I am not sure this is done correctly and I could use some
> guidance on how to proceed.
>
>
> ## Loading hidden module
>
> If I understand correctly, accessing the ModIface mi_extra_decls to get
> the simplified core requires an HscEnv.
> In the
> [cabal-audit-hi:GhcExtras](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-hi/src/GhcExtras.hs)
> module, I put together the following helpers using GHC as a library:
>
> ```haskell
> -- | Setup a Ghc session using the packages found in the local environment file
> runGhcWithEnv :: Ghc a -> IO a
>
> -- | Lookup a module and extract the simplified core.
> getCoreBind :: ModuleName -> Maybe FastString -> Ghc (Maybe (Module, [CoreBind]))
> ```
>
> However this doesn't work for hidden modules: trying to load them with
> 'GHC.lookupModule' fails with this error:
>
> ```ShellSession
>  Could not load module `GHC.Event.Thread'
>  it is a hidden module in the package `base-4.18.0.0'
> ```
>
> I tried to reset the hsc_env.hsc_dflags.hiddenModules but without luck.
> Is there a trick to access the ModIface of hidden modules?
>
>
> ## Including simplified core in .hi files by default
>
> In the cabal-audit flake, I am using a nix override to set the
> `-fwrite-if-simplified-core` ghc-options by default and to patch the ghc
> build phase to use the `+hi_core` hadrian transformers.
>
> To avoid rebuilding the dependencies, it would be great to have the
> simplified core in the hi file by default.
> Is there an issue or a downside when enabling the flag by default?
> Could the libraries shipped with GHC contain the simplified core in the
> future?
>
>
> ## Declaration identifications
>
> In the
> [cabal-audit-command:CabalAudit.Command](https://github.com/TristanCacqueray/cabal-audit/blob/main/cabal-audit-command/src/CabalAudit/Command.hs)
> module, I implemented a proof of concept reverse lookup to find
> reachable declarations. For example using this command:
>
> ```ShellSession
> $ cabal-audit-hi --target GHC.Exception.throw CabalAudit.Test.Simple
> base:GHC.Exception.throw
> |
> `- base:GHC.IO.Handle.Internals.ioe_finalizedHandle
>    |
>    `- base:GHC.IO.Handle.FD.$wstdHandleFinalizer
>       |
>       `- base:GHC.IO.Handle.FD.stdout
>          |
>          +- base:System.IO.putStrLn1
>          |  |
>          |  `- base:System.IO.putStrLn
>          |     |
>          |     `- cabal-audit-test:CabalAudit.Test.Simple.afficheNombre
>          |
>          `- base:System.IO.putStr1
>             |
>             `- base:System.IO.putStr
>                |
>                `- cabal-audit-test:CabalAudit.Test.Simple.maFonction
> ```
>
> In the event a vulnerability happens in a type class instance, how can we
> identify the affected instance?
> Instead of using 'package:Module.$fClassNameDataName', is there an
> established format we could use (for example "Typeclass X instance of
> T")?
>
> What about data types or type families, would it make sense to include
> them in the graph? If so, how to identify them in the advisory
> database?
>
>
> Please let me know if I missed something.
> Thanks for your time!
> -Tristan
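
The reverse lookup described under "Declaration identifications" in the quoted message can be illustrated with a small, self-contained sketch. This is not the code from cabal-audit: the names `Decl`, `Graph`, `reverseGraph` and `usersOf` are hypothetical, and the call graph is assumed to be already available as a map from each declaration (in the "package:Module.name" form) to the declarations it uses. It only relies on the `containers` package.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | A declaration name in the "package:Module.name" form (assumed encoding).
type Decl = String

-- | Forward call graph: each declaration maps to the declarations it uses.
type Graph = Map.Map Decl [Decl]

-- | Flip every edge, so that each declaration maps to its direct users.
reverseGraph :: Graph -> Map.Map Decl (Set.Set Decl)
reverseGraph g =
  Map.fromListWith Set.union
    [ (dep, Set.singleton user)
    | (user, deps) <- Map.toList g
    , dep <- deps
    ]

-- | Every declaration that reaches the target, directly or transitively.
-- The 'seen' set guarantees termination on cyclic graphs.
usersOf :: Graph -> Decl -> Set.Set Decl
usersOf g target = go Set.empty [target]
  where
    rev = reverseGraph g

    go seen [] = seen
    go seen (d : rest) =
      let direct = Map.findWithDefault Set.empty d rev
          new = Set.toList (direct `Set.difference` seen)
       in go (seen `Set.union` direct) (new ++ rest)
```

An advisory check would then amount to testing whether any declaration of the audited package belongs to `usersOf graph target`. Whether the graph comes from simplified Core or from the renamer output, as suggested at the top of the thread, does not change this step.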