Collecting all external names in a module

Hi, I have a question regarding the GHC API. Given a module, I'm trying to collect * the Name and SrcSpan of all top-level definitions, * the Name and SrcSpan of all (local) uses of these top-level definition * the Name and SrcSpan of all uses of imported definitions. For example, given the file A.hs module B where data Foo = Bar | Baz String main = print $ "Hello, World!" ++ show test test = let x = 2 in x I would like to output: B.Foo - A.hs:3:5-7 B.Bar - A.hs:3:10-12 B.Baz - A.hs:3:10-12 GHC.Base.String - A.hs:3:13-18 B.main - A.hs:5:1-4 System.IO.print - A.hs:5:8-11 GHC.Base.++ - A.hs:5:18-19 etc. (The line/column numbers are made up.) * I do not want to output e.g. 'x' as it's not a top-level identifier (the code I've included below gets this wrong). * I want to output whether the SrcSpan corresponds to a use site or definition site of the Name. For example: 'Foo' is a definition site while 'print' is a use site. I started writing a manual traversal of the RenamedSource AST (as I want qualified names) but I thought I check if I'm going about this right before I spend all the time required to write the traversal for the whole AST. Here's the code I have so far, am I on the right track? ---------- -- | Collects all qualified names that are referred to in a module, -- that are either defineds at the top-level in that module or that -- are imported from some other module. module Main where import Bag import DynFlags ( defaultDynFlags ) import GHC import GHC.Paths ( libdir ) import Outputable import System.Environment -- | Is the 'Name' defined here? data Origin = Local | External type Use = (Name, Origin, SrcSpan) local :: Name -> Use local name = (name, Local, nameSrcSpan name) external :: Name -> SrcSpan -> Use external name loc = (name, External, loc) showName :: (Name, Origin, SrcSpan) -> String showName (name, org, loc) = showSDoc (ppr name) ++ "," ++ showSDoc (ppr loc) ++ "," ++ showOrg org where showOrg Local = "1" showOrg External = "0" main :: IO () main = do [targetFile] <- getArgs res <- defaultErrorHandler defaultDynFlags $ runGhc (Just libdir) $ do dflags <- getSessionDynFlags _ <- setSessionDynFlags dflags target <- guessTarget targetFile Nothing setTargets [target] _ <- load LoadAllTargets modSum <- getModSummary $ mkModuleName "B" p <- parseModule modSum t <- typecheckModule p let Just (r, _, _, _) = tm_renamed_source t return r putStrLn $ showSDoc $ ppr res putStrLn "" putStr $ unlines $ map showName (collectHsGroup res) ------------------------------------------------------------------------ -- AST traversal -- | Collect all external qualified names in the module. collectHsGroup :: HsGroup Name -> [Use] collectHsGroup = collectHsValBindsLR . hs_valds collectHsValBindsLR :: HsValBindsLR Name Name -> [Use] collectHsValBindsLR (ValBindsOut xs _) = concatMap collectHsBindNames . map unLoc . concatMap bagToList . map snd $ xs collectHsValBindsLR (ValBindsIn binds _) = concatMap (collectHsBindNames . unLoc) (bagToList binds) collectHsBindNames :: HsBindLR Name Name -> [Use] collectHsBindNames fb@(FunBind { fun_id = L _ f }) = [local f] ++ collectMatchGroupNames (fun_matches fb) collectHsBindNames _ = [] collectMatchGroupNames :: MatchGroup Name -> [Use] collectMatchGroupNames (MatchGroup matches _) = concat [collectGRHSsNames x | Match _ _ x <- map unLoc matches] collectGRHSsNames :: GRHSs Name -> [Use] collectGRHSsNames (GRHSs xs _) = concatMap (collectGRHSNames . unLoc) xs collectGRHSNames :: GRHS Name -> [Use] collectGRHSNames (GRHS _stmts exprs) = collectHsExprNames exprs -- For less typing collectHsExprNames :: LHsExpr Name -> [Use] collectHsExprNames = collect collect :: LHsExpr Name -> [Use] collect (L loc expr) = go expr where go (HsVar name) | isExternalName name = [external name loc] | otherwise = [] go (HsIPVar _) = [] go (HsOverLit _) = [] go (HsLit _) = [] go (HsLam mg) = collectMatchGroupNames mg go (HsApp e1 e2) = collect e1 ++ collect e2 go (OpApp e1 e2 _ e3) = collect e1 ++ collect e2 ++ collect e3 go (NegApp e1 _) = collect e1 -- ++ collect e2 -- ??? go (HsPar e) = collect e go (SectionL e1 e2) = collect e1 ++ collect e2 go (SectionR e1 e2) = collect e1 ++ collect e2 go (ExplicitTuple xs _) = concat [ collect x | Present x <- xs] go (HsCase e mg) = collect e ++ collectMatchGroupNames mg go (HsIf e1 e2 e3) = collect e1 ++ collect e2 ++ collect e3 go (HsLet binds e) = collectHsLocalBindsLR binds ++ collect e -- go (HsDo (HsStmtContext Name) [LStmt id] (LHsExpr id) PostTcType) = -- go (ExplicitList PostTcType [LHsExpr id]) = -- go (ExplicitPArr PostTcType [LHsExpr id]) = -- go (RecordCon (Located id) PostTcExpr (HsRecordBinds id)) = -- go (RecordUpd (LHsExpr id) (HsRecordBinds id) [DataCon] [PostTcType] [PostTcType]) = -- go (ExprWithTySig (LHsExpr id) (LHsType id)) = -- go (ExprWithTySigOut (LHsExpr id) (LHsType Name)) = -- go (ArithSeq PostTcExpr (ArithSeqInfo id)) = -- go (PArrSeq PostTcExpr (ArithSeqInfo id)) = -- go (HsSCC FastString (LHsExpr id)) = -- go (HsCoreAnn FastString (LHsExpr id)) = -- go (HsBracket (HsBracket id)) = -- go (HsBracketOut (HsBracket Name) [PendingSplice]) = -- go (HsSpliceE (HsSplice id)) = -- go (HsQuasiQuoteE (HsQuasiQuote id)) = -- go (HsProc (LPat id) (LHsCmdTop id)) = -- go (HsArrApp (LHsExpr id) (LHsExpr id) PostTcType HsArrAppType Bool) = -- go (HsArrForm (LHsExpr id) (Maybe Fixity) [LHsCmdTop id]) = -- go (HsTick Int [id] (LHsExpr id)) = -- go (HsBinTick Int Int (LHsExpr id)) = -- go (HsTickPragma (FastString, (Int, Int), (Int, Int)) (LHsExpr id)) = -- go (EWildPat) = -- go (EAsPat (Located id) (LHsExpr id)) = -- go (EViewPat (LHsExpr id) (LHsExpr id)) = -- go (ELazyPat (LHsExpr id)) = -- go (HsType (LHsType id)) = -- go (HsWrap HsWrapper (HsExpr id)) = go _ = [] collectHsLocalBindsLR :: HsLocalBindsLR Name Name -> [Use] collectHsLocalBindsLR (HsValBinds x) = collectHsValBindsLR x collectHsLocalBindsLR _ = []

Johan GHC already collects all RdrNames for imported things, for use when reporting unused imports. But it doesn't collect the SrcSpan of the occurrences, nor does it collect occurrences of locally-bound things. I suggest you write a general traversal looking like data Gather var res = Gather { g_empty :: res , g_union :: res -> res -> res , g_occ :: Located var -> res , g_del :: Located var -> res -> res } getExpr :: Gather v res -> HsExpr v -> res .. and similarly for each other data type... You could even use generic programming to do it (all the Hs things are in class Data I think). The GHC could use this function instead of its present mechanism for the unused-import thing, and you could use it too. Simon From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of Johan Tibell Sent: 10 September 2010 15:35 To: glasgow-haskell-users Subject: Collecting all external names in a module Hi, I have a question regarding the GHC API. Given a module, I'm trying to collect * the Name and SrcSpan of all top-level definitions, * the Name and SrcSpan of all (local) uses of these top-level definition * the Name and SrcSpan of all uses of imported definitions. For example, given the file A.hs module B where data Foo = Bar | Baz String main = print $ "Hello, World!" ++ show test test = let x = 2 in x I would like to output: B.Foo - A.hs:3:5-7 B.Bar - A.hs:3:10-12 B.Baz - A.hs:3:10-12 GHC.Base.String - A.hs:3:13-18 B.main - A.hs:5:1-4 System.IO.print - A.hs:5:8-11 GHC.Base.++ - A.hs:5:18-19 etc. (The line/column numbers are made up.) * I do not want to output e.g. 'x' as it's not a top-level identifier (the code I've included below gets this wrong). * I want to output whether the SrcSpan corresponds to a use site or definition site of the Name. For example: 'Foo' is a definition site while 'print' is a use site. I started writing a manual traversal of the RenamedSource AST (as I want qualified names) but I thought I check if I'm going about this right before I spend all the time required to write the traversal for the whole AST. Here's the code I have so far, am I on the right track? ---------- -- | Collects all qualified names that are referred to in a module, -- that are either defineds at the top-level in that module or that -- are imported from some other module. module Main where import Bag import DynFlags ( defaultDynFlags ) import GHC import GHC.Paths ( libdir ) import Outputable import System.Environment -- | Is the 'Name' defined here? data Origin = Local | External type Use = (Name, Origin, SrcSpan) local :: Name -> Use local name = (name, Local, nameSrcSpan name) external :: Name -> SrcSpan -> Use external name loc = (name, External, loc) showName :: (Name, Origin, SrcSpan) -> String showName (name, org, loc) = showSDoc (ppr name) ++ "," ++ showSDoc (ppr loc) ++ "," ++ showOrg org where showOrg Local = "1" showOrg External = "0" main :: IO () main = do [targetFile] <- getArgs res <- defaultErrorHandler defaultDynFlags $ runGhc (Just libdir) $ do dflags <- getSessionDynFlags _ <- setSessionDynFlags dflags target <- guessTarget targetFile Nothing setTargets [target] _ <- load LoadAllTargets modSum <- getModSummary $ mkModuleName "B" p <- parseModule modSum t <- typecheckModule p let Just (r, _, _, _) = tm_renamed_source t return r putStrLn $ showSDoc $ ppr res putStrLn "" putStr $ unlines $ map showName (collectHsGroup res) ------------------------------------------------------------------------ -- AST traversal -- | Collect all external qualified names in the module. collectHsGroup :: HsGroup Name -> [Use] collectHsGroup = collectHsValBindsLR . hs_valds collectHsValBindsLR :: HsValBindsLR Name Name -> [Use] collectHsValBindsLR (ValBindsOut xs _) = concatMap collectHsBindNames . map unLoc . concatMap bagToList . map snd $ xs collectHsValBindsLR (ValBindsIn binds _) = concatMap (collectHsBindNames . unLoc) (bagToList binds) collectHsBindNames :: HsBindLR Name Name -> [Use] collectHsBindNames fb@(FunBind { fun_id = L _ f }) = [local f] ++ collectMatchGroupNames (fun_matches fb) collectHsBindNames _ = [] collectMatchGroupNames :: MatchGroup Name -> [Use] collectMatchGroupNames (MatchGroup matches _) = concat [collectGRHSsNames x | Match _ _ x <- map unLoc matches] collectGRHSsNames :: GRHSs Name -> [Use] collectGRHSsNames (GRHSs xs _) = concatMap (collectGRHSNames . unLoc) xs collectGRHSNames :: GRHS Name -> [Use] collectGRHSNames (GRHS _stmts exprs) = collectHsExprNames exprs -- For less typing collectHsExprNames :: LHsExpr Name -> [Use] collectHsExprNames = collect collect :: LHsExpr Name -> [Use] collect (L loc expr) = go expr where go (HsVar name) | isExternalName name = [external name loc] | otherwise = [] go (HsIPVar _) = [] go (HsOverLit _) = [] go (HsLit _) = [] go (HsLam mg) = collectMatchGroupNames mg go (HsApp e1 e2) = collect e1 ++ collect e2 go (OpApp e1 e2 _ e3) = collect e1 ++ collect e2 ++ collect e3 go (NegApp e1 _) = collect e1 -- ++ collect e2 -- ??? go (HsPar e) = collect e go (SectionL e1 e2) = collect e1 ++ collect e2 go (SectionR e1 e2) = collect e1 ++ collect e2 go (ExplicitTuple xs _) = concat [ collect x | Present x <- xs] go (HsCase e mg) = collect e ++ collectMatchGroupNames mg go (HsIf e1 e2 e3) = collect e1 ++ collect e2 ++ collect e3 go (HsLet binds e) = collectHsLocalBindsLR binds ++ collect e -- go (HsDo (HsStmtContext Name) [LStmt id] (LHsExpr id) PostTcType) = -- go (ExplicitList PostTcType [LHsExpr id]) = -- go (ExplicitPArr PostTcType [LHsExpr id]) = -- go (RecordCon (Located id) PostTcExpr (HsRecordBinds id)) = -- go (RecordUpd (LHsExpr id) (HsRecordBinds id) [DataCon] [PostTcType] [PostTcType]) = -- go (ExprWithTySig (LHsExpr id) (LHsType id)) = -- go (ExprWithTySigOut (LHsExpr id) (LHsType Name)) = -- go (ArithSeq PostTcExpr (ArithSeqInfo id)) = -- go (PArrSeq PostTcExpr (ArithSeqInfo id)) = -- go (HsSCC FastString (LHsExpr id)) = -- go (HsCoreAnn FastString (LHsExpr id)) = -- go (HsBracket (HsBracket id)) = -- go (HsBracketOut (HsBracket Name) [PendingSplice]) = -- go (HsSpliceE (HsSplice id)) = -- go (HsQuasiQuoteE (HsQuasiQuote id)) = -- go (HsProc (LPat id) (LHsCmdTop id)) = -- go (HsArrApp (LHsExpr id) (LHsExpr id) PostTcType HsArrAppType Bool) = -- go (HsArrForm (LHsExpr id) (Maybe Fixity) [LHsCmdTop id]) = -- go (HsTick Int [id] (LHsExpr id)) = -- go (HsBinTick Int Int (LHsExpr id)) = -- go (HsTickPragma (FastString, (Int, Int), (Int, Int)) (LHsExpr id)) = -- go (EWildPat) = -- go (EAsPat (Located id) (LHsExpr id)) = -- go (EViewPat (LHsExpr id) (LHsExpr id)) = -- go (ELazyPat (LHsExpr id)) = -- go (HsType (LHsType id)) = -- go (HsWrap HsWrapper (HsExpr id)) = go _ = [] collectHsLocalBindsLR :: HsLocalBindsLR Name Name -> [Use] collectHsLocalBindsLR (HsValBinds x) = collectHsValBindsLR x collectHsLocalBindsLR _ = []

Hi Simon,
Thanks for the pointers!
On Fri, Sep 17, 2010 at 6:29 PM, Simon Peyton-Jones
GHC already collects all RdrNames for imported things, for use when reporting unused imports. But it doesn’t collect the SrcSpan of the occurrences, nor does it collect occurrences of locally-bound things.
I found the spot where the collected RdrNames are used to generate the unused import warnings, but I don't quite understand where they are gathered. Is there an AST traversal function somewhere that gathers these RdrNames? If so, I could use it as a blue print to write my own traversal.
I suggest you write a general traversal looking like
data Gather var res
= Gather { g_empty :: res
, g_union :: res -> res -> res
, g_occ :: Located var -> res
, g_del :: Located var -> res -> res }
getExpr :: Gather v res -> HsExpr v -> res
.. and similarly for each other data type...
Could you expand a little bit on this design? Is the idea that the Gather data type carries functions to apply in different parts of the AST? What's "occ" short for, OccName? What about "del"? There are different kind of ASTs (e.g. after renaming, after type checking, etc), which one should I use if I want to gather all qualified names? Thanks! -- Johan

On 18 September 2010 12:25, Johan Tibell
Hi Simon, Thanks for the pointers! On Fri, Sep 17, 2010 at 6:29 PM, Simon Peyton-Jones
wrote: GHC already collects all RdrNames for imported things, for use when reporting unused imports. But it doesn’t collect the SrcSpan of the occurrences, nor does it collect occurrences of locally-bound things.
I found the spot where the collected RdrNames are used to generate the unused import warnings, but I don't quite understand where they are gathered. Is there an AST traversal function somewhere that gathers these RdrNames? If so, I could use it as a blue print to write my own traversal.
I suggest you write a general traversal looking like
data Gather var res
= Gather { g_empty :: res
, g_union :: res -> res -> res
, g_occ :: Located var -> res
, g_del :: Located var -> res -> res }
getExpr :: Gather v res -> HsExpr v -> res
.. and similarly for each other data type...
Could you expand a little bit on this design? Is the idea that the Gather data type carries functions to apply in different parts of the AST? What's "occ" short for, OccName? What about "del"? There are different kind of ASTs (e.g. after renaming, after type checking, etc), which one should I use if I want to gather all qualified names?
You probably want the renamed AST. The typechecked AST is essentially only a list of top-level bindings. The utility functions Simon suggest look to me like a special sort of fold. g_empty and g_union are clear. g_occ records the occurrence (hence "occ") of a variable and adjusts the fold state accordingly. g_del is most likely to be intended for binders, i.e., a place where a variable goes out of scope when going up. A particular Gather operation is of course not obliged to actually delete anything. So, maybe the "g_del" better be called "g_bind". While Gather on the surface may look polymorphic in the "v" argument, in practise it isn't really. Some parts of the AST will be bound to "error" thunks depending on whether you have a parse AST, a renamed AST or typechecked AST. You also may want different traversal modes. E.g., the renamed AST explicitly fills in which ">>", ">>=", "return", etc. are used by a "do" statement, and it depends on the kind of analysis your doing whether these desugarer-introduced nodes should be included or not. You could also use http://hackage.haskell.org/package/ghc-syb. Here's an example that customises the traversal depending on the stage: http://github.com/nominolo/ghc-syb/blob/master/utils/GHC/SYB/Utils.hs / Thomas

I found the spot where the collected RdrNames are used to generate the unused import warnings, but I don't quite understand where they are gathered. Is there an AST traversal function somewhere that gathers these RdrNames? If so, I could use it as a blue print to write my own traversal. No, it won't be useful. It's currently done as a side effect by the renamer. One could add more, but I'm reluctant to do that. Better to design a separate traversal. Could you expand a little bit on this design? Is the idea that the Gather data type carries functions to apply in different parts of the AST? What's "occ" short for, OccName? What about "del"? Thomas had it right; it's just a particular kind of fold. The key parts of the traversal would be: * Occurrences. getExpr g (HsVar v) = g_occ g v * Combining. getExpr g (HsApp e1 e2) = g_union (getExpr g e1) (getExpr g e2) * Binding> getExpr (Lam v e) = g_del v (getExpr g e) Does that help? Maybe one could generalise a bit. There are different kind of ASTs (e.g. after renaming, after type checking, etc), which one should I use if I want to gather all qualified names? Well, the traversal function is polymorphic so you don't need to decide. But you need to decide when you apply it. When you say "gather all qualified names" do you really mean to gather only names that the programmer wrote qualified? That is, gather Prelude.map, but not map? If so, you need to traverse before renaming, because the renamer throws away the info of whether the user wrote the thing qualified or not. Simon

On Mon, Sep 20, 2010 at 12:47 PM, Simon Peyton-Jones
Thomas had it right; it’s just a particular kind of fold. The key parts of the traversal would be:
· Occurrences. getExpr g (HsVar v) = g_occ g v
Don't I need to work on LHsExpr rather than HsExpr? How would I otherwise get the location? getExpr :: Gather v res -> LHsExpr v -> res getExpr g (L loc e) = case e of HsVar v -> g_occ g (L loc v) HsApp e1 e2 -> g_union g (getExpr g e1) (getExpr g e2)
· Binding> getExpr (Lam v e) = g_del v (getExpr g e)
I still don't quite the purpose of this, is it to say the v is now shadowing whatever had the same name in the scope surrounding the lambda? Or is "g_del" intended to mean "g_declaration" and capture the declaration sites (and if so, how I know if it's a exported top-level declaration or not)?
Does that help? Maybe one could generalise a bit.
Definitely. Thanks for the pointers. Cheers, Johan

More hacking leads to more questions! IDs occur in many places in the AST and I'm not sure which ones I should record (by calling g_occ) during my traversal. Should I only gather the ones in HsVar or are there other IDs of interest? As I explained in my first email, I'm looking for occurrences of 1) imported identifiers and 2) top-level identifiers defined in the current module. Should I be recording all IDs I find and the try to filter them out in my particular definition of Gather? I'm also unsure which data types I should traverse, looking for LHsExpr. For example, LPat may contain view patterns that mention imported identifiers. Is there something in the HsExpr tree that I shouldn't traverse? Thanks! -- Johan

Don't I need to work on LHsExpr rather than HsExpr? How would I otherwise get the location? Yes, indeed, LHsExpr. * Binding> getExpr (Lam v e) = g_del v (getExpr g e) I still don't quite the purpose of this Well if you have (\x -> y + x) there is a free occurrence of 'y' but not 'x'. So take the occurrences in the body of the lambda and delete the ones for 'x', leaving only 'y'. In general, use g_del (maybe badly named? maybe g_bind?) whenever there's a binding site. IDs occur in many places in the AST and I'm not sure which ones I should record (by calling g_occ) during my traversal. Should I only gather the ones in HsVar or are there other IDs of interest? As I explained in my first email, I'm looking for occurrences of 1) imported identifiers and 2) top-level identifiers defined in the current module. Should I be recording all IDs I find and the try to filter them out in my particular definition of Gather? Well, your g_occ function can choose which occurrences to record. (For others it can return g_empty). The generic traversal doesn't need to know about that. Do Ids appear in many places? It seems to me to be mainly HsVar. And VarPat for patterns, which are binding sites. I'm also unsure which data types I should traverse, looking for LHsExpr. For example, LPat may contain view patterns that mention imported identifiers. Is there something in the HsExpr tree that I shouldn't traverse? For now I'd look everywhere. (But until the typechecker has run don't look in fields of type PlaceholderType) S

Johan:
I started writing a manual traversal of the RenamedSource AST (as I want qualified names) but I thought I check if I'm going about this right before I spend all the time required to write the traversal for the whole AST.
Manual traversal code on ASTs tends to consist mainly of boilerplate, so I'd like to emphasize Simon's suggestion: Simon:
You could even use generic programming to do it (all the Hs things are in class Data I think).
There is a page on the GHC wiki on generic AST traversals http://hackage.haskell.org/trac/ghc/wiki/GhcApiAstTraversals by now probably out of date (eg, the instances may now be available without needing ghc-syb, I think?), but it should get you started (using generic programming is a lot easier than trying to understand its types, you just need to make sure that your traversals avoid some holes in the GHC ASTs; there's a Data-based show variant that illustrates the idea), and you might be able to correct out-of-date info. Claus
participants (4)
-
Claus Reinke
-
Johan Tibell
-
Simon Peyton-Jones
-
Thomas Schilling