Why do Names defined in the current module lack a module name?

Hi, I'm writing an app that uses the GHC API to extract names from modules. I've written a little AST traversal that finds all Names in the type-checked AST and writes them to a file. I noticed that every Name defined in the current module (i.e. Names generated for top-level function definitions) lack a module name i.e. nameModule_maybe return Nothing. Why is this? Should I just assume that every time nameModule_maybe return Nothing the module name is in fact the name of the current module being compiled? -- Johan

Does this help http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NameType Something that starts top-level may not finish up as top-level. Nested bindings are never qualified. After TidyPgm, externally-visible names (to the linker) are qualified, ones local to the .o file are not. Does it matter? Simon From: ghc-devs-bounces@haskell.org [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Johan Tibell Sent: 28 March 2013 20:35 To: ghc-devs@haskell.org Subject: Why do Names defined in the current module lack a module name? Hi, I'm writing an app that uses the GHC API to extract names from modules. I've written a little AST traversal that finds all Names in the type-checked AST and writes them to a file. I noticed that every Name defined in the current module (i.e. Names generated for top-level function definitions) lack a module name i.e. nameModule_maybe return Nothing. Why is this? Should I just assume that every time nameModule_maybe return Nothing the module name is in fact the name of the current module being compiled? -- Johan

Hi Simon,
On Tue, Apr 2, 2013 at 3:31 AM, Simon Peyton-Jones
Does this help
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NameType
A bit, but it's still not clear to me exactly when user defined exported entities will have full (i.e. including module) names.
Something that starts top-level may not finish up as top-level. Nested bindings are never qualified.
After TidyPgm, externally-visible names (to the linker) are qualified, ones local to the .o file are not.
Here's my example program: ```haskell module Test ( mysum ) where import Data.List (foldl') import Import (imported) mysum :: [Int] -> Int mysum xs = foldl' (+) imported xs ``` As you see it has a top-level exported thing (mysum). My problem is that I'm traversing the type-checked AST (i.e. returned by `typecheckedSource module`) trying to collect all the names so I can index them for a code search project I'm working on. It's a bit similar to the GHC ctags/etags tool, except I'm trying to index all the source code. So for every Name I run into in the source code I need to figure out what kind of name it is. That's made quite tricky by the fact that name resolution isn't actually quite done by the time we have the typed AST (i.e. mysum ought to have the name "Test.mysum", but it has the name "mysum"). I can try to implement this last resolution step myself, but then I need to understand how to identify names such as mysum above, while traversing the AST. -- Johan

I've just been checking. The top level name *is* (and remains) an External Name. See below.
Don't be misled by the dump labelled "Typechecker" (-ddump-tc). It is carefully printing things as in error messages, with as little qualification as possible; since 'mysum' is in scope unqualified here, it's printed unqualified. But it's an External Name all right. Use -dppr-debug to see it in its full glory
Simon
==================== Renamer ====================
Foo.mysum :: [Int] -> Int
Foo.mysum xs = foldl' (+) 0 xs
TYPE SIGNATURES
mysum :: [Int] -> Int
TYPE CONSTRUCTORS
COERCION AXIOMS
Dependent modules: []
Dependent packages: [base, ghc-prim, integer-gmp]
==================== Typechecker ====================
AbsBinds
[]
[]
[mysum <= mysum
<>]
mysum :: [Int] -> Int
[LclId]
mysum xs = foldl' ((+)) 0 xs
==================== Desugar (after optimization) ====================
Result size = 11
Foo.mysum :: [GHC.Types.Int] -> GHC.Types.Int
[LclIdX]
Foo.mysum =
\ (xs_abz :: [GHC.Types.Int]) ->
Data.List.foldl'
@ GHC.Types.Int
@ GHC.Types.Int
(GHC.Num.+ @ GHC.Types.Int GHC.Num.$fNumInt)
(GHC.Types.I# 0)
xs_abz
| -----Original Message-----
| From: Johan Tibell [mailto:johan.tibell@gmail.com]
| Sent: 02 April 2013 17:43
| To: Simon Peyton-Jones
| Cc: ghc-devs@haskell.org
| Subject: Re: Why do Names defined in the current module lack a module
| name?
|
| Hi Simon,
|
| On Tue, Apr 2, 2013 at 3:31 AM, Simon Peyton-Jones
|

On Wed, Apr 3, 2013 at 5:14 AM, Simon Peyton-Jones
I've just been checking. The top level name *is* (and remains) an External Name. See below.
Don't be misled by the dump labelled "Typechecker" (-ddump-tc). It is carefully printing things as in error messages, with as little qualification as possible; since 'mysum' is in scope unqualified here, it's printed unqualified. But it's an External Name all right. Use -dppr-debug to see it in its full glory
I'm not dumping the names using a command line flag, but using a program that uses the GHC API that I wrote. This is how I use the GHC API to get hold of the AST once the source code has been compiled: forEachM z xs f = foldM f z xs indexSymbols :: ModuleGraph -> Ghc Builder.Builder indexSymbols graph = forEachM Builder.new graph $ \ builder ms -> do let filename = msHsFilePath ms handleSourceError printErrorAndExit $ do liftIO $ putStrLn ("Loading " ++ filename ++ " ...") mod <- loadModule =<< typecheckModule =<< parseModule ms case mod of _ | isBootSummary ms -> return builder _ | Just (group, _, _, _) <- renamedSource mod -> do let modname = moduleName $ ms_mod ms builder' = invertedIndex builder modname $ bagToList $ symbols group (typecheckedSource mod) -- ** HERE ** liftIO $ print $ builder' return $! builder' _ -> liftIO $ exitWith (ExitFailure 1) where printErrorAndExit e = do printException e liftIO $ exitWith (ExitFailure 1) So I invoke my AST traversal function 'symbols' (which just extracts all the Names in the AST into a Bag) on the AST returned by 'typecheckedSource mod'. This is the AST I expected to have fully qualified names but doesn't.

Johan Tibell
I've just been checking. The top level name *is* (and remains) an External Name. See below.
Don't be misled by the dump labelled "Typechecker" (-ddump-tc). It is carefully printing things as in error messages, with as little qualification as possible; since 'mysum' is in scope unqualified here, it's printed unqualified. But it's an External Name all right. Use -dppr-debug to see it in its full glory I'm not dumping the names using a command line flag, but using a
On Wed, Apr 3, 2013 at 5:14 AM, Simon Peyton-Jones
wrote: program that uses the GHC API that I wrote.
Simon's point is that GHC can print names in different ways. The verbosity of a printed name depends on the context in which it is printed (and -dppr-debug is a way to tell GHC that you want to get all info, including uniques etc).
This is how I use the GHC API to get hold of the AST once the source code has been compiled:
forEachM z xs f = foldM f z xs
indexSymbols :: ModuleGraph -> Ghc Builder.Builder indexSymbols graph = forEachM Builder.new graph $ \ builder ms -> do let filename = msHsFilePath ms handleSourceError printErrorAndExit $ do liftIO $ putStrLn ("Loading " ++ filename ++ " ...") mod <- loadModule =<< typecheckModule =<< parseModule ms case mod of _ | isBootSummary ms -> return builder _ | Just (group, _, _, _) <- renamedSource mod -> do let modname = moduleName $ ms_mod ms builder' = invertedIndex builder modname $ bagToList $ symbols group (typecheckedSource mod) -- ** HERE ** liftIO $ print $ builder' return $! builder' _ -> liftIO $ exitWith (ExitFailure 1) where printErrorAndExit e = do printException e liftIO $ exitWith (ExitFailure 1)
So I invoke my AST traversal function 'symbols' (which just extracts all the Names in the AST into a Bag) on the AST returned by 'typecheckedSource mod'. This is the AST I expected to have fully qualified names but doesn't.
I'm not sure what your 'invertedIndex' does. You seem to use the 'Show' instance via 'print'. However, I don't think 'Name' has a 'Show' instance. So, I wonder how you convert the 'Name's to 'String's. In GHC, we do the following, use 'ppr' to convert an entity to an 'SDoc' and then when the 'SDoc' gets turned into a 'String', the dynamic flags determine what detail of a 'Name' to print. You can do this in one step with 'showPpr' http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/Outputable.h..., but you need to supply some 'DynFlags' http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/DynFlags.htm.... Manuel

On Wed, Apr 3, 2013 at 5:42 PM, Manuel M T Chakravarty
Simon's point is that GHC can print names in different ways. The verbosity of a printed name depends on the context in which it is printed (and -dppr-debug is a way to tell GHC that you want to get all info, including uniques etc).
I understand that. I think it's unrelated to my problem however. moduleName_maybe, the function that's supposed to give me the module of a Name returns Nothing for all locally defined names. My guess is that these Names aren't made fully qualified until some later stage.
I'm not sure what your 'invertedIndex' does. You seem to use the 'Show' instance via 'print'. However, I don't think 'Name' has a 'Show' instance. So, I wonder how you convert the 'Name's to 'String's.
invertedIndex isn't interesting in this case. Briefly, it takes a Bag of Names created by the 'symbols' function and build an inverted index (a search engine term) from the names. I derived Show instances for all GHC types for debugging.
In GHC, we do the following, use 'ppr' to convert an entity to an 'SDoc' and then when the 'SDoc' gets turned into a 'String', the dynamic flags determine what detail of a 'Name' to print. You can do this in one step with 'showPpr' http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/Outputable.h..., but you need to supply some 'DynFlags' http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/DynFlags.htm....

| I understand that. I think it's unrelated to my problem however. | moduleName_maybe, the function that's supposed to give me the module | of a Name returns Nothing for all locally defined names. My guess is | that these Names aren't made fully qualified until some later stage. I think there is a danger of speaking at cross purposes here. In the example I sent, you can see that the module name is printed, in the output of the renamer, typechecker, and desugarer. (You need -dppr-debug to see this in the -ddump-tc output, for reasons I mentioned; perhaps that should be changed.) I would be very surprised if nameModule_maybe returned Nothing for a Name that printed with its module name. Yet it appears that you are saying that a Name that prints with its module name with -ddump-ds or -ddump-tc replies to nameModule_maybe with Nothing I don't understand that. I suppose that the only way to do so is for Johan to send a reproducible test case; or perhaps I have misunderstood the question. Simon

On Thu, Apr 4, 2013 at 1:44 AM, Simon Peyton-Jones
I would be very surprised if nameModule_maybe returned Nothing for a Name that printed with its module name. Yet it appears that you are saying that a Name that prints with its module name with -ddump-ds or -ddump-tc replies to nameModule_maybe with Nothing I don't understand that.
Thanks Simon. This is progress, because now I know that this is not expected behavior, so I can try to figure out if I'm doing something wrong and if not send you a test case. :)

Simon, it looks like I'm running into problems similar to those Edsko described in the email "Top-level type signatures in TcGblEnv?" to ghc-devs a while ago. The complete information about a source file is spread out over several different ASTs.

Johan I'd like to help but I don't have a clear enough information of the problem you are trying to solve, or the actual concrete problem you are encountering, to give a meaningful response. If you are ok, fine; if you want me to help specifically then Skype, a concrete test case, ... something Simon | -----Original Message----- | From: Johan Tibell [mailto:johan.tibell@gmail.com] | Sent: 04 April 2013 18:46 | To: Simon Peyton-Jones | Cc: Manuel M T Chakravarty; ghc-devs@haskell.org | Subject: Re: Why do Names defined in the current module lack a module | name? | | Simon, it looks like I'm running into problems similar to those Edsko | described in the email "Top-level type signatures in TcGblEnv?" to ghc- | devs a while ago. The complete information about a source file is spread | out over several different ASTs.

Simon, I've created a small standalone test case here: https://gist.github.com/tibbe/5321268 Usage: Download the Main.hs and Test.hs files into the same directory: https://gist.github.com/tibbe/5321268/raw/68537a79865209d7218429b916a7ab1cce... https://gist.github.com/tibbe/5321268/raw/f9cce61b35b3349eeeac622a4c27c1dd5e... Compile and run: ghc Main.hs ./Main Expected output: Loading Test.hs ... ["Test.mysum","Data.List.foldl'","GHC.Num.+"] Actual output: Loading Test.hs ... [".mysum","Data.List.foldl'","GHC.Num.+"] Note how the locally defined and exported function mysum lacks a module name. The code that creates a string from the Name is on line 59 in Main.hs: nameToString name = (maybe "" (moduleNameString . moduleName) . nameModule_maybe $ name) ++ "." ++ getOccString name The module name is missing in the output because nameModule_maybe returns Nothing for the mysum Name. -- Johan

Aha. You should be looking in the abs_exports field of AbsBinds, not in abs_binds. I've added some (long-needed) comments to HsBinds, which I append below. I hope that helps. Simon Note [AbsBinds] ~~~~~~~~~~~~~~~ The AbsBinds constructor is used in the output of the type checker, to record *typechecked* and *generalised* bindings. Consider a module M, with this top-level binding M.reverse [] = [] M.reverse (x:xs) = M.reverse xs ++ [x] In Hindley-Milner, a recursive binding is typechecked with the *recursive* uses being *monomorphic*. So after typechecking *and* deugaring we will get something like this M.reverse :: forall a. [a] -> [a] = /\a. letrec reverse :: [a] -> [a] = \xs -> case xs of [] -> [] (x:xs) -> reverse xs ++ [x] in reverse Notice that 'M.reverse' is polymorphic as expected, but there is a local defintion for plain 'reverse' which is *monomorphic*. The type variable 'a' scopes over the entire letrec. That's after desugaring. What about after type checking but before desugaring? That's where AbsBinds comes in. It looks like this: AbsBinds { abs_tvs = [a] , abs_exports = [ABE { abe_poly = M.reverse :: forall a. [a] -> [a], , abe_mono = reverse :: a -> a}] , abs_binds = { reverse :: [a] -> [a] = \xs -> case xs of [] -> [] (x:xs) -> reverse xs ++ [x] } } Here, * abs_tvs says what type variables are abstracted over the binding group, just 'a' in this case. * abs_binds is the *monomorphic* bindings of the group * abs_exports describes how to get the polymorphic Id 'M.reverse' from the monomorphic one 'reverse' Notice that the *original* function (the polymorphic one you thought you were defining) appears in the abe_poly field of the abs_exports. The bindings in abs_binds are for fresh, local, Ids with a *monomorphic* Id. If there is a group of mutually recusive functions without type signatures, we get one AbsBinds with the monomorphic versions of the bindings in abs_binds, and one element of abe_exports for each variable bound in the mutually recursive group. This is true even for pattern bindings. Example: (f,g) = (\x -> x, f) After type checking we get AbsBinds { abs_tvs = [a] , abs_exports = [ ABE { abe_poly = M.f :: forall a. a -> a , abe_mono = f :: a -> a } , ABE { abe_poly = M.g :: forall a. a -> a , abe_mono = g :: a -> a }] , abs_binds = { (f,g) = (\x -> x, f) } | -----Original Message----- | From: Johan Tibell [mailto:johan.tibell@gmail.com] | Sent: 05 April 2013 18:58 | To: Simon Peyton-Jones | Cc: Manuel M T Chakravarty; ghc-devs@haskell.org | Subject: Re: Why do Names defined in the current module lack a module name? | | Simon, | | I've created a small standalone test case here: | https://gist.github.com/tibbe/5321268 | | Usage: | | Download the Main.hs and Test.hs files into the same directory: | | https://gist.github.com/tibbe/5321268/raw/68537a79865209d7218429b916a7ab | 1ccebc18e4/Main.hs | https://gist.github.com/tibbe/5321268/raw/f9cce61b35b3349eeeac622a4c27c1dd | 5e0410cf/Test.hs | | Compile and run: | ghc Main.hs | ./Main | | Expected output: | Loading Test.hs ... | ["Test.mysum","Data.List.foldl'","GHC.Num.+"] | | Actual output: | Loading Test.hs ... | [".mysum","Data.List.foldl'","GHC.Num.+"] | | Note how the locally defined and exported function mysum lacks a | module name. The code that creates a string from the Name is on line | 59 in Main.hs: | | nameToString name = (maybe "" (moduleNameString . moduleName) . | nameModule_maybe $ name) | ++ "." ++ getOccString name | | The module name is missing in the output because nameModule_maybe | returns Nothing for the mysum Name. | | -- Johan

Johan Tibell
On Tue, Apr 2, 2013 at 3:31 AM, Simon Peyton-Jones
wrote: A bit, but it's still not clear to me exactly when user defined exported entities will have full (i.e. including module) names. Something that starts top-level may not finish up as top-level. Nested bindings are never qualified.
After TidyPgm, externally-visible names (to the linker) are qualified, ones local to the .o file are not.
Here's my example program:
```haskell module Test ( mysum ) where
import Data.List (foldl')
import Import (imported)
mysum :: [Int] -> Int mysum xs = foldl' (+) imported xs ```
As you see it has a top-level exported thing (mysum). My problem is that I'm traversing the type-checked AST (i.e. returned by `typecheckedSource module`) trying to collect all the names so I can index them for a code search project I'm working on. It's a bit similar to the GHC ctags/etags tool, except I'm trying to index all the source code.
So for every Name I run into in the source code I need to figure out what kind of name it is. That's made quite tricky by the fact that name resolution isn't actually quite done by the time we have the typed AST (i.e. mysum ought to have the name "Test.mysum", but it has the name "mysum"). I can try to implement this last resolution step myself, but then I need to understand how to identify names such as mysum above, while traversing the AST.
I'm not sure what information you are trying to collect, but if you can traverse the Core program after TidyPgm instead of the type checked AST, that would have all names cleaned up (as Simon wrote) and still have valid source positions. Otherwise, there are predicates on 'Id's and 'Name's that determine their categories, e.g., 'isExportedId'. Just keep in mind that in GHC-speak a "local" 'Id' is one that is defined in the currently compiled module. It may still be a top-level 'Id' that is exported. (Not your classic definition of "local", I guess ;) Manuel
participants (3)
-
Johan Tibell
-
Manuel M T Chakravarty
-
Simon Peyton-Jones