
Hello all,
I am writing an interactive tool using the ghc api. It is able to load and typecheck a source file in a user's package.
I obtain the flags that cabal uses to compile the user's package via the hie-bios trick, and I `parseDynamicFlagsCmdLine' them inside my tool, then I `setTargets' all the home modules (with targetAllowObjCode=True).
I use HscNothing and NoLink because I only want access to the trees, I don't want to produce any output files.
For the file that I wish to inspect, I `removeTarget' the module and `addTarget' it again, but this time providing the full path to the file and not allowing object code.
Then I LoadUpTo and typecheck. For the sake of simplicity, let's assume that the file under inspection only has a module definition and no imports or top levels.
Functionally, my code is working great and I am able to do what I want with the typechecked tree.
However, load is very slow (~10 seconds user time) on large projects. Here is a cpu time trace of my program (milliseconds):
  main 1
  parse flags 93
  load 20436
  typecheck 20437
I can enable a bit more ghc timing info via -Rghc-timings and I see
  !!! Chasing dependencies: finished in 157.20 milliseconds, allocated 528.112 megabytes
This seems fine, anything sub-second is ok.
But then I see a bunch of home modules in CodeGen that I was not expecting:
  !!! CodeGen [My.Module.Dependency]: finished in 3335.62 milliseconds, allocated 270.615 megabytes
So it looks like the targetAllowObjCode is being ignored... is there any way to force it? Actually I'd prefer to fail fast than to ever compile or codegen a dependency module.
I know that it should be possible to load the module a lot faster because if I make a small change in the file under inspection and ask cabal to recompile the module it is super fast (less than a second).
Could somebody who understands how incremental/partial compiles work please help me out?
PS: If this textual description is confusing, I could put together a minimal reproduction and example project but it will take me some time to do that.
-- Best regards, Sam
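For readers following along, the setup described above might look roughly like the sketch below. It is not the poster's actual code. It assumes the GHC 8.6/8.8-era API (the thread dates from October 2019) plus the `ghc-paths' package for `libdir'; the helper `inspectionTargets', the command-line layout, and the treatment of leftover arguments as home module names are all hypothetical.

```haskell
-- Sketch only: GHC 8.6/8.8-era API (the `ghc` and `ghc-paths` packages).
-- Several of these names were renamed or reshaped in GHC 9.x.
module Main (main) where

import Control.Monad (void)
import DynFlags (parseDynamicFlagsCmdLine)
import GHC
import GHC.Paths (libdir)
import System.Environment (getArgs)

-- | Hypothetical helper: the inspected file is targeted by path with object
-- code disallowed; every other home module is targeted by name with object
-- code allowed.
inspectionTargets :: FilePath -> String -> [String] -> [Target]
inspectionTargets file inspected homeModules =
  fileTarget : map moduleTarget (filter (/= inspected) homeModules)
  where
    fileTarget = Target
      { targetId = TargetFile file Nothing
      , targetAllowObjCode = False
      , targetContents = Nothing
      }
    moduleTarget name = Target
      { targetId = TargetModule (mkModuleName name)
      , targetAllowObjCode = True
      , targetContents = Nothing
      }

main :: IO ()
main = do
  -- Hypothetical command line: FILE MODULE, then the flags cabal would use.
  args <- getArgs
  case args of
    (file : inspected : flags) -> runGhc (Just libdir) $ do
      dflags0 <- getSessionDynFlags
      (dflags1, leftovers, _warnings) <-
        parseDynamicFlagsCmdLine dflags0 (map noLoc flags)
      -- No code generation and no linking, as in the original setup.
      void $ setSessionDynFlags dflags1 { hscTarget = HscNothing, ghcLink = NoLink }
      -- Assumption: arguments the flag parser did not consume are home modules.
      setTargets (inspectionTargets file inspected (map unLoc leftovers))
      void $ load (LoadUpTo (mkModuleName inspected))
      summary <- getModSummary (mkModuleName inspected)
      void $ parseModule summary >>= typecheckModule
    _ -> putStrLn "usage: inspect FILE MODULE [GHC flags and home modules]"
```

The `load' call is the step whose cost the message is asking about; the sketch does nothing to make it faster.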

I think the only path for loading a dependency that doesn't involve loading
object code of some kind is the {-# SOURCE #-} hack as part of .hs-boot
files, which isn't general enough to be reused here as I understand it. A
decent chunk of the compiler would need to be duplicated to avoid this, and
it might use a fair amount of memory and end up generating at least part of
the object into memory.
Also recall that if any TH or quasiquotation is involved, it'll need to
load object code in support of that; and it might well need to prepare for
this in the general case rather than again having to duplicate a bunch of
code to support different no-TH and TH paths.
Cabal will build all that stuff the first time and then reuse it the next,
so it's not quite the same thing. Since you told ghc no object code, it
discards what it generates here and may not use existing compiled modules;
or you may have specified settings incompatible with any it did find.
In short, you may want to rethink this; ghc is a compiler, not an IDE, and
doesn't quite work the way you had hoped.
On Tue, Oct 8, 2019 at 10:15 AM Sam Halliday wrote:
[...]
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-- brandon s allbery kf8nh allbery.b@gmail.com

Thanks Brandon,
Brandon Allbery writes:
Cabal will build all that stuff the first time and then reuse it the next, so it's not quite the same thing. Since you told ghc no object code,
Sorry, I meant that I used targetAllowObjCode=True for everything, except the file under inspection. Do you mean that if I used targetAllowObjCode=False for just one module it will invalidate the object code for everything it depends on? That is unexpected.
In short, you may want to rethink this; ghc is a compiler, not an IDE, and doesn't quite work the way you had hoped.
How would you suggest rethinking it? Bear in mind that the api is working exactly the way I want from a functional point of view (just slow) with HscNothing... and seems to work exactly the way I want with HscInterpreted (but with all the ghci caveats like unboxed tuples etc).
-- Best regards, Sam

It's doing what you — but not ghc — consider "extra work", though. ghc
expects to be compiling code, and doesn't have a separate code path for
"load symbols from an external module by parsing its source code" instead
of "load symbols from an external module by loading its .hsc file and
object code", aside from HscInterpreted.
On Tue, Oct 8, 2019 at 10:37 AM Sam Halliday wrote:
[...]
-- brandon s allbery kf8nh allbery.b@gmail.com

Brandon Allbery writes:
It's doing what you — but not ghc — consider "extra work", though. ghc expects to be compiling code, and doesn't have a separate code path for "load symbols from an external module by parsing its source code" instead of "load symbols from an external module by loading its .hsc file and object code", aside from HscInterpreted.
I'm confused: it sounds like you are saying that only HscInterpreted can load symbols of dependencies from object code. Then how does cabal+ghc do this when I make a change to one file in my project and do a recompile of the package?
BTW, I am seeing modules going through CodeGen that are not part of the file's dependency graph... LoadUpTo is behaving more like LoadAll.
-- Best regards, Sam

It reuses the .hi files already built for other modules. Those aren't in
the source directory but under a build directory. If they don't exist
there, it will build the dependencies to create them.
On Tue, Oct 8, 2019 at 10:57 AM Sam Halliday wrote:
[...]
-- brandon s allbery kf8nh allbery.b@gmail.com

Brandon Allbery writes:
It reuses the .hi files already built for other modules. Those aren't in the source directory but under a build directory. If they don't exist there, it will build the dependencies to create them.
The .hi files exist in the target directory and my tool has informed the ghc api about that location, but it's not using them and I don't know why... I guess I'm asking "how can I make ghc use the .hi files instead of compiling the .hs files?". It seems to work fine when I use HscInterpreted instead of HscNothing.
BTW I tried using targetAllowObjCode=True for everything, but it makes no difference.
-- Best regards, Sam

A quick follow-up to this, Rahul Muttinieni gave me some advice to try
out
HscInterpreted / LinkInMemory
instead of
HscNothing / NoLink
and now I am no longer seeing home modules being compiled, and
everything is a lot faster. Woohoo!
But I have no idea why this speeds things up... my code isn't using
TemplateHaskell so HscNothing should really mean "don't do any codegen".
Something is causing the HscNothing to be ignored. I'd still really like
to get to the bottom of this so if anybody knows how the batch compiler
is able to avoid recompiling home modules then please let me know... I
would like to continue using HscNothing instead of HscInterpreted.
-- Best regards, Sam
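The two configurations compared in this follow-up can be written as plain DynFlags updates. This is a minimal sketch assuming the GHC 8.x `ghc' package, where `hscTarget' and `ghcLink' are record fields of DynFlags; the function names `typecheckOnly' and `interpretedInMemory' are hypothetical.

```haskell
-- Sketch only: field and constructor names are from the GHC 8.x `ghc` package.
module SessionModes (typecheckOnly, interpretedInMemory) where

import DynFlags (DynFlags (..), GhcLink (..), HscTarget (..))

-- | The original configuration: typecheck only, link nothing.
typecheckOnly :: DynFlags -> DynFlags
typecheckOnly dflags = dflags { hscTarget = HscNothing, ghcLink = NoLink }

-- | The configuration suggested in the follow-up: bytecode, linked in memory.
interpretedInMemory :: DynFlags -> DynFlags
interpretedInMemory dflags =
  dflags { hscTarget = HscInterpreted, ghcLink = LinkInMemory }
```

Inside a session, either one would be applied with something like `getSessionDynFlags >>= setSessionDynFlags . interpretedInMemory' before setting targets. The sketch only shows what was switched; it does not explain why the switch was faster.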

Are you writing interface files (-fwrite-interface)? It makes no sense
for HscInterpreted to be faster than HscNothing.
Cheers,
Matt
On Tue, Oct 8, 2019 at 3:30 PM Sam Halliday wrote:
[...]

Matthew Pickering writes:
Are you writing interface files (-fwrite-interface)? It makes no sense for HscInterpreted to be faster than HscNothing.
Nope, not writing anything like that (I just checked the ghc flags from hie-bios to confirm)... and I agree that this makes no sense.
-- Best regards, Sam

I already mentioned needing .hi (I may have said hsc, whoops; Haskell
Interface files) from dependencies; you really want to turn that part on,
at least. And possibly ensure your other options are compatible with
existing .hi files, so they can be loaded directly. I think the .o isn't
used until link time, which should be irrelevant for you; but you really do
want those .hi files, otherwise it must compile the dependency module to
generate one.
On Tue, Oct 8, 2019 at 10:51 AM Sam Halliday wrote:
[...]
-- brandon s allbery kf8nh allbery.b@gmail.com
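From the API side, the -fwrite-interface flag raised earlier in the thread corresponds to the `Opt_WriteInterface' general flag. Below is a minimal sketch of turning it on, assuming the GHC 8.x `DynFlags' module; the name `writeInterfaces' is hypothetical, and whether this alone lets a later HscNothing load reuse the dependencies' .hi files is not established in this thread.

```haskell
-- Sketch only: GHC 8.x module layout (`DynFlags` moved in GHC 9.x).
module WriteInterfaces (writeInterfaces) where

import DynFlags (DynFlags, GeneralFlag (Opt_WriteInterface), gopt_set)

-- | Ask GHC to write .hi files even when no code is being generated.
writeInterfaces :: DynFlags -> DynFlags
writeInterfaces dflags = dflags `gopt_set` Opt_WriteInterface
```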

Brandon Allbery writes:
you really do want those .hi files, otherwise it must compile the dependency module to generate one.
Right, exactly! But I thought that's what targetAllowObjCode=True was doing, is it not? Is there another setting that I'm missing?
Should I use that for all my modules and not just the dependencies?
-- Best regards, Sam

If they are loading each other, they likewise need .hi files. .o files are
optional if you aren't linking them.
On Tue, Oct 8, 2019 at 10:59 AM Sam Halliday wrote:
[...]
-- brandon s allbery kf8nh allbery.b@gmail.com

I'm not sure if I'm doing the same thing as you, but I use a GHC repl
for my program. It loads 200-300 modules in under a second, and is
able to reload changed ones dynamically, just like ghci.
The source is https://github.com/elaforge/karya/blob/work/Cmd/ReplGhc.hs,
see 'parse_flags' and its call in 'interpreter'.
The main thing is getting ghc to load the .o files, but if ghci will
do it, then the ghc API will do it. You just have to get the flags to
be the same, and ghc is pretty opaque about why it doesn't want to
load. There is a -ddump-something flag but it doesn't say what flags
actually changed. I actually wound up patching ghc to add that
feature.
On Tue, Oct 8, 2019 at 7:15 AM Sam Halliday wrote:
[...]

Thanks Evan,
Evan Laforge writes:
Yes what you're doing is very similar. I'm also adding args_left as modules because they tend to be RTS and home modules. But it looks like you always use False for object code. You're also using ghcMode so I'll investigate if I need to force a mode.
ghc is pretty opaque about why it doesn't want to load.
:-D
-- Best regards, Sam
participants (4)
- Brandon Allbery
- Evan Laforge
- Matthew Pickering
- Sam Halliday