GHC-generated executables size

Hi, I was playing around with ghc again, and I was wondering what makes the executables so large and how I could make them smaller (strip works, but is there anything more I can do?). More specifically, I am compiling a program that uses the GTK+ bindings, HDBC, and some things from Prelude. The program simply displays a window, and reads and writes values from/to a database file. Not much, really. Anyway, the program size is 20MB without stripping, and 10MB after stripping ... Any hints? Thanks and good night for now .. Christian

Hi Christian,
GHC links statically by default. There is some effort to bring dynamic linking to GHC: http://hackage.haskell.org/trac/ghc/wiki/SharedLibraries/PlatformSupport?red...
You'll have to ask more knowledgeable people about its status and whether it's recommended for current projects.
Since the GTK libraries are so vast, even a single call into them (which forces the libs to be linked into the executable) will dramatically increase the statically linked executable's size.
Regards, Aleks

On Saturday 16 October 2010 12:26:00, . wrote:
Any hints?
Two things spring to mind (in addition to the static linking mentioned by Aleksandar).
1) If you didn't compile the packages with -split-objs, then when you use one function from a module, the entire object file for the module is linked in. For packages with many modules or many dependencies, that adds up pretty fast.
If you set
split-objs: True
in your ~/.cabal/config, packages installed via cabal-install (the cabal executable) will be built with -split-objs and only the needed functions will be linked in (at least if you compile your programmes with optimisations, I don't know whether -O0 uses split object files or the monolithic ones). (Downside: building the packages takes longer, duh; and you need more disk space for monolithic+split object files, duh again).
2) If it's not (only) that, it's probably the same effect as discussed in http://hackage.haskell.org/trac/ghc/ticket/4387
Simon (PJ) says: "Every module has a module-initialisation routine. Apart from initialising the module, it calls the module-initialisation routine for each imported module. So if M imports module SpecConstr from package ghc, then the module-initialisation routine for M will call the initialisation routine for SpecConstr. Even though nothing from SpecConstr is ultimately used."
So if you import a module (you don't even need to use anything from it) which transitively imports a lot of modules, you get a ton of module-initialisation routines. People are thinking about how to handle this best (since it affects the vector package, on which a lot of other packages depend, it's not unimportant).
Thanks and good night for now .. Christian
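A minimal sketch of checking the setting described in point 1), assuming a POSIX shell with grep. The helper name has_split_objs is made up for this example; it only looks for an uncommented "split-objs: True" line and does not rebuild anything.

```shell
# has_split_objs FILE
# Succeeds if FILE contains an active (uncommented) "split-objs: True" line.
# Commented-out lines such as "-- split-objs: False" are ignored because the
# pattern is anchored at the start of the line (leading whitespace allowed).
has_split_objs() {
    grep -Eiq '^[[:space:]]*split-objs:[[:space:]]*true[[:space:]]*$' "$1"
}

# Typical use, assuming the config lives in the location named in the thread:
#   has_split_objs ~/.cabal/config && echo "split-objs is enabled"
```

Packages that were installed before the setting was enabled keep their monolithic object files until they are rebuilt.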

Hi Daniel, thanks for the explanations. I have tried reinstalling with cabal install --reinstall gtk, having set split-objs: True in ~/.cabal/config beforehand. However, the compile yielded a single .o file again, and recompiling and re-linking my little program does not change its size ... Any other idea what I might be doing wrong? Also, I saw in the ghc documentation about -split-objs that one should only use it "if you know exactly what you're doing". Do you know what the caveats are? Thanks again, Christian

On Saturday 16 October 2010 15:03:52, . wrote:
Hi Daniel, thanks for the explanations. I have tried reinstalling with cabal install --reinstall gtk, having set split-objs: True in ~/.cabal/config before.
However, the compile yielded a single .o file again,
Yes, there's the single .o file for the package, the question is, what's in the archive (libHSxxx.a). Rule of thumb, if it's more than twice as large as the HSxxx.o, it contains the split object files (you can verify by looking at it with nm).
and recompiling and re-linking my little program does not change its size ...
Possibly the packages have been built with -split-objs originally, or your GHC doesn't support split-objs; in either case, sorry for the inconvenience :-/
gtk contains apparently hundreds of thousands of modules, so it may be the module-initialisation functions. Quick test:
$ nm yourexecutable | grep stginit | wc -l
Any other idea what I might be doing wrong?
Nothing, probably. What ghc version are you using, on which platform (OS, arch), which packages does your executable need?
Also, I saw in the ghc documentation about -split-objs that one should only use it "if you know exactly what you're doing". Do you know what the caveats are?
Doesn't work on all platforms (Sparc seems to have some problems iirc), slower compile times. That's what I'm aware of.
Thanks again, Christian
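The rule of thumb from this message can be sketched as a small shell helper. The name guess_split and its arguments are hypothetical, and the factor of two is only the heuristic stated above, not a guarantee; listing the archive members is the more direct check.

```shell
# guess_split ARCHIVE OBJECT
# Prints "split" if ARCHIVE (e.g. libHSxxx.a) is more than twice as large as
# OBJECT (e.g. HSxxx.o), and "monolithic" otherwise.
guess_split() {
    # $(( ... )) also strips the padding some wc implementations print.
    archive_size=$(( $(wc -c < "$1") ))
    object_size=$(( $(wc -c < "$2") ))
    if [ "$archive_size" -gt $(( 2 * object_size )) ]; then
        echo split
    else
        echo monolithic
    fi
}

# Typical use (file names are placeholders, as in the message above):
#   guess_split libHSxxx.a HSxxx.o
# To count the object files inside the archive directly:
#   ar t libHSxxx.a | wc -l
```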

On Sat, 2010-10-16 at 21:09 +0200, Daniel Fischer wrote:
On Saturday 16 October 2010 15:03:52, . wrote:
Hi Daniel, thanks for the explanations. I have tried reinstalling with cabal install --reinstall gtk, having set split-objs: True in ~/.cabal/config before.
However, the compile yielded a single .o file again,
Yes, there's the single .o file for the package, the question is, what's in the archive (libHSxxx.a). Rule of thumb, if it's more than twice as large as the HSxxx.o, it contains the split object files (you can verify by looking at it with nm).
Ah .. ok, stupid me ;) the archive is more than twice the size, and ar tells me about a lot of object files inside it. So compiling with -split-objs appears to have worked out. I have seen, however, that I have 2 versions of gtk (0.11.2 and 0.11.0) on the system. I will try to find out which one was actually used when I called "ghc --make guiMain"; how would I go about that?
and recompiling and re-linking my little program does not change its size ...
Possibly the packages have been built with -split-objs originally, or your GHC doesn't support split-objs; in either case, sorry for the inconvenience :-/
gtk contains apparently hundreds of thousands of modules, so it may be the module-initialisation functions. Quick test:
$ nm yourexecutable | grep stginit | wc -l
I tried that, there are almost 900 matches.
Any other idea what I might be doing wrong?
Nothing, probably.
What ghc version are you using, on which platform (OS, arch), which packages does your executable need?
I am using Ubuntu 10.10, ghc 6.12.1, on a 4-core AMD Phenom. The program I am compiling needs these packages: gtk, Text.Parsec, Time, Database.HDBC.Sqlite3, Locale, Data.ByteString.Lazy. --Christian
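As a sketch of how the stginit count could be broken down further, the helpers below read nm output on standard input. The function names are made up for this example, and the per-package grouping assumes the symbol layout __stginit_<package>_<Module> seen elsewhere in this thread.

```shell
# count_stginit: count lines of nm output that mention a module-initialisation
# symbol. "|| true" keeps the exit status 0 when the count is zero.
count_stginit() {
    grep -c 'stginit' || true
}

# stginit_by_package: group the symbols by their (z-encoded) package part
# and print "count package" lines, largest count first.
stginit_by_package() {
    sed -n 's/.*__stginit_\([^_]*\)_.*/\1/p' | sort | uniq -c | sort -rn
}

# Typical use:
#   nm yourexecutable | count_stginit
#   nm yourexecutable | stginit_by_package
```

Symbols without a package part (for example those of the program's own modules) are simply skipped by the grouping helper.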

On Saturday 16 October 2010 21:32:27, . wrote:
I have seen, however, that I have 2 versions of gtk (0.11.2 and 0.11.0) on the system. I will try to find out which one was actually used when I called "ghc --make guiMain"; how would I go about that?
Normally, ghc picks the later version. When compiling, you can see which version is used by giving the appropriate verbosity flag (-v2 or greater). After the fact, the only way I know is
$ nm executable | grep stginit_gtk | more
(or pipe it into less instead) and then you'll see the z-encoded package version, lines like
080ca5dc T __stginit_gtkzm0zi11zi2_Stuff
for gtk-0.11.2.
and recompiling and re-linking my little program does not change its size ...
Possibly the packages have been built with -split-objs originally, or your GHC doesn't support split-objs; in either case, sorry for the inconvenience :-/
gtk contains apparently hundreds of thousands of modules, so it may be the module-initialisation functions. Quick test:
$ nm yourexecutable | grep stginit | wc -l
I tried that, there are almost 900 matches.
Hm, that shouldn't take you anywhere near 20M.
Any other idea what I might be doing wrong?
Nothing, probably.
What ghc version are you using, on which platform (OS, arch), which packages does your executable need?
I am using Ubuntu 10.10, ghc 6.12.1, on a 4-core AMD Phenom. The program I am compiling needs these packages: gtk, Text.Parsec, Time, Database.HDBC.Sqlite3, Locale, Data.ByteString.Lazy.
No obvious suspect. Depending on how curious you are, you could split off small sub-programmes to see what's taking a lot of space. But that's going to be tedious. You could ask on IRC (#haskell) whether anybody knows about huge executables with gtk, or on haskell-cafe, on the gtk2hs mailing list, or you could open a ticket at http://hackage.haskell.org/trac/gtk2hs/ or http://hackage.haskell.org/trac/ghc/newticket?type=bug whichever you consider more likely to be responsible (you needn't create an account for either bug-tracker, both have guest accounts with the guest password in plain view - though you may need to look for it a bit). But first ask elsewhere.
--Christian
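To read the z-encoded package names in the nm output, a rough decoder can be sketched in shell and awk. The function name zdecode is made up here, and it covers only a subset of GHC's Z-encoding (zm for '-', zi for '.', zu for '_', zz for 'z'); other escapes are passed through unchanged.

```shell
# zdecode STRING
# Decodes a subset of GHC's Z-encoding, scanning left to right so that an
# escaped "zz" is never misread as the start of another escape.
zdecode() {
    printf '%s\n' "$1" | awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            d = (i < n) ? substr($0, i + 1, 1) : ""
            if (c == "z" && d == "m")      { out = out "-"; i++ }
            else if (c == "z" && d == "i") { out = out "."; i++ }
            else if (c == "z" && d == "u") { out = out "_"; i++ }
            else if (c == "z" && d == "z") { out = out "z"; i++ }
            else                           { out = out c }
        }
        print out
    }'
}

zdecode gtkzm0zi11zi2   # prints gtk-0.11.2
```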

Ok, after recompiling gtk again and hiding gtk-0.11.0 (which was still monolithic), it worked. The executable is now, after stripping, a mere 2.7MB in size (which is still large for what it does, but much better). Thanks to all who answered for your help! Christian

On Sunday 17 October 2010 13:10:32, . wrote:
Ok, after recompiling gtk again and hiding gtk-0.11.0 (which was still monolithic), it worked.
Good.
The executable is now, after stripping, a mere 2.7MB in size (which is still large for what it does, but much better).
Well, the executable contains the runtime, so it's naturally much larger than a small C programme. Compare with
$ cat helloWorld.hs
module Main (main) where
main :: IO ()
main = putStrLn "Hello, World!"
$ ghc --make helloWorld.hs
[1 of 1] Compiling Main ( helloWorld.hs, helloWorld.o )
Linking helloWorld ...
$ ls -l helloWorld
-rwxr-xr-x 1 dafis users 618581 17. Okt 13:53 helloWorld
I suppose on a 64-bit system, you get about twice the numbers, so 2.7MB for a programme using gtk isn't exorbitant.
Thanks to all who answered for your help!
Christian
Cheers, Daniel

2010/10/17 Daniel Fischer
$ cat helloWorld.hs
module Main (main) where
main :: IO ()
main = putStrLn "Hello, World!"
$ ghc --make helloWorld.hs
[1 of 1] Compiling Main ( helloWorld.hs, helloWorld.o )
Linking helloWorld ...
$ ls -l helloWorld
-rwxr-xr-x 1 dafis users 618581 17. Okt 13:53 helloWorld
jhc makes remarkably small executables; this example takes 11268 bytes, 5756 when stripped. Sad that many libraries, and gtk2hs, don't work with it.
David.

On Tuesday 19 October 2010 13:23:40, David Virebayre wrote:
jhc makes remarkably small executables, this example takes 11268 bytes, 5756 when stripped.
Yes, JHC doesn't put a large runtime into the executables (and, as a whole-programme compiler, it can remove more dead code than GHC can). Stripping GHC's helloWorld:
-rwxr-xr-x 1 dafis users 377240 19. Okt 13:55 helloWorld
Sad that many libraries, and gtk2hs don't work with it.
What makes it unusable for me is that it doesn't yet have arbitrary precision integers (if that changes, the library problem might step in). I hope it gets proper Integers and better library support soon, it's quite an exciting project.
David.
Cheers, Daniel
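For a rough sense of how much strip saves in the helloWorld figures quoted in this thread, the arithmetic can be done in the shell. The helper name percent_saved is made up for this example, and the result is rounded down to a whole percent.

```shell
# percent_saved BEFORE AFTER
# Prints the size reduction from BEFORE to AFTER bytes as an integer percentage.
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

percent_saved 618581 377240   # GHC helloWorld, sizes from the thread: prints 39
percent_saved 11268 5756      # jhc helloWorld, sizes from the thread: prints 48
```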

On Sat, 16 Oct 2010 12:26:00 +0200, .
Hi, I was playing around with ghc again, and I was wondering what makes the executables so large and how I could make them smaller (strip works, but is there anything more I can do?)
UPX [0] compresses executables and DLLs quite well (use it after "strip" for the best results).
Regards, Henk-Jan van Tuyl
[0] http://upx.sourceforge.net/
participants (5): ., Aleksandar Dimitrov, Daniel Fischer, David Virebayre, Henk-Jan van Tuyl