inter module optimizations

28 Mar 2007

I had posted some data on inter-module optimizations that I collected 
when splitting my program from one computational module into many 
different ones.

Tim Chevalier suggested that my calculation could be interesting to the 
people here.

So I made the effort of preparing the various versions of my code and 
redoing the analysis more carefully.
Unfortunately I had already begun renaming things without doing a darcs 
record, so in the split version some function names are different.

I have a 21KB tar.bz archive. I did not know whether it is considered 
rude to send attachments, but if anyone is interested I can send them 
the file.

The difference mainly boils down to some important functions on a 
newtype not being inlined across module boundaries:
    type LatLocI = Word32
    newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)
Specialization should not be an issue, because I had already given 
specific type signatures to my functions.
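
To make the "Inline directives" variant concrete, here is a minimal sketch 
of what such directives could look like on the newtype. The functions 
latIndex and nextLoc are hypothetical and are not taken from my program; 
only the two type declarations are. In the split version a block like this 
would live in its own module, and the INLINE pragmas ask GHC to keep the 
unfoldings in the interface file so that importing modules can inline them.

```haskell
import Data.Word (Word32)

type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord)

-- Hypothetical accessor: unwrap the newtype to an Int index.
-- The INLINE pragma asks GHC to inline it at call sites,
-- including call sites in other modules.
{-# INLINE latIndex #-}
latIndex :: LatLoc -> Int
latIndex (LatLoc w) = fromIntegral w

-- Hypothetical step function: move to the next location.
{-# INLINE nextLoc #-}
nextLoc :: LatLoc -> LatLoc
nextLoc (LatLoc w) = LatLoc (w + 1)
```

Without the pragmas GHC decides by itself whether to expose an unfolding, 
which is presumably where the split versions lose time.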

Also worth noting is that profiling with -O2 compilation makes one 
think that inlining (or using a single module) makes the program 
slower, whereas the opposite is true. I think that the profiling 
overheads are incorrectly evaluated.
I know that with -O2 one cannot expect profiling to be good, but it 
would be nice if it weren't so misleading.

Here is some data (obtained with a script that is also in the tar.bz archive):

******** allInOne:
original program, monolithic main computational module
* timings of -O2 executable
7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+894minor)pagefaults 0swaps
* timings of the executable with profiling
        total time  =       15.25 secs   (305 ticks @ 50 ms)
        total alloc = 5,888,786,120 bytes  (excludes profiling overheads)
******** splitModule NoReexport NoInline directives:
split computational module, no export list for split modules
* timings of -O2 executable
10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+901minor)pagefaults 0swaps
* timings of the executable with profiling
        total time  =       11.85 secs   (237 ticks @ 50 ms)
        total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
******** splitModule Reexport NoInline directives:
split computational module, no export list for split modules, old module 
reexports using an export list
* timings of -O2 executable
8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+901minor)pagefaults 0swaps
* timings of the executable with profiling
        total time  =       12.20 secs   (244 ticks @ 50 ms)
        total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
******** splitModule NoReexport Inline directives:
split computational module, no export list for split modules, explicit 
inline directives
* timings of -O2 executable
6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+895minor)pagefaults 0swaps
* timings of the executable with profiling
        total time  =       18.80 secs   (376 ticks @ 50 ms)
        total alloc = 5,374,883,312 bytes  (excludes profiling overheads)
*************
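
To compare the runs more easily, the timings above can be expressed 
relative to the allInOne baseline. The following is only a small sketch 
of that arithmetic; the names runs and report are made up for this 
example, and the numbers are copied from the listing above (user time of 
the -O2 executable and "total time" of the profiled one).

```haskell
-- (label, user seconds of the -O2 build, profiler "total time" in seconds)
runs :: [(String, Double, Double)]
runs =
  [ ("allInOne",                   7.67, 15.25)
  , ("split NoReexport NoInline", 10.14, 11.85)
  , ("split Reexport NoInline",    8.88, 12.20)
  , ("split NoReexport Inline",    6.44, 18.80)
  ]

-- Each run's timings divided by those of the first (baseline) run.
report :: [(String, Double, Double)]
report = case runs of
  []               -> []
  (_, o0, p0) : _  -> [ (name, o / o0, p / p0) | (name, o, p) <- runs ]
```

Evaluating report in GHCi gives -O2 ratios of roughly 1.32, 1.16 and 0.84 
for the three split versions, and profiled ratios of roughly 0.78, 0.80 
and 1.23, so the profiler ranks the versions in the opposite order from 
the real timings.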

Fawzi

Fawzi Mohamed
