Odd profiling results

Hi all,

I do some work on Ben Lippmeier's DDC compiler, and when profiling DDC compiling a program with 11 source code files (much like ghc --make), we get a memory usage graph like this:

http://www.mega-nerd.com/tmp/ddc-heap-usage-20101231.png

We have no particular problem with the 11 peaks (one for each source file) but wonder what the hell is going on in the periods when the memory usage is flat. On a similar run the RTS reports that GC time was less than 10% of total runtime.

This profiling information was captured on a Debian x86-64 system where DDC was compiled using ghc-6.12.1 (Debian package).

Clues?

Thanks,
Erik

--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Erik de Castro Lopo wrote:

http://www.mega-nerd.com/tmp/ddc-heap-usage-20101231.png

We have no particular problem with the 11 peaks (one for each source file) but wonder what the hell is going on in the periods when the memory usage is flat.
The peaks I am guessing are largely attributable to parsing the source files. Then, once the source has been converted to an AST, the DDC compiler is presumably doing some analysis before moving on to the next file? I think these are the well-behaved flat bits. These phases do not allocate anything fresh, so they are not creating new data-structures, but perhaps are propagating static information around the AST. Something like a type/effect analysis maybe?

Regards,
Malcolm
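[For intuition, here is a minimal, hypothetical sketch (toy code, not DDC's actual types) of why the two kinds of phase would look so different on a heap profile: a parse-like phase constructs a fresh tree, so live heap grows while it runs, whereas a fold-style analysis only walks the tree that already exists and accumulates a small result, so residency stays roughly flat.]

-- Toy illustration only; the Expr type and both functions are made up
-- and are not DDC code.
module HeapShapeSketch where

-- A stand-in for an AST.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr

-- "Parse-like" phase: builds a brand-new tree, so heap residency
-- climbs with the size of the input (the peaks in the profile).
buildChain :: Int -> Expr
buildChain 0 = Var "x"
buildChain n = App (Lam "x" (Var "x")) (buildChain (n - 1))

-- "Analysis-like" phase: traverses the existing tree and returns a
-- small summary; very little of what it allocates survives a GC, so
-- the profile stays flat while it runs.
countVars :: Expr -> Int
countVars (Var _)   = 1
countVars (App f x) = countVars f + countVars x
countVars (Lam _ b) = countVars b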

Malcolm Wallace wrote:
The peaks I am guessing are largely attributable to parsing the source files. Then, once the source has been converted to an AST, the DDC compiler is presumably doing some analysis before moving on to the next file? I think these are the well-behaved flat bits. These phases do not allocate anything fresh, so they are not creating new data-structures, but perhaps are propagating static information around the AST. Something like a type/effect analysis maybe?
Thanks Malcolm, I think you may indeed be correct. It should also be pretty easy to test that hypothesis and that is my next task.

Cheers,
Erik

--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
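[One cheap way to test the hypothesis would be to build DDC with -prof and wrap each compiler phase in an SCC annotation, then run with +RTS -hc so the heap profile is broken down by cost centre and the peaks and flat stretches can be attributed to parsing versus analysis directly. The sketch below is hypothetical: parsePhase, analysePhase and emitPhase are stand-ins, not DDC's real driver functions.]

{-# LANGUAGE BangPatterns #-}

-- Sketch only: the phases are dummies so this compiles on its own.
-- The point is the SCC pragmas: with a -prof build, +RTS -hc (or -p
-- for a time profile) reports costs per cost centre, i.e. per phase.
module PhaseSccSketch where

compileOne :: FilePath -> IO ()
compileOne path = do
    src <- readFile path
    let !ast     = {-# SCC "parse" #-}   (parsePhase src)
        !summary = {-# SCC "analyse" #-} (analysePhase ast)
    ({-# SCC "emit" #-} emitPhase summary)

-- Hypothetical phase signatures, just so the sketch type-checks.
parsePhase :: String -> [String]
parsePhase = lines

analysePhase :: [String] -> Int
analysePhase = length

emitPhase :: Int -> IO ()
emitPhase = print

[With ghc-6.12 the same effect for whole modules can be had with -prof -auto-all, at the cost of some profiling overhead.]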

On 04/01/2011 21:20, Erik de Castro Lopo wrote:
Malcolm Wallace wrote:
The peaks I am guessing are largely attributable to parsing the source files. Then, once the source has been converted to an AST, the DDC compiler is presumably doing some analysis before moving on to the next file? I think these are the well-behaved flat bits. These phases do not allocate anything fresh, so they are not creating new data-structures, but perhaps are propagating static information around the AST. Something like a type/effect analysis maybe?
Thanks Malcolm, I think you may indeed be correct.
It should also be pretty easy to test that hypothesis and that is my next task.
Or the C compiler, perhaps?

Cheers,
Simon

Simon Marlow wrote:
Or the C compiler, perhaps?
Thanks for the suggestion Simon. This one was actually easier to test than Malcolm's suggestion.

To test it, I ran the same test under OProfile [0], the Linux kernel based profiler. OProfile is really cool because it profiles the whole system, from user space all the way down to the most insignificant kernel driver (e.g. it allows you to see if a kernel driver is causing your user space code to do something weird).

My first run showed the GHC RTS profiling code being one of the biggest runtime costs. After recompiling without profiling, the biggest CPU hogs were as follows (edited slightly for clarity and severely truncated for brevity):

samples   %        image name   app name   symbol name
---------------------------------------------------------------------
 128328   7.7305   ddc          ddc        evacuate
  97663   5.8832   ddc          ddc        s1PI_info
  91177   5.4925   ddc          ddc        stg_ap_pp_fast
  72926   4.3931   ddc          ddc        s1PV_info
  70650   4.2559   ddc          ddc        s1PJ_info
  62956   3.7925   ddc          ddc        s19S_info
  51240   3.0867   ddc          ddc        stg_ap_p_fast
  50197   3.0239   ddc          ddc        stg_upd_frame_info
  47797   2.8793   ddc          ddc        stg_PAP_apply
  39764   2.3954   ddc          ddc        sNr_info
  39351   2.3705   ddc          ddc        stg_ap_p_info
  38339   2.3095   ddc          ddc        stg_ap_pp_info
  30283   1.8242   cc1          cc1        /usr/lib/gcc/x86_64-linux-gnu/4.4/cc1
  29733   1.7911   ddc          ddc        scavenge_block
  27062   1.6302   ddc          ddc        base_GHCziBase_zdfOrdChar3_info
  23620   1.4229   ddc          ddc        s1PK_info
  21671   1.3055   ddc          ddc        s19R_info
  19043   1.1471   ddc          ddc        base_GHCziClasses_DZCOrd_con_info
  18090   1.0897   ddc          ddc        sAD_info
  16500   0.9940   ddc          ddc        sAE_info
  14978   0.9023   ddc          ddc        sxb_info
  13862   0.8350   ddc          ddc        stg_ap_0_fast
  12975   0.7816   ddc          ddc        stg_IND_STATIC_info
  12553   0.7562   ddc          ddc        base_GHCziClasses_DZCOrd_static_info

The C compiler (cc1) is down there at 1.82% of total run time and is definitely not the culprit.

I've been discussing this with Ben as well, and on the basis of this plot that Ben captured:

http://deluge.ouroborus.net/~benl/tmp/ddc-comp.pdf

we ought to be looking at scrapeRecursive.

Cheers,
Erik

[0] http://oprofile.sourceforge.net/

--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
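[For context on those symbols: evacuate and scavenge_block are the copying routines of GHC's garbage collector, the stg_ap_* and stg_upd_frame entries are the RTS's generic apply and thunk-update code, and the s1PI_info-style names are compiler-generated closures from the Haskell code itself, so the time is going into copying live heap and entering/updating thunks rather than into cc1. Purely as an illustration of how a recursive pass can end up with that shape (hypothetical code, nothing to do with scrapeRecursive's actual definition): a lazy accumulator builds a long chain of thunks that the collector then has to keep copying, while a strict one does not.]

-- Hypothetical illustration, not DDC code: why a recursive traversal
-- with a lazy accumulator shows up as GC (evacuate) and thunk-entry
-- (stg_ap_*) time in a system-wide profiler.
module ThunkSketch where

import Data.List (foldl')

-- Lazy accumulation: without optimisation (or when strictness
-- analysis cannot see through the accumulator) each step allocates an
-- unevaluated (+) thunk; the whole chain stays live, so the GC keeps
-- copying it.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs   -- thunks pile up here

-- Strict accumulation: foldl' forces the accumulator at each step, so
-- almost nothing survives a GC and evacuate has little left to copy.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

[Whether anything like this applies to scrapeRecursive would need checking against its actual code; an SCC on it, as in the earlier sketch, would settle that.]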