
Tim Chevalier wrote:
> When you build your own code with -prof, GHC automatically links in profiling versions of the standard libraries. However, its profiling libraries were not built with -auto-all (the reason is that adding cost centres interferes with optimization). To build the libraries with -auto-all, you would need to build GHC from sources, which is not for the faint of heart. However, the results of doing that aren't usually very enlightening anyway -- for example, foldr might be called from many different places, but you might only care about a single call site (and then you can annotate that call site).
Hmm, okay -- that makes some sense.
> Just from looking, I would guess this is the culprit:
> termToStr t il = {-# SCC "termToStr" #-} ((:) ("t " ++ t ++ " " ++ (foldl ilItemToStr "" il)))
> If you want to be really sure, you can rewrite this as:
> termToStr t il = {-# SCC "termToStr" #-} ((:) ("t " ++ t ++ " " ++ ({-# SCC "termToStr_foldl" #-} foldl ilItemToStr "" il)))
> and that will give you a cost centre measuring the specific cost of the invocation of foldl.
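For anyone following along, here is a minimal, self-contained sketch of the nested cost-centre annotation described above. The real ilItemToStr and the element type of il are not shown in this thread, so the versions below are hypothetical stand-ins (items are assumed to be Ints); only the shape of termToStr is taken from the quoted code.

```haskell
-- Sketch only. ilItemToStr and the [Int] item type are assumptions made
-- for illustration; the original definitions are not shown in the thread.

-- Hypothetical stand-in: appends one item to the accumulated string.
ilItemToStr :: String -> Int -> String
ilItemToStr acc i = acc ++ show i ++ " "

-- Same shape as the quoted definition: (:) is partially applied, so the
-- result prepends one formatted line to an existing list of lines.
termToStr :: String -> [Int] -> [String] -> [String]
termToStr t il =
  {-# SCC "termToStr" #-}
  ((:) ("t " ++ t ++ " " ++
        -- The inner SCC gets its own row in the profile, so the cost of
        -- the fold is reported separately from the rest of termToStr.
        ({-# SCC "termToStr_foldl" #-} foldl ilItemToStr "" il)))

main :: IO ()
main = mapM_ putStrLn (termToStr "foo" [1, 2, 3] [])
```

Built with -prof and run with +RTS -p, this should produce a .prof file in which termToStr_foldl appears as its own cost centre nested under termToStr. (On newer GHC releases -auto-all is spelled -fprof-auto.)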
I did that and found that it accounts for only about 0.6 percent of the running time. Changing foldl to foldl' does improve it, though overall the gain is not significant (again, since it's not the bottleneck).
I just realized that most of the time is spent inside 'serialize' itself and is not inherited, as I originally claimed. Here is what my current code and profiling output look like:
http://hpaste.org/10329
How do I figure out what exactly in 'serialize' takes so much time?
-- Vlad Skvortsov, vss@73rus.com, http://vss.73rus.com