
Duncan Coutts wrote:
> I'm looking for some advice on profiling, and any suggestions on what might be going on with this program.
One suggestion might be to serialise the (key,value) pairs to file as they are first encountered, rather than waiting until they are all inside FiniteMaps. That would eliminate the time you are currently spending on lookups. (A subsequent run would then have to insert the binary (key,value) pairs into a map itself, rather than getting them back already ordered, but at least you save the cost of textual parsing there.)
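A minimal sketch of that suggestion, written against present-day libraries rather than FiniteMap. It assumes the `binary` and `containers` packages (both ship with GHC), and the names `writePair`, `getPairs` and `loadMap` are hypothetical. Each pair is appended to the file as soon as it is produced; a later run decodes the pairs in the order they were written and pays the map-insertion cost then, but does no textual parsing.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (Binary, encode, get)
import Data.Binary.Get (Get, runGet, isEmpty)
import qualified Data.Map.Strict as Map
import System.IO

-- Hypothetical helper: append one (key, value) pair to the output
-- in binary form as soon as it is encountered, instead of first
-- accumulating every pair in a map.
writePair :: (Binary k, Binary v) => Handle -> (k, v) -> IO ()
writePair h kv = BL.hPut h (encode kv)

-- Hypothetical decoder: read back every pair, in the order written,
-- until the input is exhausted.
getPairs :: (Binary k, Binary v) => Get [(k, v)]
getPairs = do
  done <- isEmpty
  if done
    then return []
    else do
      kv <- get
      rest <- getPairs
      return (kv : rest)

-- Hypothetical loader for a later run: the insertion cost is paid
-- here (Map.fromList keeps the last value for a repeated key), but
-- no textual parsing is needed. runGet throws on malformed input.
loadMap :: (Ord k, Binary k, Binary v) => FilePath -> IO (Map.Map k v)
loadMap path = do
  bytes <- BL.readFile path
  return (Map.fromList (runGet getPairs bytes))
```

For example, `withBinaryFile "pairs.bin" WriteMode (\h -> mapM_ (writePair h) pairs)` writes the pairs out one at a time, and `loadMap "pairs.bin"` rebuilds the map. Whether last-one-wins is the right policy for duplicate keys depends on what the original program does.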
> A major problem, no doubt, is space use. For the large gtk/gtk.h, when I run with +RTS -B to get a beep on every major garbage collection, the serialisation phase beeps continuously while the file grows. Occasionally it seems to freeze for tens of seconds, doing no garbage collection and no file output but using 100% CPU; then it carries on outputting and garbage collecting furiously. I don't know how to work out what's going on when it does that.
One guess might be generational collection: perhaps the fast beeps are collections of the youngest generation, and the pauses are collections of the older generations?

Regards,
    Malcolm