
On Mon, 2004-12-20 at 21:56 +1100, Manuel M T Chakravarty wrote:
> > For some reason (as yet undiscovered) the serialisation is very slow and memory hungry. On my machine it takes 16 seconds to parse all of gtk/gtk but 45 seconds to serialise all that to disk.
>
> My only *guess* would be that to serialise, you force some/all of the semantic analysis of the C AST that usually only occurs lazily for those parts of the header that are needed for the binding of the currently compiled .chs file. It depends on what information exactly you serialise.

Actually it turns out not to be that. It was my first suspicion too, so I generated DeepSeq instances for everything (with DrIFT) and ran deepSeq over the whole structure before serialising, with timing points inserted in key places. The deepSeq took very little time (though some, so it was actually working), but the serialisation still took forever.

It seems that the serialisation allocates enormous amounts of garbage, which is why it takes so long. Simon M reckons that ghc's Binary module should run in constant space (well, log stack space) when the right optimisations are used. I'll probably have to analyse the optimised core code to see what's really going on, and whether it is doing allocation anywhere.

Duncan
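
For illustration, the experiment described above (force everything first, then time the serialisation separately) could be sketched roughly as follows. This is not the actual c2hs code: the `Decl` type is a hypothetical stand-in for the C AST, `timed` is a made-up helper, and it assumes the present-day `deepseq` and `binary` packages (`Control.DeepSeq.force`, `Data.Binary.encode`) rather than DrIFT-generated instances and ghc's internal Binary module.

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main (main) where

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for a declaration in the parsed C AST.
data Decl = Decl
  { declName :: String
  , declArgs :: [String]
  } deriving (Show, Generic)

-- Generic defaults; the original used DrIFT-generated instances instead.
instance NFData Decl
instance Binary Decl

-- Run an action and return its result together with the CPU seconds used.
timed :: IO a -> IO (a, Double)
timed act = do
  t0 <- getCPUTime
  x  <- act
  t1 <- getCPUTime
  pure (x, fromIntegral (t1 - t0) / 1e12)

main :: IO ()
main = do
  -- Made-up data; in the real case this would be the lazily analysed AST.
  let decls =
        [ Decl ("gtk_fn_" ++ show i) ["arg" ++ show j | j <- [1 .. 4 :: Int]]
        | i <- [1 .. 50000 :: Int]
        ]

  -- Timing point 1: force the whole structure, so that any lazy
  -- analysis is paid for here and not during serialisation.
  (forced, tForce) <- timed (evaluate (force decls))

  -- Timing point 2: serialise the already-forced structure. Taking the
  -- length forces the entire lazy ByteString to be produced.
  (nBytes, tSer) <- timed (evaluate (BL.length (encode forced)))

  putStrLn ("deep force: " ++ show tForce ++ " s")
  putStrLn ("serialise:  " ++ show tSer ++ " s (" ++ show nBytes ++ " bytes)")
```

If the second figure dominates even though the data is already fully evaluated, laziness in the input is ruled out and the cost lies in the serialiser itself. Running the program with `+RTS -s` would then show how much it allocates.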