
On Mon, 2005-03-14 at 11:43 +1100, Manuel M T Chakravarty wrote:
> Duncan,
> To be honest, I am not too keen on using files to buffer the AST and symbol tables, as it adds another level of complexity.
Sadly this is quite true. I was commenting to some other Haskell hackers that there's probably a paper or so in designing a framework and techniques for writing external algorithms in Haskell (ones where the dataset is not expected to fit in main memory).
> Moreover, an operation that is performed whenever C declarations are analysed is declaration chasing, where name analysis (using functions from CTrav.hs) follows declarations involving typedefs. Depending on where the types are defined, access can be non-local and may slow everything down quite a bit.
Right; as I recall, one of the maps was read-write but the others were write-only.
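The declaration chasing described above can be sketched in Haskell as follows: a typedef name is followed through a symbol table until it reaches a type that is not itself a typedef. `CType`, `TypedefTable` and `chaseTypedef` are hypothetical simplifications. They are not c2hs's actual AST or the functions in CTrav.hs. The table here is an in-memory `Data.Map`, so each step is one O(log n) lookup. If the table were buffered in a file instead, each step of the chain could become a separate non-local access, which is the cost Manuel is worried about.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, much-simplified C type representation (not c2hs's AST).
data CType
  = CInt
  | CChar
  | CPtr CType
  | CStruct String
  | CTypedefName String   -- a reference to a typedef, by name
  deriving (Eq, Show)

-- Hypothetical symbol table from typedef names to their definitions.
type TypedefTable = Map.Map String CType

-- Follow typedef references at the top level of a type until reaching a type
-- that is not a typedef name. Typedef names nested inside e.g. CPtr are not
-- expanded. Returns Left for an unknown name or a cycle of typedefs.
chaseTypedef :: TypedefTable -> CType -> Either String CType
chaseTypedef table = go []
  where
    go seen (CTypedefName n)
      | n `elem` seen = Left ("typedef cycle through " ++ n)
      | otherwise = case Map.lookup n table of
          Nothing -> Left ("unknown typedef " ++ n)
          Just t  -> go (n : seen) t   -- one table lookup per step
    go _ t = Right t
```

For example, with `gboolean` defined as `gint` and `gint` as `CInt`, chasing `CTypedefName "gboolean"` takes two lookups and yields `Right CInt`.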
> Do you have any idea which data structures take most of the space? Or is it just that after expanding the header files of GTK widgets, the resulting C pre-processor output is so large that it just takes a lot of space to hold it?
I know the preprocessed header is large (765K for gtk 2.4), but I think c2hs's memory use is more than one would expect for that.

I just tried running c2hs on that 765K gtk.i file without any +RTS -M650m -RTS heap limit. I had to kill it after it had allocated 1.3Gb on my machine, which has 1Gb of RAM, and it had brought everything to a crawl. With the memory limit in place it completes using 'only' 650Mb. So that's a 10x memory use compared to the original file.

I have not been able to figure out exactly which part of the data structure is taking so much space. GHC's space profiling tools just don't tell me that (or I don't understand the profiling output well enough). I suspect that a great deal of the AST is kept but never used, but I cannot pinpoint anything. I don't think it is the strings themselves: using a sharing symbol table (like GHC uses) and packed strings made an insignificant difference in my tests. The space profiling does show that the finite maps take a very large proportion of the space compared to the AST (but maybe I'm misreading the profile graphs, since the maps are also retainers for bits of the AST).

Sorry this isn't terribly helpful.

Duncan
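A sharing symbol table of the kind Duncan mentions interns each distinct identifier once and hands out small integer ids. AST nodes can then store those ids instead of repeated copies of the string. The following is a minimal sketch assuming a pure table built from two `Data.Map`s. It is in the spirit of GHC's approach, but it is not GHC's implementation. `InternTable`, `intern`, `internAll` and `unintern` are hypothetical names. The sketch still stores identifiers as `String`; switching to a packed string type would be a separate change.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical interning table: each distinct identifier is stored once,
-- mapped to a small Int id, with a reverse map for getting the name back.
data InternTable = InternTable
  { nextId :: !Int
  , byName :: !(Map.Map String Int)
  , byId   :: !(Map.Map Int String)
  }

emptyTable :: InternTable
emptyTable = InternTable 0 Map.empty Map.empty

-- Return the existing id for a name, or allocate the next free id.
intern :: String -> InternTable -> (Int, InternTable)
intern s tbl = case Map.lookup s (byName tbl) of
  Just i  -> (i, tbl)
  Nothing ->
    let i = nextId tbl
    in (i, InternTable (i + 1) (Map.insert s i (byName tbl)) (Map.insert i s (byId tbl)))

-- Intern a list of names left to right, threading the table through.
internAll :: [String] -> InternTable -> ([Int], InternTable)
internAll names tbl0 =
  let (tblN, ids) = mapAccumL step tbl0 names
  in (ids, tblN)
  where
    step t s = let (i, t') = intern s t in (t', i)

-- Look up the name for an id, if it has been allocated.
unintern :: InternTable -> Int -> Maybe String
unintern tbl i = Map.lookup i (byId tbl)
```

For example, `internAll ["gint", "GtkWidget", "gint"] emptyTable` yields the ids `[0, 1, 0]`, so the repeated `gint` is stored only once. Duncan's measurements suggest this kind of sharing only pays off when identifier strings make up a significant share of the heap, which did not appear to be the case for c2hs.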