
Hi all,

For people building gtk2hs, we've found the large amount of heap space required by c2hs to be a problem. It means that people with older machines with less than about 400MB of RAM cannot build gtk2hs. For recent versions of Gtk, parsing and name analysis require 350MB of heap space (i.e. running c2hs +RTS -M340m -RTS will run out of heap, but c2hs +RTS -M350m -RTS will be ok).

So we've been pondering how to reduce the heap requirements. The key point is that we do not want to have to keep the whole of the AST plus the symbol maps in memory at once.

For the parsing phase this should not be a problem: the parser works declaration by declaration, and the only thing that is accumulated is the set of typedef names. There are two options here. We could write out each declaration one at a time to another file using the binary serialisation framework. Alternatively, if the list of declarations could be returned lazily by the parser, then that should work ok.

The harder bit is the name analysis. It reads the declaration list in a linear pattern (so it should work well with a lazy parser or with a list of declarations deserialised one by one out of a file). The CTagNS namespace and CDefTable seem to be write-only, which is good as they could be written out to file immediately. The CShadowNS is not generated during the name analysis phase. The CObjNS namespace is trickier since it is both written and used for lookups. We could live with keeping this one in memory, or alternatively it should be possible to both write the map bit by bit and do random reads for the lookups. The lookups themselves do not retain any heap since they immediately write the value out into another map.

The name analysis phase actually doesn't use many map lookup/insert operations.
If each of these could be redefined locally to work in the local NA monad, and the NA monad then extended to know the files we are reading from and writing to, then in runNA we could switch between doing lookups/inserts on in-heap FiniteMaps and doing them to/from files.

  runNA :: NA a -> Either AttrC Files -> a -> CST s (Either AttrC Files)

My point is that we wouldn't need to change any existing code paths. The use of intermediate files could even be controlled by a --conserve-memory flag or something (since it would probably slow down the cases where currently everything fits into memory).

Just looking for feedback, particularly from Manuel as to whether he thinks this is a plan worth pursuing.

Duncan
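
P.S. To make the switching idea a bit more concrete, here is a rough sketch of what a map interface with two interchangeable backends might look like. None of this is c2hs code: Store, heapStore, fileStore, scanFor, openStore and demo are all made-up names, Data.Map stands in for FiniteMap, plain IO stands in for the NA/CST monads, and a show/read text log stands in for the binary serialisation framework. The file-backed lookup is a naive linear scan, so it only illustrates the shape of the interface, not a sensible on-disk format.

```haskell
import qualified Data.Map as Map
import Data.IORef (newIORef, readIORef, modifyIORef')
import System.IO (Handle, IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- Hypothetical interface: the two operations the analysis needs from a map.
data Store k v = Store
  { insertS :: k -> v -> IO ()
  , lookupS :: k -> IO (Maybe v)
  }

-- In-heap backend (Data.Map standing in for FiniteMap).
heapStore :: Ord k => IO (Store k v)
heapStore = do
  ref <- newIORef Map.empty
  return Store
    { insertS = \k v -> modifyIORef' ref (Map.insert k v)
    , lookupS = \k -> Map.lookup k <$> readIORef ref
    }

-- File backend: inserts append one shown (key, value) pair per line,
-- lookups re-read the log line by line, so nothing stays in the heap.
fileStore :: (Eq k, Show k, Read k, Show v, Read v)
          => FilePath -> IO (Store k v)
fileStore path = do
  writeFile path ""  -- start with an empty log
  return Store
    { insertS = \k v -> appendFile path (show (k, v) ++ "\n")
    , lookupS = \k -> withFile path ReadMode (scanFor k Nothing)
    }

-- Scan the whole log, remembering the most recent value written for the key.
scanFor :: (Eq k, Read k, Read v) => k -> Maybe v -> Handle -> IO (Maybe v)
scanFor k found h = do
  eof <- hIsEOF h
  if eof
    then return found
    else do
      line <- hGetLine h
      let (k', v) = read line
          found'  = if k' == k then Just v else found
      -- Force the comparison now so we don't build a chain of thunks.
      found' `seq` scanFor k found' h

-- Pick a backend; Just path would correspond to a conserve-memory mode.
openStore :: (Ord k, Show k, Read k, Show v, Read v)
          => Maybe FilePath -> IO (Store k v)
openStore Nothing     = heapStore
openStore (Just path) = fileStore path

-- The same client code runs unchanged against either backend.
demo :: Store String Int -> IO (Maybe Int, Maybe Int)
demo st = do
  insertS st "size_t" 1
  insertS st "size_t" 2  -- a later insert replaces the earlier one
  a <- lookupS st "size_t"
  b <- lookupS st "missing"
  return (a, b)
```

Running demo against openStore Nothing and against openStore (Just "objs.log") gives (Just 2, Nothing) in both cases, which is the property that matters here: the code doing the lookups and inserts does not need to know which backend it is talking to.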