
On Sun, 2005-05-22 at 17:23 +1000, Manuel M T Chakravarty wrote:
On Thu, 2005-05-19 at 14:32 +0100, Duncan Coutts wrote:
On Thu, 2005-05-19 at 16:26 +1000, André Pang wrote:
On 19/05/2005, at 12:15 AM, Gour wrote:
btw, gtk2hs devs have a problem with space leaks in c2hs, i.e. one requires over 1GB of RAM to process the gtk2 headers.
First of all, I am not convinced that we have a space leak in c2hs.
Yes I think that's right. I did quite a bit of profiling work last year and I didn't notice anything that looked to me like a space leak.
Let's look at what c2hs does. It runs cpp over a header, which for GTK+ gives one enormous file with C declarations. c2hs needs to read the whole thing, as due to the nature of C, it is impossible to judge a priori which declarations are relevant for the binding at hand.
This just needs a lot of space.
This is true; it does have to keep track of a great deal of information. Still, I wonder if there is something going on that we don't quite understand. The serialised dataset for c2hs when processing the Gtk 2.6 headers is 9.7MB (this figure does benefit from string sharing, but that sharing should mostly happen in the heap too, and even if it doesn't it would only be a 2x space blowup). I know that when represented in the GHC heap it will take more space than this because of all the pointers (and finite maps rather than simple lists), but that factor wouldn't account for the actual minimum heap requirement, which is about 30 times bigger than the serialised format. Actually, that could be verified experimentally by deserialising the dataset and making sure it is all in memory using deepSeq (necessary since we deserialise the dataset lazily).
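That experiment could look roughly like this (a minimal sketch: 'measureResidency' is an invented name, the real c2hs dataset type would need an NFData instance, and the deepseq package's Control.DeepSeq stands in here for whatever deepSeq c2hs has to hand):

  import Control.DeepSeq (NFData, deepseq)

  -- Load the dumped name-analysis data lazily, force every thunk so the whole
  -- structure really is resident, then inspect +RTS -s or a heap profile to
  -- see its true in-heap size.
  measureResidency :: NFData dataset => IO dataset -> IO ()
  measureResidency load = do
    ds <- load                -- lazily deserialises the dump
    ds `deepseq` return ()    -- forces the entire structure into memory
    putStrLn "dataset fully forced; check the RTS statistics for residency"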
It is probably possible to come up with a more efficient representation of the AST, but that would probably be quite some work to implement.
Our considered opinion (Axel's and mine) is that c2hs's memory consumption is a very difficult thing to fix, and any fix we might come up with would likely be very invasive, so Manuel would not be very keen on the idea.
I don't mind it being invasive if
(1) it is not gtk2hs-specific (i.e., it must be generally useful), and (2) it doesn't conflict with other features and/or the basic structure.
Both reasonable. We'll keep that in mind if we try for a heap reduction patch. I think if I were to try this again, I'd try to make the name analysis phase into an external algorithm, keeping as many of the various finite maps as possible in external files for most of the time (roughly along the lines of the sketch below).
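Purely as an illustration of that direction (nothing like this exists in c2hs today; the bucketing scheme, the show/read serialisation and all the names here are placeholders), a finite map could be spilled to per-bucket files so that a lookup only ever reads one small file back into the heap:

  import qualified Data.Map as Map
  import Data.Char (ord)
  import System.Directory (createDirectoryIfMissing, doesFileExist)

  -- Assign each key to one of 256 bucket files.
  bucketFile :: FilePath -> String -> FilePath
  bucketFile dir key = dir ++ "/bucket-" ++ show (sum (map ord key) `mod` 256)

  -- Write a whole map out, one file per bucket (show/read is used purely for
  -- brevity; a real version would want a compact binary format).
  spill :: Show v => FilePath -> Map.Map String v -> IO ()
  spill dir m = do
      createDirectoryIfMissing True dir
      mapM_ writeBucket (Map.toList buckets)
    where
      buckets = Map.fromListWith (++)
                  [ (bucketFile dir k, [(k, v)]) | (k, v) <- Map.toList m ]
      writeBucket (file, kvs) = writeFile file (show kvs)

  -- A lookup reads back just the one bucket file its key lives in.
  lookupExternal :: Read v => FilePath -> String -> IO (Maybe v)
  lookupExternal dir key = do
      exists <- doesFileExist (bucketFile dir key)
      if exists
        then do
          contents <- readFile (bucketFile dir key)
          return (lookup key (read contents))
        else return Nothing

The point is just that most of the map stays on disk most of the time, at the cost of extra I/O per lookup.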
One approach I haven't tried, but which might bear some fruit, is to check whether c2hs actually uses all the data it collects, or whether in fact much of the AST goes unused, in which case it could be eliminated. However, I don't imagine that this would give any enormous savings (i.e. enough to process the Gtk 2.x headers on a machine with 256MB of RAM).
I am sure lots of the AST isn't used, but we won't know which parts until after most of the work is done. To do its work, c2hs needs all declarations on which any symbols bound from Haskell directly or indirectly depend.
I know there will be lots of symbols that any particular .chs file will not use. I meant bits that could never be used by any possible .chs file. But that's also why I said it probably wouldn't be much of a saving.
I do have another idea, however, on which I would like to get some feedback...
Basically, the idea is that we want to run c2hs only on the developer's machine and distribute the resulting .hs files.
(snip)
I do not yet know if this approach will work fully; I'm still assessing its feasibility. If it does turn out to be workable, I'd be keen to discuss with Manuel whether he might accept such a feature (controlled by some command line flag) into the main c2hs.
I don't know what you mean by the cpp context. Moreover, I would like a clear story on which cpp directives are passed through and which are interpreted.
It's easiest to explain with an example. We have some code in a .chs file that is compiled conditionally based on a cpp test:

  #ifdef USE_GCLOSUE_SIGNALS_IMPL
  connectGeneric :: GObjectClass obj => ...
  ... snip ...
  {# call g_signal_connect_closure #}
  #else
  ... etc ...

and with my patched c2hs it outputs this FFI import at the end of the .hs file:

  #ifdef USE_GCLOSUE_SIGNALS_IMPL
  foreign import ccall safe " g_signal_connect_closure"
    g_signal_connect_closure :: ((Ptr ()) -> ((Ptr CChar) -> ((Ptr GClosure) -> (CInt -> (IO CULong)))))
  #endif

So what it does is use the existing code that collects the cpp directives, but then, instead of doing the business of building a .h file from them and discarding the cpp directives from the list of fragments to be output, it keeps them in the output .hs file. Then, when going over the .chs fragments expanding all the hooks, it maintains a stack of the cpp directives (i.e. push when we encounter an #if and pop when we see an #endif), so when we get to a call hook we need to expand we know the "cpp context". We pass this "cpp context" down to the code that generates the deferred code for the foreign import ccall declarations, and use it to reconstruct the cpp conditional directives surrounding the FFI import declaration.
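A rough sketch of that bookkeeping (the Fragment type and its constructors are invented for illustration, not the real c2hs types, and #else/#elif handling is omitted):

  import Data.List (isPrefixOf)

  -- A .chs file, after splitting, is a list of fragments; cpp directives and
  -- binding hooks are represented as plain strings here.
  data Fragment = CPPDirective String | Hook String | PlainText String

  type CppContext = [String]   -- enclosing cpp conditionals, innermost first

  -- Pair every fragment with the conditionals that enclose it: push on
  -- #if/#ifdef/#ifndef, pop on #endif.
  trackContext :: [Fragment] -> [(Fragment, CppContext)]
  trackContext = go []
    where
      go _   [] = []
      go ctx (f@(CPPDirective d) : fs)
        | "#if"    `isPrefixOf` d = (f, ctx) : go (d : ctx) fs
        | "#endif" `isPrefixOf` d = (f, ctx) : go (drop 1 ctx) fs
      go ctx (f : fs)             = (f, ctx) : go ctx fs

The context recorded for a call hook is then what gets emitted again as #if.../#endif lines around the generated foreign import.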
What I don't like about this approach is that it is, to an extent, gtk2hs-specific. Let me explain. Not needing c2hs on user machines would be a Good Thing. Supporting this only for the subset of features used by gtk2hs (i.e., no set and get hooks) is bad.
Yes, that is not great for a feature that is to go into mainline c2hs. Perhaps this 'do cpp after chs' mode should output .hsc files to be further processed by hsc2hs. That way it could output #offset and #size macros where c2hs would normally output numeric constants.
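For concreteness, such hsc2hs-style output could look roughly like this (only an illustration, not something the patch produces; GtkRequisition and its width/height fields are just an example struct from the GTK+ headers):

  -- A generated .hsc file, to be run through hsc2hs on the user's machine:
  #include <gtk/gtk.h>

  requisitionSize :: Int
  requisitionSize = #{size GtkRequisition}

  widthOffset, heightOffset :: Int
  widthOffset  = #{offset GtkRequisition, width}
  heightOffset = #{offset GtkRequisition, height}

hsc2hs fills in the actual numbers at build time on the user's machine, instead of c2hs baking them in on the developer's machine.

Duncan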