
On Sun, 2005-05-22 at 17:23 +1000, Manuel M T Chakravarty wrote:
On Thu, 2005-05-19 at 14:32 +0100, Duncan Coutts wrote:
On Thu, 2005-05-19 at 16:26 +1000, André Pang wrote:
On 19/05/2005, at 12:15 AM, Gour wrote:
btw, gtk2hs devs have a problem with space leaks in c2hs, i.e. one requires over 1GB of RAM to process the gtk2 headers.
First of all, I am not convinced that we have a space leak in c2hs.
Yes I think that's right. I did quite a bit of profiling work last year and I didn't notice anything that looked to me like a space leak.
Let's look at what c2hs does. It runs cpp over a header, which for GTK+ gives one enormous file with C declarations. c2hs needs to read the whole thing, as due to the nature of C, it is impossible to judge a priori which declarations are relevant for the binding at hand.
This just needs a lot of space.
This is true; it does have to keep track of a great deal of information. Still, I wonder if there is something going on that we don't quite understand. The serialised dataset for c2hs when processing the Gtk 2.6 headers is 9.7MB (this figure does benefit from string sharing, but that sharing should mostly happen in the heap too, and even if it doesn't it would only be a 2x space blowup). I know that when represented in the GHC heap it will take more space than this because of all the pointers (and finite maps rather than simple lists), but that factor wouldn't account for the actual minimum heap requirement, which is about 30 times bigger than the serialised format. Actually, that could be verified experimentally by deserialising the dataset and making sure it is all in memory using deepSeq (necessary since we deserialise the dataset lazily).
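That experiment could look roughly like this (a minimal sketch: 'measureResidency' is an invented name, the real c2hs dataset type would need an NFData instance, and the deepseq package's Control.DeepSeq stands in here for whatever deepSeq c2hs has to hand):

  import Control.DeepSeq (NFData, deepseq)

  -- Load the dumped name-analysis data lazily, force every thunk so the whole
  -- structure really is resident, then inspect +RTS -s or a heap profile to
  -- see its true in-heap size.
  measureResidency :: NFData dataset => IO dataset -> IO ()
  measureResidency load = do
    ds <- load                -- lazily deserialises the dump
    ds `deepseq` return ()    -- forces the entire structure into memory
    putStrLn "dataset fully forced; check the RTS statistics for residency"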
It is probably possible to come up with a more efficient representation of the AST, but that would probably be quite some work to implement.
Our considered opinion (Axel's and mine) is that c2hs's memory consumption is a very difficult thing to fix, and any fix we might come up with would likely be very invasive, so Manuel would not be very keen on the idea.
I don't mind it being invasive if
(1) it is not gtk2hs-specific (i.e., it must be generally useful), and (2) it doesn't conflict with other features and/or the basic structure.
Both reasonable. We'll keep that in mind if we try for a heap reduction patch. I think if I were to try this again, I'd try to make the name analysis phase into an external algorithm, keeping as many of the various finite maps as possible in external files for most of the time (roughly along the lines of the sketch below).
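Purely as an illustration of that direction (nothing like this exists in c2hs today; the bucketing scheme, the show/read serialisation and all the names here are placeholders), a finite map could be spilled to per-bucket files so that a lookup only ever reads one small file back into the heap:

  import qualified Data.Map as Map
  import Data.Char (ord)
  import System.Directory (createDirectoryIfMissing, doesFileExist)

  -- Assign each key to one of 256 bucket files.
  bucketFile :: FilePath -> String -> FilePath
  bucketFile dir key = dir ++ "/bucket-" ++ show (sum (map ord key) `mod` 256)

  -- Write a whole map out, one file per bucket (show/read is used purely for
  -- brevity; a real version would want a compact binary format).
  spill :: Show v => FilePath -> Map.Map String v -> IO ()
  spill dir m = do
      createDirectoryIfMissing True dir
      mapM_ writeBucket (Map.toList buckets)
    where
      buckets = Map.fromListWith (++)
                  [ (bucketFile dir k, [(k, v)]) | (k, v) <- Map.toList m ]
      writeBucket (file, kvs) = writeFile file (show kvs)

  -- A lookup reads back just the one bucket file its key lives in.
  lookupExternal :: Read v => FilePath -> String -> IO (Maybe v)
  lookupExternal dir key = do
      exists <- doesFileExist (bucketFile dir key)
      if exists
        then do
          contents <- readFile (bucketFile dir key)
          return (lookup key (read contents))
        else return Nothing

The point is just that most of the map stays on disk most of the time, at the cost of extra I/O per lookup.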
One approach I haven't tried, but which might bear some fruit, is to check whether c2hs actually uses all the data it collects, or whether in fact much of the AST goes unused, in which case it could be eliminated. However, I don't imagine that this would give any enormous savings (i.e. enough to process the Gtk 2.x headers on a machine with 256MB of RAM).
I am sure lots of the AST isn't used, but we won't know which parts until after most of the work is done. To do its work, c2hs needs all declarations on which any symbols bound from Haskell directly or indirectly depend.
I know there will be lots of symbols that any particular .chs file will not use. I meant bits that could never be used by any possible .chs file. But that's also why I said it probably wouldn't be much of a saving.
I do have another idea, however, on which I would like to get some feedback...
Basically, the idea is that we want to run c2hs only on the developer's machine and distribute the resulting .hs files.
(snip)
I do not yet know if this approach will work fully; I'm still assessing its feasibility. If it does turn out to be workable, I'd be keen to discuss with Manuel whether he might accept such a feature (controlled by some command line flag) into the main c2hs.
I don't know what you mean by the cpp context. Moreover, I would like a clear story on which cpp directives are passed through and which are interpreted.
It's easiest to explain with an example. We have some code in a .chs file that is compiled conditionally based on a cpp test:

  #ifdef USE_GCLOSUE_SIGNALS_IMPL
  connectGeneric :: GObjectClass obj => ...
  ... snip ...
  {# call g_signal_connect_closure #}
  #else
  ... etc ...

and with my patched c2hs it outputs this FFI import at the end of the .hs file:

  #ifdef USE_GCLOSUE_SIGNALS_IMPL
  foreign import ccall safe " g_signal_connect_closure"
    g_signal_connect_closure :: ((Ptr ()) -> ((Ptr CChar) -> ((Ptr GClosure) -> (CInt -> (IO CULong)))))
  #endif

So what it does is use the existing code that collects the cpp directives, but then, instead of doing the business of building a .h file from them and discarding the cpp directives from the list of fragments to be output, it keeps them in the output .hs file. Then, when going over the .chs fragments expanding all the hooks, it maintains a stack of the cpp directives (i.e. push when we encounter an #if and pop when we see an #endif), so when we get to a call hook we need to expand we know the "cpp context". We pass this "cpp context" down to the code that generates the deferred code for the foreign import ccall declarations, and use it to reconstruct the cpp conditional directives surrounding the FFI import declaration.
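A rough sketch of that bookkeeping (the Fragment type and its constructors are invented for illustration, not the real c2hs types, and #else/#elif handling is omitted):

  import Data.List (isPrefixOf)

  -- A .chs file, after splitting, is a list of fragments; cpp directives and
  -- binding hooks are represented as plain strings here.
  data Fragment = CPPDirective String | Hook String | PlainText String

  type CppContext = [String]   -- enclosing cpp conditionals, innermost first

  -- Pair every fragment with the conditionals that enclose it: push on
  -- #if/#ifdef/#ifndef, pop on #endif.
  trackContext :: [Fragment] -> [(Fragment, CppContext)]
  trackContext = go []
    where
      go _   [] = []
      go ctx (f@(CPPDirective d) : fs)
        | "#if"    `isPrefixOf` d = (f, ctx) : go (d : ctx) fs
        | "#endif" `isPrefixOf` d = (f, ctx) : go (drop 1 ctx) fs
      go ctx (f : fs)             = (f, ctx) : go ctx fs

The context recorded for a call hook is then what gets emitted again as #if.../#endif lines around the generated foreign import.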
What I don't like about this approach is that it is, to an extent, gtk2hs-specific. Let me explain. Not needing c2hs on user machines would be a Good Thing. Supporting this only for the subset of features used by gtk2hs (i.e., no set and get hooks) is bad.
Yes, that is not great for a feature that is to go into mainline c2hs. Perhaps this 'do cpp after chs' mode should output .hsc files to be further processed by hsc2hs. That way it could output #offset and #size macros where c2hs would normally output numeric constants.
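For concreteness, such hsc2hs-style output could look roughly like this (only an illustration, not something the patch produces; GtkRequisition and its width/height fields are just an example struct from the GTK+ headers):

  -- A generated .hsc file, to be run through hsc2hs on the user's machine:
  #include <gtk/gtk.h>

  requisitionSize :: Int
  requisitionSize = #{size GtkRequisition}

  widthOffset, heightOffset :: Int
  widthOffset  = #{offset GtkRequisition, width}
  heightOffset = #{offset GtkRequisition, height}

hsc2hs fills in the actual numbers at build time on the user's machine, instead of c2hs baking them in on the developer's machine.

Duncan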