
On Mon, 2005-05-23 at 00:42 +0100, Duncan Coutts wrote:
On Sun, 2005-05-22 at 17:23 +1000, Manuel M T Chakravarty wrote:
This just needs a lot of space.
This is true; it does just have to keep track of a great deal of information.
Still, I wonder if there is something going on that we don't quite understand. The serialised dataset for c2hs when processing the Gtk 2.6 headers is 9.7Mb (this figure does include string sharing, but that sharing should mostly be present in the heap too, and even if it isn't, it only accounts for about a 2x space blowup). I know that when represented in the GHC heap it will take more space than this, because of all the pointers (and finite maps rather than simple lists), but that factor wouldn't account for the actual minimum heap requirements, which are about 30 times bigger than the serialised format (roughly 290Mb).
Actually, that could be verified experimentally by deserialising the dataset and making sure it is all in memory using deepSeq (this would be necessary since we deserialise the dataset lazily).
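For concreteness, the forcing step would look roughly like this (a minimal sketch, using Control.DeepSeq from the deepseq package so the example is self-contained, rather than whatever deepSeq c2hs itself defines; Dataset and loadDataset are just stand-ins for the real c2hs types and deserialiser):

  import Control.DeepSeq (NFData (..), deepseq)
  import Control.Exception (evaluate)

  -- Placeholder for the real deserialised c2hs structure.
  data Dataset = Dataset

  instance NFData Dataset where
    rnf Dataset = ()

  -- Stand-in for the real lazy deserialiser.
  loadDataset :: FilePath -> IO Dataset
  loadDataset _ = return Dataset

  main :: IO ()
  main = do
    ds <- loadDataset "gtk-headers.dat"
    -- Forcing with deepseq walks the whole structure, so nothing is left
    -- as an unevaluated thunk; the live heap afterwards (e.g. via +RTS -s)
    -- reflects the full in-memory size of the dataset.
    _ <- evaluate (ds `deepseq` ())
    putStrLn "dataset fully forced"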
From my brief experiment, the 9.7Mb file, when deserialised into the heap, takes just over 50Mb of heap space, and top reported a 47Mb RSS.
I tried another experiment and found that the parsing phase by itself required over 250Mb of heap space; by the time it got to the name analysis it required over 350Mb. So from that it looks to me that the parser could be improved.

The lexer/parser could be swapped out for another implementation without affecting any other module. Perhaps we should look at one based on Alex & Happy. Happy can do monadic parsers, which would allow the parser to maintain the set of identifiers needed when parsing C (a rough sketch of that state is below). Alex & Happy can produce pure Haskell98 code (or GHC-specific code for better performance), so the portability of c2hs would not be affected, unlike our binary serialisation patches, which use various GHC'isms.

Duncan
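P.S. To give a flavour of what that monadic parser state could look like: a C parser needs to know which identifiers have been introduced by typedefs, so that it can tell type names apart from ordinary identifiers. This is only a sketch; the Token type and function names are illustrative, not the actual Happy, Alex or c2hs API.

  import qualified Data.Set as Set
  import Control.Monad.State (State, gets, modify)

  -- The set of identifiers introduced by typedef declarations so far.
  type TypedefSet = Set.Set String

  -- The monad a Happy-generated parser (and the Alex lexer) could run in.
  type P a = State TypedefSet a

  data Token = TIdent String | TTypeName String
    deriving Show

  -- Record a name introduced by a typedef declaration.
  addTypedef :: String -> P ()
  addTypedef = modify . Set.insert

  -- Classify an identifier: it lexes as a type name if it was typedef'd earlier.
  classifyIdent :: String -> P Token
  classifyIdent name = do
    isType <- gets (Set.member name)
    return (if isType then TTypeName name else TIdent name)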