
Duncan Coutts:
On Mon, 2005-05-23 at 00:42 +0100, Duncan Coutts wrote:
From my brief experiment, the 9.7Mb file, when deserialised into the heap, takes just over 50Mb of heap space, and top reported 47Mb RSS.
I tried another experiment and found that the parsing phase by itself required over 250Mb of heap space. By the time it got to the name analysis, it required over 350Mb.
So it looks to me as though the parser could be improved. The lexer/parser could be swapped out for another implementation without affecting any other module.
Perhaps we should look at one based on Alex & Happy. Happy can generate monadic parsers, which would allow it to maintain the set of identifiers needed when parsing C. Alex & Happy can produce pure Haskell98 code (or GHC-specific code for better performance), so the portability of c2hs would not be affected, unlike our binary serialisation patches, which use various ghc'isms.
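To make the point about monadic parsing concrete: a C parser has to know which identifiers are typedef names (in `T * x;` the meaning depends on whether `T` names a type), so the parser state carries that set and the lexer consults it. The following is a minimal sketch in plain Haskell, assuming a hypothetical `Token` type and `P` monad (these are not c2hs's actual types); it needs only the `containers` package and no GHC-specific extensions.

```haskell
module Main where

import qualified Data.Set as Set

-- Hypothetical token type: identifiers are split into ordinary names
-- and names previously introduced by a typedef.
data Token = TIdent String | TTyName String
  deriving (Eq, Show)

-- Hypothetical parser monad: a plain state monad over the set of typedef names.
newtype P a = P { runP :: Set.Set String -> (a, Set.Set String) }

instance Functor P where
  fmap f (P m) = P $ \s -> let (a, s') = m s in (f a, s')

instance Applicative P where
  pure x = P $ \s -> (x, s)
  P mf <*> P mx = P $ \s ->
    let (f, s1) = mf s
        (x, s2) = mx s1
    in (f x, s2)

instance Monad P where
  P m >>= k = P $ \s -> let (a, s') = m s in runP (k a) s'

-- What a parser action would do after reducing a typedef declaration.
addTypedef :: String -> P ()
addTypedef n = P $ \s -> ((), Set.insert n s)

-- What the lexer would do for each identifier it scans.
classify :: String -> P Token
classify n = P $ \s -> (if n `Set.member` s then TTyName n else TIdent n, s)

-- The same name is classified differently before and after its typedef.
example :: P [Token]
example = do
  t1 <- classify "size_t"
  addTypedef "size_t"
  t2 <- classify "size_t"
  t3 <- classify "x"
  return [t1, t2, t3]

main :: IO ()
main = print (fst (runP example Set.empty))
```

Running `main` prints `[TIdent "size_t",TTyName "size_t",TIdent "x"]`: once the typedef has been recorded, the lexer hands the parser a type-name token instead of an ordinary identifier.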
An Alex/Happy parser would be an option if it improves matters significantly. If you or anybody else has a go at it, please follow the C language definition closely, as the existing lexer and parser do (with comments stating which sections of K&R the individual productions relate to). Moreover, the module c2hs/base/syntax/ParserMonad.hs already provides a monad suitable for Happy.
Manuel
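For anyone attempting this, the following sketches roughly the shape of monad and threaded lexer that Happy's `%monad { P } { thenP } { returnP }` and `%lexer { lexer } { TEOF }` directives expect. `P`, `thenP`, `returnP`, `lexer` and `Token` are hypothetical names chosen for illustration; this is not the interface of c2hs/base/syntax/ParserMonad.hs. The lexer is called with a continuation that consumes the next token, and lexical errors propagate through the monad.

```haskell
module Main where

import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Hypothetical parser monad: threads the remaining input and can fail
-- with an error message.
newtype P a = P { unP :: String -> Either String (a, String) }

returnP :: a -> P a
returnP x = P $ \inp -> Right (x, inp)

thenP :: P a -> (a -> P b) -> P b
thenP (P m) k = P $ \inp -> case m inp of
  Left err        -> Left err
  Right (a, rest) -> unP (k a) rest

-- Hypothetical tokens; TEOF is what the %lexer directive would name as EOF.
data Token = TIdent String | TSemi | TEOF
  deriving (Eq, Show)

-- Continuation-style lexer: scan one token, then pass it to the continuation.
lexer :: (Token -> P a) -> P a
lexer k = P $ \inp -> case dropWhile isSpace inp of
  [] -> unP (k TEOF) []
  (';' : rest) -> unP (k TSemi) rest
  s@(c : _)
    | isAlpha c || c == '_' ->
        let (name, rest) = span (\x -> isAlphaNum x || x == '_') s
        in unP (k (TIdent name)) rest
    | otherwise -> Left ("lexical error at " ++ show c)

-- Drive the lexer by hand, collecting tokens up to and including TEOF.
tokens :: P [Token]
tokens = lexer $ \t -> case t of
  TEOF -> returnP [TEOF]
  _    -> tokens `thenP` \ts -> returnP (t : ts)

main :: IO ()
main = do
  print (fmap fst (unP tokens "size_t n;"))
  print (fmap fst (unP tokens "x @"))
```

Here `tokens` drives the lexer by hand; in a generated parser, Happy would call `lexer` itself. Running `main` prints `Right [TIdent "size_t",TIdent "n",TSemi,TEOF]` and then `Left "lexical error at '@'"`.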