
All, over the weekend I had another stab at fixing issues in the c2hs C parser. I've been annoyed for some time that the basic typedef problem is not solved. I know some of the other problems are the various GNU extensions.

Recall that there is this annoying context-dependency in the C grammar: parsing depends on whether an identifier has been declared as a typedef'ed identifier within the enclosing scope.

There are two things the current grammar gets wrong. One is that it doesn't accept identifiers & typedefs correctly in all the right situations. This was because my attempts to add them in the right places led to huge numbers of shift/reduce conflicts in the grammar. The other is that it doesn't do nested scopes properly; it assumes only one global scope. So currently, typedef'ed names do not go out of scope when they should.

So I started with James A. Roskind's C grammar for YACC. Google for it; there's lots of info on it. Anyway, I ported that to happy and indeed it does build with just the one expected shift/reduce conflict (for if/then/else).

So the plan, I suppose, would be to integrate this with the current lexer. This involves adding the semantic actions at just the right points to have typedef'ed names added to and removed from the typedef'ed name set at just the right times, and then testing the parser on a bunch of nasty torture cases. There are some suggested by Roskind here:

http://compilers.iecc.com/comparch/article/92-01-056

eg, try this one:

typedef int A, B(A);

which should be equivalent to:

typedef int A;
typedef int B(int); /* B's type is a function taking an int and returning an int */

Terrifying stuff :-)

I was also looking at a cheap automatic test for the parser. The idea is just to try to parse every .h file in /usr/include. We can filter out just the ones that compile on their own with gcc (since some need extra -I dirs or -D defines).
At the moment on my machine with the current c2hs parser I get:

225 headers could be parsed ok
38 headers failed with parse errors

Of the failures, most are related to __attribute__ of some sort. Some are C99 features like restrict or _Bool. There are a few otherwise uncategorised parse errors.

We could probably extend this style of automatic testing to standard C packages by providing a c2hs wrapper that pretends to be gcc. Eg we'd use something like:

export CC="c2hs-ccwrapper"
export LD="c2hs-ldwrapper"
./configure
make

To make the build system work ok we'd need to produce dummy .o files etc. We might be able to test vast amounts of C code this way.

Something slightly less ambitious might be to test the .h files for all pkg-config packages, since pkg-config provides all the necessary -I & -D flags.

Testing the correctness of c2hs's C type sizeof calculations and struct member offsetof calculations could be done in a similar way. After parsing we could extract all the types and generate a .c file that gets gcc to test the sizeof and compare that to the size c2hs calculates.

Duncan

Hi Duncan, On Mon, 2006-10-09 at 00:26 +0100, Duncan Coutts wrote:
over the weekend I had another stab at fixing issues in the c2hs C parser. I've been annoyed for some time that the basic typedef problem is not solved. I know some of the other problems are the various GNU extensions.
Recall that there is this annoying context-dependency in the C grammar: parsing depends on whether an identifier has been declared as a typedef'ed identifier within the enclosing scope.
There are two things the current grammar gets wrong. One is that it doesn't accept identifiers & typedefs correctly in all the right situations. This was because my attempts to add them in the right places led to huge numbers of shift/reduce conflicts in the grammar. The other is that it doesn't do nested scopes properly; it assumes only one global scope. So currently, typedef'ed names do not go out of scope when they should.
So I started with James A. Roskind's C grammar for YACC. Google for it, there's lots of info on it. Anyway, I ported that to happy and indeed it does build with just the one expected shift/reduce conflict (for if/then/else).
So the plan, I suppose, would be to integrate this with the current lexer. This involves adding the semantic actions at just the right points to have typedef'ed names added to and removed from the typedef'ed name set at just the right times, and then testing the parser on a bunch of nasty torture cases. There are some suggested by Roskind here:
http://compilers.iecc.com/comparch/article/92-01-056
eg, try this one:

typedef int A, B(A);
which should be equivalent to:
typedef int A;
typedef int B(int); /* B's type is a function taking an int and returning an int */
Terrifying stuff :-)
I was also looking at a cheap automatic test for the parser. The idea is just to try and parse every .h file in /usr/include. We can filter out just the ones that compile on their own with gcc (since some need extra -I dirs or -D defines).
At the moment on my machine with the current c2hs parser I get:
225 headers could be parsed ok
38 headers failed with parse errors
Of the failures, most are related to __attribute__ of some sort. Some are C99 features like restrict or _Bool. There are a few otherwise uncategorised parse errors.
We could probably extend this style of automatic testing to standard C packages by providing a c2hs wrapper that pretends to be gcc. Eg we'd use something like:

export CC="c2hs-ccwrapper"
export LD="c2hs-ldwrapper"
./configure
make
To make the build system work ok we'd need to produce dummy .o files etc. We might be able to test vast amounts of C code this way.

Yeah, that'd certainly make sense. I have to manually trim the includes on my system when writing .chs files, because of system headers that c2hs can't parse due to weird attributes.
Cheers,
Jelmer
--
Jelmer Vernooij