
All, over the weekend I had another stab at fixing issues in the c2hs C parser. I've been annoyed for some time that the basic typedef problem is not solved. I know some of the other problems are the various GNU extensions.

Recall that there is this annoying context-dependency in the C grammar: parsing depends on whether an identifier has been declared as a typedef'ed identifier within the enclosing scope.

There are two things the current grammar gets wrong. One is that it doesn't accept identifiers & typedefs correctly in all the right situations. This was because my attempts to add them in the right places led to huge numbers of shift/reduce conflicts in the grammar. The other is that it doesn't do nested scopes properly; it assumes only one global scope. So currently, typedef'ed names do not go out of scope when they should.

So I started with James A. Roskind's C grammar for YACC. Google for it; there's lots of info on it. Anyway, I ported that to happy and indeed it does build with just the one expected shift/reduce conflict (for if/then/else).

So the plan, I suppose, would be to integrate this with the current lexer. This involves adding the semantic actions at just the right points to have typedef'ed names added to and removed from the typedef'ed name set at just the right times, and then testing the parser on a bunch of nasty torture cases. There are some suggested by Roskind here:

http://compilers.iecc.com/comparch/article/92-01-056

eg, try this one:

typedef int A, B(A);

which should be equivalent to:

typedef int A;
typedef int B(int); /* B's type is a function taking an int and returning an int */

Terrifying stuff :-)

I was also looking at a cheap automatic test for the parser. The idea is just to try to parse every .h file in /usr/include. We can filter out just the ones that compile on their own with gcc (since some need extra -I dirs or -D defines).
At the moment on my machine with the current c2hs parser I get:

225 headers could be parsed ok
38 headers failed with parse errors

Of the failures, most are related to __attribute__ of some sort. Some are C99 features like restrict or _Bool. There are a few otherwise uncategorised parse errors.

We could probably extend this style of automatic testing to standard C packages by providing a c2hs wrapper that pretends to be gcc. Eg we'd use something like:

export CC="c2hs-ccwrapper"
export LD="c2hs-ldwrapper"
./configure
make

To make the build system work ok we'd need to produce dummy .o files etc. We might be able to test vast amounts of C code this way.

Something slightly less ambitious might be to test the .h files for all pkg-config packages, since pkg-config provides all the necessary -I & -D flags.

Testing the correctness of c2hs's C type sizeof calculations and struct member offsetof calculations could be done in a similar way. After parsing we could extract all the types and generate a .c file that gets gcc to test the sizeof and compare that to the size c2hs calculates.

Duncan

Hi Duncan, On Mon, 2006-10-09 at 00:26 +0100, Duncan Coutts wrote:
over the weekend I had another stab at fixing issues in the c2hs C parser. I've been annoyed for some time that the basic typedef problem is not solved. I know some of the other problems are the various GNU extensions.
Recall that there is this annoying context-dependency in the C grammar: parsing depends on whether an identifier has been declared as a typedef'ed identifier within the enclosing scope.
There are two things the current grammar gets wrong. One is that it doesn't accept identifiers & typedefs correctly in all the right situations. This was because my attempts to add them in the right places led to huge numbers of shift/reduce conflicts in the grammar. The other is that it doesn't do nested scopes properly; it assumes only one global scope. So currently, typedef'ed names do not go out of scope when they should.
So I started with James A. Roskind's C grammar for YACC. Google for it, there's lots of info on it. Anyway, I ported that to happy and indeed it does build with just the one expected shift/reduce conflict (for if/then/else).
So the plan, I suppose, would be to integrate this with the current lexer. This involves adding the semantic actions at just the right points to have typedef'ed names added to and removed from the typedef'ed name set at just the right times, and then testing the parser on a bunch of nasty torture cases. There are some suggested by Roskind here:
http://compilers.iecc.com/comparch/article/92-01-056
eg, try this one:

typedef int A, B(A);
which should be equivalent to:
typedef int A;
typedef int B(int); /* B's type is a function taking an int and returning an int */
Terrifying stuff :-)
I was also looking at a cheap automatic test for the parser. The idea is just to try and parse every .h file in /usr/include. We can filter out just the ones that compile on their own with gcc (since some need extra -I dirs or -D defines).
At the moment on my machine with the current c2hs parser I get:
225 headers could be parsed ok
38 headers failed with parse errors
Of the failures, most are related to __attribute__ of some sort. Some are C99 features like restrict or _Bool. There are a few otherwise uncategorised parse errors.
We could probably extend this style of automatic testing to standard C packages by providing a c2hs wrapper that pretends to be gcc. Eg we'd use something like:

export CC="c2hs-ccwrapper"
export LD="c2hs-ldwrapper"
./configure
make
To make the build system work ok we'd need to produce dummy .o files etc. We might be able to test vast amounts of C code this way.

Yeah, that'd certainly make sense. I have to manually trim the includes on my system when writing .chs files, because of system headers that c2hs can't parse due to weird attributes.
Cheers,
Jelmer
--
Jelmer Vernooij