
Hi Manuel, On Wed, 2004-05-12 at 05:35, Manuel M T Chakravarty wrote:
At the moment this is done by after parsing each declaration (which might contain typedefs) modifying the remaining stream of tokens changing all identifiers that have been discovered to be typedefs from ordinary identifier tokens to typedef-identifier tokens. This is expensive.
To be honest, I am not entirely convinced that the basic idea of modifying the token stream to rewrite identifier tokens is that expensive. I suspect it is the naive way in which morphTypeNames does the job that's the problem.
The set of new type names passed to morphTypeNames after each parseCExtDEclList is quite small. So small that using lists and elem is faster than using Sets and elemSet (I tried this).
Perhaps it would be possible to add this set of typedef identifiers to the parser state monad and thread it through to the parsers for cid and typedefName which could consult this parser state and match conditionally on the identifier token being in the typedef-identifier set or not. (cidOrTN would not need to to a lookup) This would avoid traversing the token stream after parsing each declaration.
I guess, it would be possible to interleave this with the parser, but what I like about the current setup is that it is more modular. I don't think that it is a problem to traverse the token stream.
True, it looks like it would be difficult to interleave in the manner I suggested anyway because as far as I can tell, the parser combinators don't make it easy to do a conditional parse - conditional on some internal monad state. Swierstra & Duponcheel's combinator isn't a monad right? And doesn't allow you to thread state from earlier in the token stream to later.
The problem with the current setup is that morphTypeNames is interleaved with parseCExtDeclList; ie, with each call to parseCExtDEclList that reads so a typedef one more morphTypeNames-filter is added to the token stream. In other words, after N typedefs, each token passes through N instances of morphTypeNames. Instead, we want a single instance of morphTypeNames that collects all the typedef'd names in a set and rewrites token accordingly. If the set of represented as a balanced tree (ie, using the "Set" module in "base/general/Sets.hs") we be able to remove a factor of log N from the asymptotic complexity. This would be a much more localised change than changing the parser state.
What do you think?
If it can be done with a single pass of morphTypeNames that simply accumulates a set of type names that need to be changed, I imagine that would have similar speed improvements to the thing I suggested. It's not clear how that might be accomplished in the current scheme where we don't know the typedef'ed names until after parsing each declaration. How difficult is it to pick out typedef'ed names from the token stream (ie without full parsing)? Then we could do it in a single lazy pass over the token stream between the lexer and the parser. Perhaps it could be done in the lexer itself, if the lexer is monadic. Duncan