Re: [C2hs] Profiling c2hs

12 May 2004

      Hi Manuel,

On Wed, 2004-05-12 at 05:35, Manuel M T Chakravarty wrote:
...
...
At the moment
this is done by after parsing each declaration (which might contain
typedefs) modifying the remaining stream of tokens changing all
identifiers that have been discovered to be typedefs from ordinary
identifier tokens to typedef-identifier tokens. This is expensive.
To be honest, I am not entirely convinced that the basic idea of
modifying the token stream to rewrite identifier tokens is that
expensive.  I suspect it is the naive way in which morphTypeNames does
the job that's the problem.
The set of new type names passed to morphTypeNames after each
parseCExtDEclList is quite small. So small that using lists and elem is
faster than using Sets and elemSet (I tried this).
...
...
Perhaps it would be possible to add this set of typedef identifiers to
the parser state monad and thread it through to the parsers for cid and
typedefName which could consult this parser state and match
conditionally on the identifier token being in the typedef-identifier
set or not. (cidOrTN would not need to to a lookup) This would avoid
traversing the token stream after parsing each declaration.
I guess, it would be possible to interleave this with the parser, but
what I like about the current setup is that it is more modular.  I don't
think that it is a problem to traverse the token stream.
True, it looks like it would be difficult to interleave in the manner I
suggested anyway because as far as I can tell, the parser combinators
don't make it easy to do a conditional parse - conditional on some
internal monad state. Swierstra & Duponcheel's combinator isn't a monad
right? And doesn't allow you to thread state from earlier in the token
stream to later.
...
The problem with the current setup is that morphTypeNames is interleaved
with parseCExtDeclList; ie, with each call to parseCExtDEclList that
reads so a typedef one more morphTypeNames-filter is added to the token
stream.  In other words, after N typedefs, each token passes through N
instances of morphTypeNames.  Instead, we want a single instance of
morphTypeNames that collects all the typedef'd names in a set and
rewrites token accordingly.  If the set of represented as a balanced
tree (ie, using the "Set" module in "base/general/Sets.hs") we be able
to remove a factor of log N from the asymptotic complexity.  This would
be a much more localised change than changing the parser state.
What do you think?
If it can be done with a single pass of morphTypeNames that simply
accumulates a set of type names that need to be changed, I imagine that
would have similar speed improvements to the thing I suggested.

It's not clear how that might be accomplished in the current scheme
where we don't know the typedef'ed names until after parsing each
declaration.

How difficult is it to pick out typedef'ed names from the token stream
(ie without full parsing)? Then we could do it in a single lazy pass
over the token stream between the lexer and the parser.

Perhaps it could be done in the lexer itself, if the lexer is monadic.

Duncan

Re: [C2hs] Profiling c2hs

Duncan Coutts