
All,

I was writing a demo app to add to gtk2hs's small collection; it's a viewer for ghc's time profile log files, and I needed a large profile file to test it on, so I thought I'd try profiling c2hs. Here's what I found.

Looking at allocations (which corresponds quite closely with time): Note that these are % of the total cost, not % of their parent cost.

11.7% is taken by the C & chs lexers, which is not unreasonable. 77.4% is taken by the CParser.

Within the CParser: 72.0% is taken by parseCExtDeclList, within which 65.5% is taken by morphTypeNames (and 58.2% of the runtime). 31.4% alloc & 26.5% time was taken just by calls to Prelude.elem within morphTypeNames.

In this particular run of c2hs, in which it was processing some large gtk+ headers, it called morphTypeNames over 55 million times.

I narrowed this down a bit more and found that it is the call to morphTypeNames from parseCExtDeclList that takes the time; the call from parseCHeader is insignificant. Furthermore, it is the recursive calls to morphTypeNames that are expensive; parseCExtDeclList only called morphTypeNames about 1000 times (compared to 55 million recursive calls).

So what's going on, and what can we do?

If I understand the code correctly, the reason we are doing this morphTypeNames is because we can't distinguish typedef'ed identifiers in the parser; we need to remember which identifiers were typedefs and then treat them differently in the parser later if they were. At the moment this is done by, after parsing each declaration (which might contain typedefs), modifying the remaining stream of tokens, changing all identifiers that have been discovered to be typedefs from ordinary identifier tokens to typedef-identifier tokens. This is expensive.

Perhaps it would be possible to add this set of typedef identifiers to the parser state monad and thread it through to the parsers for cid and typedefName, which could consult this parser state and match conditionally on the identifier token being in the typedef-identifier set or not. (cidOrTN would not need to do a lookup.) This would avoid traversing the token stream after parsing each declaration.

Sound reasonable?

Duncan
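To make the proposal a bit more concrete, here is a minimal sketch of the state-threading idea, under the assumption of a simplified state-carrying parser type and toy token/identifier types. CToken, Ident, P, cid, typedefName and addTypedef below are illustrative stand-ins, not the actual c2hs parser combinators.

import qualified Data.Set as Set

type Ident = String

data CToken = CTokIdent Ident | CTokTypedef | CTokOther String
  deriving (Eq, Show)

-- a state-carrying parser: the state is the set of names known to be typedefs
newtype P a = P { runP :: Set.Set Ident -> [CToken] -> Maybe (a, Set.Set Ident, [CToken]) }

-- an ordinary identifier: only matches names *not* known to be typedefs
cid :: P Ident
cid = P $ \tds ts -> case ts of
  CTokIdent x : rest | not (x `Set.member` tds) -> Just (x, tds, rest)
  _                                             -> Nothing

-- a typedef-name: only matches names already recorded as typedefs
typedefName :: P Ident
typedefName = P $ \tds ts -> case ts of
  CTokIdent x : rest | x `Set.member` tds -> Just (x, tds, rest)
  _                                       -> Nothing

-- record a newly declared typedef in the parser state;
-- no rewriting of the remaining token stream is needed
-- (a cidOrTN-style parser would match any CTokIdent without a lookup)
addTypedef :: Ident -> P ()
addTypedef x = P $ \tds ts -> Just ((), Set.insert x tds, ts)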

On 5 May 2004, at 06:08, Duncan Coutts wrote:
All,
I was writing a demo app to add to gtk2hs's small collection; it's a viewer for ghc's time profile log files, and I needed a large profile file to test it on so I thought I'd try profiling c2hs. Here's what I found:
Looking at allocations (which corresponds quite closely with time): Note that these are % of the total cost, not % of their parent cost
11.7% is taken by the C & chs lexers which is not unreasonable. 77.4% is taken by the CParser
Within the CParser: 72.0% is taken by parseCExtDeclList, within which 65.5% is taken by morphTypeNames (and 58.2% of the runtime) 31.4% alloc & 26.5% time was taken just by calls to Prelude.elem within morphTypeNames.
In this particular run of c2hs, in which it was processing some large gtk+ headers, it called morphTypeNames over 55 million times.
I narrowed this down a bit more and found that it is the call to morphTypeNames from parseCExtDeclList that takes the time; the call from parseCHeader is insignificant. Furthermore, it is the recursive calls to morphTypeNames that are expensive; parseCExtDeclList only called morphTypeNames about 1000 times (compared to 55 million recursive calls).
So what's going on, and what can we do?
If I understand the code correctly, the reason we are doing this morphTypeNames is because we can't distinguish typedef'ed identifiers in the parser; we need to remember which identifiers were typedefs and then treat them differently in the parser later if they were. At the moment this is done by, after parsing each declaration (which might contain typedefs), modifying the remaining stream of tokens, changing all identifiers that have been discovered to be typedefs from ordinary identifier tokens to typedef-identifier tokens. This is expensive.
Perhaps it would be possible to add this set of typedef identifiers to the parser state monad and thread it through to the parsers for cid and typedefName, which could consult this parser state and match conditionally on the identifier token being in the typedef-identifier set or not. (cidOrTN would not need to do a lookup.) This would avoid traversing the token stream after parsing each declaration.
Sound reasonable?
It might be worth a shot. Eliminating this function call would certainly speed things up; however, c2hs seems to be slow due to its huge memory footprint at runtime, which probably doesn't change (although it might). Nice find, Axel.

On Wed, 2004-05-05 at 14:08, Duncan Coutts wrote:
All,
I was writing a demo app to add to gtk2hs's small collection; it's a viewer for ghc's time profile log files, and I needed a large profile file to test it on so I thought I'd try profiling c2hs. Here's what I found:
Looking at allocations (which corresponds quite closely with time): Note that these are % of the total cost, not % of their parent cost
11.7% is taken by the C & chs lexers which is not unreasonable. 77.4% is taken by the CParser
Within the CParser: 72.0% is taken by parseCExtDeclList, within which 65.5% is taken by morphTypeNames (and 58.2% of the runtime) 31.4% alloc & 26.5% time was taken just by calls to Prelude.elem within morphTypeNames.
In this particular run of c2hs, in which it was processing some large gtk+ headers, it called morphTypeNames over 55 million times.
Hmm, I knew that there was a bottleneck in the parser, but I never got around to profiling it properly. Thanks for doing that. To be honest, I am a bit surprised that morphTypeNames is the culprit, as I benchmarked it a bit when writing the code initially and it didn't appear to be a problem, but obviously I overlooked something.
I narrowed this down a bit more and found that it is the call to morphTypeNames from parseCExtDeclList that takes the time; the call from parseCHeader is insignificant. Furthermore, it is the recursive calls to morphTypeNames that are expensive; parseCExtDeclList only called morphTypeNames about 1000 times (compared to 55 million recursive calls).
So what's going on, and what can we do?
If I understand the code correctly, the reason we are doing this morphTypeNames is because we can't distinguish typedef'ed identifiers in the parser; we need to remember which identifiers were typedefs and then treat them differently in the parser later if they were.
Exactly.
At the moment this is done by, after parsing each declaration (which might contain typedefs), modifying the remaining stream of tokens, changing all identifiers that have been discovered to be typedefs from ordinary identifier tokens to typedef-identifier tokens. This is expensive.
To be honest, I am not entirely convinced that the basic idea of modifying the token stream to rewrite identifier tokens is that expensive. I suspect it is the naive way in which morphTypeNames does the job that's the problem.
Perhaps it would be possible to add this set of typedef identifiers to the parser state monad and thread it through to the parsers for cid and typedefName, which could consult this parser state and match conditionally on the identifier token being in the typedef-identifier set or not. (cidOrTN would not need to do a lookup.) This would avoid traversing the token stream after parsing each declaration.
I guess, it would be possible to interleave this with the parser, but what I like about the current setup is that it is more modular. I don't think that it is a problem to traverse the token stream. The problem with the current setup is that morphTypeNames is interleaved with parseCExtDeclList; ie, with each call to parseCExtDEclList that reads so a typedef one more morphTypeNames-filter is added to the token stream. In other words, after N typedefs, each token passes through N instances of morphTypeNames. Instead, we want a single instance of morphTypeNames that collects all the typedef'd names in a set and rewrites token accordingly. If the set of represented as a balanced tree (ie, using the "Set" module in "base/general/Sets.hs") we be able to remove a factor of log N from the asymptotic complexity. This would be a much more localised change than changing the parser state. What do you think? Manuel PS: Axel, you are right that the large memory footprint is the problem, but in a lazy language that is usually pretty closely correlated with runtime...and I knew for quite a while that the problem is in the parser. So, Duncan's find seems very relevant.
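A rough sketch of the contrast, with toy token types; the real morphTypeNames and the Set module in base/general/Sets.hs differ, and the question of how the single set gets collected during parsing is left aside here.

import qualified Data.Set as Set

data CToken = CTokIdent String | CTokTyIdent String | CTokOther String

-- current scheme, schematically: after each declaration that introduces
-- typedefs, a small list-based filter like this one is layered onto the
-- remaining stream, so after N typedefs every token is inspected N times
morphTypeNamesList :: [String] -> [CToken] -> [CToken]
morphTypeNamesList names = map morph
  where
    morph (CTokIdent x) | x `elem` names = CTokTyIdent x
    morph t                              = t

-- proposed: a single filter carrying all typedef'd names in a balanced-tree
-- set, so every token is inspected once, at O(log N) per lookup
morphTypeNamesSet :: Set.Set String -> [CToken] -> [CToken]
morphTypeNamesSet tds = map morph
  where
    morph (CTokIdent x) | x `Set.member` tds = CTokTyIdent x
    morph t                                  = t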

Hi Manuel, On Wed, 2004-05-12 at 05:35, Manuel M T Chakravarty wrote:
At the moment this is done by, after parsing each declaration (which might contain typedefs), modifying the remaining stream of tokens, changing all identifiers that have been discovered to be typedefs from ordinary identifier tokens to typedef-identifier tokens. This is expensive.
To be honest, I am not entirely convinced that the basic idea of modifying the token stream to rewrite identifier tokens is that expensive. I suspect it is the naive way in which morphTypeNames does the job that's the problem.
The set of new type names passed to morphTypeNames after each parseCExtDeclList is quite small. So small that using lists and elem is faster than using Sets and elemSet (I tried this).
Perhaps it would be possible to add this set of typedef identifiers to the parser state monad and thread it through to the parsers for cid and typedefName, which could consult this parser state and match conditionally on the identifier token being in the typedef-identifier set or not. (cidOrTN would not need to do a lookup.) This would avoid traversing the token stream after parsing each declaration.
I guess, it would be possible to interleave this with the parser, but what I like about the current setup is that it is more modular. I don't think that it is a problem to traverse the token stream.
True, it looks like it would be difficult to interleave in the manner I suggested anyway, because as far as I can tell the parser combinators don't make it easy to do a conditional parse - conditional on some internal monad state. Swierstra & Duponcheel's combinators aren't a monad, right? And they don't allow you to thread state from earlier in the token stream to later.
The problem with the current setup is that morphTypeNames is interleaved with parseCExtDeclList; ie, with each call to parseCExtDeclList that reads a typedef, one more morphTypeNames filter is added to the token stream. In other words, after N typedefs, each token passes through N instances of morphTypeNames. Instead, we want a single instance of morphTypeNames that collects all the typedef'd names in a set and rewrites tokens accordingly. If the set is represented as a balanced tree (ie, using the "Set" module in "base/general/Sets.hs"), we should be able to reduce the factor of N in the asymptotic complexity to log N. This would be a much more localised change than changing the parser state.
What do you think?
If it can be done with a single pass of morphTypeNames that simply accumulates a set of type names that need to be changed, I imagine that would have similar speed improvements to the thing I suggested. It's not clear how that might be accomplished in the current scheme where we don't know the typedef'ed names until after parsing each declaration.

How difficult is it to pick out typedef'ed names from the token stream (ie without full parsing)? Then we could do it in a single lazy pass over the token stream between the lexer and the parser. Perhaps it could be done in the lexer itself, if the lexer is monadic.

Duncan
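A very crude sketch of what such a single lazy pass could look like, with toy token types. The "last identifier before the terminating semicolon" heuristic below is nowhere near the real C declarator grammar; it is only meant to illustrate the idea of discovering and rewriting typedef'd names in one pass between the lexer and the parser.

import qualified Data.Set as Set

data CToken = CTokIdent String | CTokTyIdent String
            | CTokTypedef | CTokSemic | CTokOther String

-- single lazy pass: rewrite identifiers already known to be typedef'd names,
-- and pick up new names with a crude heuristic when a typedef declaration ends
markTypedefs :: [CToken] -> [CToken]
markTypedefs = go Set.empty
  where
    go _   []                 = []
    go tds (CTokTypedef : ts) =
      let (decl, rest) = break isSemic ts
          -- heuristic: the last identifier before the ';' is the typedef'd name
          tds'         = case [x | CTokIdent x <- decl] of
                           [] -> tds
                           xs -> Set.insert (last xs) tds
      in CTokTypedef : map (mark tds) decl ++ go tds' rest
    go tds (t : ts)           = mark tds t : go tds ts

    mark tds (CTokIdent x) | x `Set.member` tds = CTokTyIdent x
    mark _   t                                  = t

    isSemic CTokSemic = True
    isSemic _         = False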

Duncan, On Wed, 2004-05-12 at 18:54, Duncan Coutts wrote:
On Wed, 2004-05-12 at 05:35, Manuel M T Chakravarty wrote:
At the moment this is done by, after parsing each declaration (which might contain typedefs), modifying the remaining stream of tokens, changing all identifiers that have been discovered to be typedefs from ordinary identifier tokens to typedef-identifier tokens. This is expensive.
To be honest, I am not entirely convinced that the basic idea of modifying the token stream to rewrite identifier tokens is that expensive. I suspect it is the naive way in which morphTypeNames does the job that's the problem.
The set of new type names passed to morphTypeNames after each parseCExtDeclList is quite small. So small that using lists and elem is faster than using Sets and elemSet (I tried this).
Sure, I wouldn't propose this for the small sets that we have at the moment, only if we can combine all morphTypeNames into one.
Perhaps it would be possible to add this set of typedef identifiers to the parser state monad and thread it through to the parsers for cid and typedefName, which could consult this parser state and match conditionally on the identifier token being in the typedef-identifier set or not. (cidOrTN would not need to do a lookup.) This would avoid traversing the token stream after parsing each declaration.
I guess, it would be possible to interleave this with the parser, but what I like about the current setup is that it is more modular. I don't think that it is a problem to traverse the token stream.
True, it looks like it would be difficult to interleave in the manner I suggested anyway, because as far as I can tell the parser combinators don't make it easy to do a conditional parse - conditional on some internal monad state. Swierstra & Duponcheel's combinators aren't a monad, right? And they don't allow you to thread state from earlier in the token stream to later.
Yes.
The problem with the current setup is that morphTypeNames is interleaved with parseCExtDeclList; ie, with each call to parseCExtDeclList that reads a typedef, one more morphTypeNames filter is added to the token stream. In other words, after N typedefs, each token passes through N instances of morphTypeNames. Instead, we want a single instance of morphTypeNames that collects all the typedef'd names in a set and rewrites tokens accordingly. If the set is represented as a balanced tree (ie, using the "Set" module in "base/general/Sets.hs"), we should be able to reduce the factor of N in the asymptotic complexity to log N. This would be a much more localised change than changing the parser state.
What do you think?
If it can be done with a single pass of morphTypeNames that simply accumulates a set of type names that need to be changed, I imagine that would have similar speed improvements to the thing I suggested.
It's not clear how that might be accomplished in the current scheme where we don't know the typedef'ed names until after parsing each declaration.
How difficult is it to pick out typedef'ed names from the token stream (ie without full parsing)? Then we could do it in a single lazy pass over the token stream between the lexer and the parser.
Perhaps it could be done in the lexer itself, if the lexer is monadic.
The lexer isn't monadic either. It uses a self-optimisation technique similar to that in the parser combinators.

I had a closer look now and we need a bit of cooperation from the parser library; so, I extended Parsers.execParser to take a token converter as an additional argument. In the C declaration parser, this token converter changes from declaration to declaration, as more and more typedef'd names are collected. (They are collected in a Set now.)

I attach a patch that modifies base/syntax/Parsers.hs and c2hs/c/CParser.hs accordingly. As the patched functions didn't change for quite a while, this patch should be quite independent of the version of c2hs that you have got. Could you please profile the patched parser again and see whether its space and runtime behaviour changed?

Cheers,
Manuel
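Roughly, the converter on the C-parser side might look like the sketch below. The token types are toy stand-ins, and the actual signature of Parsers.execParser and the Set module used by c2hs are not reproduced here; this only illustrates the idea of a plain token-converting function that is rebuilt as the set of typedef'd names grows.

import qualified Data.Set as Set

data CToken = CTokIdent String | CTokTyIdent String | CTokOther String

-- a token converter, applied by the parser library to each token it consumes
type TokenConverter = CToken -> CToken

-- the converter used while parsing C: rewrite identifiers whose names have
-- been collected as typedef'd so far
typedefConverter :: Set.Set String -> TokenConverter
typedefConverter tds (CTokIdent x)
  | x `Set.member` tds = CTokTyIdent x
typedefConverter _ t   = t

-- after each external declaration, any newly declared typedef names are added
-- to the set and a fresh converter is used for the rest of the stream
extendConverter :: [String] -> Set.Set String -> (TokenConverter, Set.Set String)
extendConverter new tds = let tds' = foldr Set.insert tds new
                          in (typedefConverter tds', tds')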

On Sat, 2004-05-15 at 09:32, Manuel M T Chakravarty wrote:
I had a closer look now and we need a bit of cooperation from the parser library; so, I extended Parsers.execParser to take a token converter as an additional argument. In the C declaration parser, this token converter changes from declaration to declaration, as more and more typedef'd names are collected. (They are collected in a Set now.)
I attach a patch that modifies base/syntax/Parsers.hs and c2hs/c/CParser.hs accordingly. As the patched functions didn't change for quite a while, this patch should be quite independent of the version of c2hs that you have got. Could you please profile the patched parser again and see whether its space and runtime behaviour changed?
Here's the stats:

Original before patch:
  time (non-profiling, -O)          : 1m9
  time (profiling)                  : user 2m51
  total alloc                       : 3.4 G
  time spent in parseCExtDeclList   : 72%
                  morphTypeNames    : 54%
  alloc spent in parseCExtDeclList  : 70%
                  morphTypeNames    : 63%

With Manuel's patch:
  time (non-profiling, -O)          : user 0m41
  time (profiling)                  : user 1m35
  total alloc                       : 1.4 G
  time spent in parseCExtDeclList   : 43%
                  morphTypeNames    : 7%
  alloc spent in parseCExtDeclList  : 26%
                  morphTypeNames    : 8%

So that's a significant change. Cheers Manuel. I've rebuilt gtk2hs with this and it doesn't seem to break anything. If you're happy with this patch, I'll apply it in our c2hs fork.

BTW, I was wondering what the --keep option does; I was hoping it would cache the information extracted from the C header files, but I guess not. Would this be very difficult? As you know, gtk2hs has a fork of c2hs with a patch to accept multiple .chs files, which saves time by keeping the header information in memory. It would probably be almost as quick if this information was cached in a file. This would have the advantage that people wouldn't need to change their build systems to process multiple .chs files in one go (as gtk2hs has done), which is rather a pain. If we could do this, it would get us much closer to getting gtk2hs back to using the mainline version of c2hs. (The only other change, as I recall, is a quick fix to make ForeignPtrs do what we want. I understand you weren't happy with it as a long-term solution.)

Duncan
participants (3):
- Axel Simon
- Duncan Coutts
- Manuel M T Chakravarty