
Hi, I have stumbled across language-c on hackage and I was wondering if anyone is aware if there exists a full C++ parser written in Haskell? Many thanks, Chris.

On Tue, Jan 24, 2012 at 4:06 AM, Christopher Brown wrote:
Hi,
I have stumbled across language-c on hackage and I was wondering if anyone is aware if there exists a full C++ parser written in Haskell?
I'm not aware of one. When it comes to parsing C++, I've always been a fan of this essay: http://www.nobugs.org/developer/parsingcpp/ It's a hobbyist's tale of looking into parsing C++ and then an explanation of why he gave up. It's older, so perhaps the state-of-the-art has advanced since then. Antoine

On 24 Jan 2012, at 11:06, Christopher Brown wrote:
I have stumbled across language-c on hackage and I was wondering if anyone is aware if there exists a full C++ parser written in Haskell?
There is a yaccable grammar: http://www.parashift.com/c++-faq-lite/compiler-dependencies.html#faq-38.11
You might run it through a parser generator that outputs Haskell code: http://www.haskell.org/haskellwiki/Applications_and_libraries/Compiler_tools
Hans

On Tue, Jan 24, 2012 at 2:06 AM, Christopher Brown wrote:
Hi,
I have stumbled across language-c on hackage and I was wondering if anyone is aware if there exists a full C++ parser written in Haskell?
I don't think one exists. I've heard it's quite difficult to get template parsing working in an efficient manner.
My understanding is that "real" C++ compilers use the Edison Design Group's parser: http://www.edg.com/index.php?location=c_frontend
For example, the Intel C++ compiler uses the EDG front end: http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler
I thought even Microsoft's compiler (which is surprisingly C++ compliant) uses it, but I can't find details on Google about that.
There is at least one open source project using it, Rose, so it's not unthinkable to use it from Haskell: http://rosecompiler.org/
Rose has had working Haskell bindings in the past, but they have bit-rotted somewhat. With Rose you get support for much more than parsing C++: you also get C and Fortran parsers as well as a fair bit of static analysis. The downside is that Rose is a big pile of C++ itself and is hard to compile on some platforms.
If you made a BSD3-licensed, fully functional, efficient C++ parser, that would be great. If you made it preserve comments and the input well enough to do source-to-source transformations (unparsing), that would be very useful. I often wish I had Rose implemented in Haskell instead of C++.
Jason
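The point about preserving comments and the input well enough for unparsing can be illustrated with a small Haskell sketch. It is not how Rose works; the `Tok` type and the `lexAll`, `unparse` and `rename` functions are hypothetical names made up for this example, and the lexer deliberately ignores string literals and `/* ... */` comments. The only idea shown is that if the lexer keeps whitespace and comments as tokens instead of discarding them, concatenating the token texts reproduces the input exactly, so a transformation can touch some tokens and leave the rest of the file as written.

```haskell
import Data.Char (isAlphaNum, isSpace)
import Data.List (isPrefixOf)

-- Hypothetical lossless token type: every input character lands in some token.
data Tok
  = Word String      -- identifiers, keywords, numbers
  | Space String     -- runs of whitespace
  | Comment String   -- "// ..." up to (not including) the newline
  | Punct Char       -- anything else, one character at a time
  deriving (Eq, Show)

-- Split the input into tokens without dropping any characters.
lexAll :: String -> [Tok]
lexAll [] = []
lexAll s@(c : cs)
  | "//" `isPrefixOf` s =
      let (t, rest) = break (== '\n') s in Comment t : lexAll rest
  | isSpace c =
      let (t, rest) = span isSpace s in Space t : lexAll rest
  | isWordChar c =
      let (t, rest) = span isWordChar s in Word t : lexAll rest
  | otherwise = Punct c : lexAll cs
  where
    isWordChar x = isAlphaNum x || x == '_'

-- Unparse by emitting each token's original text.
render :: Tok -> String
render (Word t)    = t
render (Space t)   = t
render (Comment t) = t
render (Punct c)   = [c]

unparse :: [Tok] -> String
unparse = concatMap render

-- A trivial source-to-source transformation: rename one identifier.
rename :: String -> String -> [Tok] -> [Tok]
rename old new = map step
  where
    step (Word w) | w == old = Word new
    step t = t

main :: IO ()
main = do
  let src = "int x = 1; // counter\nx = x + 1;\n"
  putStr (unparse (rename "x" "count" (lexAll src)))
  print (unparse (lexAll src) == src)
```

Running `main` prints the renamed program with its comment and line breaks intact, followed by `True` for the round-trip check. A real tool would attach such trivia to AST nodes rather than work on a flat token list, but the round-trip property is the same.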

Hi Everyone,
Thanks for everyone's kind responses: very helpful so far!
I fully appreciate and understand how difficult writing a C++ parser is. However, I may need one for our new Paraphrase project, where I may be targeting C++ for writing a refactoring tool. Obviously I don't want to start writing one myself, hence I was asking if anyone knew about an already existing implementation.
Rose looks interesting, I'll check that out, thanks!
Chris.

On Tue, Jan 24, 2012 at 6:54 AM, Christopher Brown wrote:
Hi Everyone,
Thanks for everyone's kind responses: very helpful so far!
I fully appreciate and understand how difficult writing a C++ parser is. However, I may need one for our new Paraphrase project, where I may be targeting C++ for writing a refactoring tool. Obviously I don't want to start writing one myself, hence I was asking if anyone knew about an already existing implementation.
Rose looks interesting, I'll check that out, thanks!
I did some more digging after sending my email. I didn't learn about GLR parsers when I was in school, but that seems to be what the cool compilers use these days. Then I discovered that Happy supports GLR. That is happy!
Next I found that GLR supposedly makes C++ parsing much easier than LALR. As Elkhound's author puts it: "The reason I wrote Elkhound is to be able to write a C++ parser. The parser is called Elsa, and is included in the distribution below." The Elsa documentation should give you a flavor for what needs to be done when making sense of C++: http://scottmcpeak.com/elkhound/sources/elsa/index.html
NB: I don't think it's been seriously worked on since 2005, so I assume it doesn't match the latest C++ spec.
The grammar that Elsa parses is here. One warning is that it doesn't reject all invalid programs (i.e., it errs on the side of accepting too much): http://scottmcpeak.com/elkhound/sources/elsa/cc.gr
I think the path of least resistance is pure Rose without the Haskell support. Having said that, I think the most fun direction would be converting the Elsa grammar to Happy. It's just that you'll have a lot of work to do (read: testing, debugging, performance tuning, and then adding vendor features). One side benefit is that you'll know much more about the intricacies of C++ when you're done than if you use someone else's parser.
Jason
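To give a feel for why keeping several parses alive helps with C++, here is a minimal Haskell sketch. It is not Elkhound's, Elsa's or Happy's actual machinery: it uses `Text.ParserCombinators.ReadP` from `base`, whose symmetric choice `+++` returns every successful parse, which is only loosely similar to what a GLR parser does. The `Stmt` type and the two-rule grammar are hypothetical. The statement `a * b;` parses both as a pointer declaration and as a multiplication, and only a later pass that knows whether `a` names a type can pick one.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

-- Hypothetical two-constructor AST, just enough to show the ambiguity.
data Stmt
  = PtrDecl String String   -- "T * x;" read as: declare x as pointer to T
  | MulExpr String String   -- "a * b;" read as: multiply a by b
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

ident :: ReadP String
ident = token ((:) <$> satisfy isAlpha <*> munch isAlphaNum)

sym :: Char -> ReadP Char
sym = token . char

ptrDecl, mulExpr :: ReadP Stmt
ptrDecl = PtrDecl <$> ident <* sym '*' <*> ident <* sym ';'
mulExpr = MulExpr <$> ident <* sym '*' <*> ident <* sym ';'

-- (+++) is symmetric choice: both alternatives are explored and kept.
stmt :: ReadP Stmt
stmt = ptrDecl +++ mulExpr

-- All complete parses of the input.
parses :: String -> [Stmt]
parses s = [x | (x, "") <- readP_to_S (skipSpaces *> stmt <* eof) s]

-- Disambiguate afterwards, given the names known to be types.
resolve :: [String] -> [Stmt] -> [Stmt]
resolve types = filter ok
  where
    ok (PtrDecl t _) = t `elem` types
    ok (MulExpr a _) = a `notElem` types

main :: IO ()
main = do
  let candidates = parses "a * b;"
  print candidates                  -- both readings survive parsing
  print (resolve ["a"] candidates)  -- 'a' is a type: declaration
  print (resolve [] candidates)     -- 'a' is not a type: expression
```

With `a` in the list of type names only the `PtrDecl` reading remains, and with an empty list only the `MulExpr` reading remains. A real C++ front end has to make this decision for templates and nested scopes as well, which is where most of the work goes.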

Hi Jason,
Thanks very much for your thoughtful response.
I am intrigued about the Happy route: as I have never really used Happy before, am I right in thinking I could take the .gr grammar, feed it into Happy to generate a parser, or a template for a parser, and then go from there?
Chris.

On Tue, Jan 24, 2012 at 8:40 AM, Christopher Brown wrote:
Hi Jason,
Thanks very much for your thoughtful response.
I am intrigued about the Happy route: as I have never really used Happy before, am I right in thinking I could take the .gr grammar, feed it into Happy to generate a parser, or a template for a parser, and then go from there?
That's the basic idea, although the details will be harder than that. Happy is a parser generator (like Bison, Yacc, and ANTLR). Happy and Elsa will have very different syntax for their grammar definitions. You could explore taking the Elkhound source and, instead of generating C++, generating the input for Happy, if that makes sense. A translation by hand would probably be easiest.
I would highly recommend making a few toy parsers with Happy + Alex (Alex is like lex or flex) to get a feel for it before trying to use the grammar from Elsa. A quick Google search pointed me at these examples: http://darcs.haskell.org/happy/examples/
Jason

There is also the DMS from Ira Baxter's company Semantic Designs. This is an industry-proven refactoring framework that handles C++ as well as other languages.
I think the ANTLR C++ parser may have advanced since the article Antoine Latter linked to, but personally I'd run a mile before trying to do any source transformation of C++ even if someone were waving a very large cheque at me.

Hi all,
Just to add to the list - Qt Creator contains a pretty nice (and incremental) C++ parser.
Cheers,
Dave
On Wed, Jan 25, 2012 at 5:06 AM, Stephen Tetley wrote:
There is also the DMS from Ira Baxter's company Semantic Designs. This is an industry-proven refactoring framework that handles C++ as well as other languages.
I think the ANTLR C++ parser may have advanced since the article Antoine Latter linked to, but personally I'd run a mile before trying to do any source transformation of C++ even if someone were waving a very large cheque at me.

On Tue, Jan 24, 2012 at 2:06 AM, Christopher Brown wrote:
I have stumbled across language-c on hackage and I was wondering if anyone is aware if there exists a full C++ parser written in Haskell?
Check out clang: http://clang.llvm.org/ and http://hackage.haskell.org/package/LibClang The clang API is in C++ and will do just about everything you'd ever want to do with C/ObjC/C++ source. -n

I have written a C++ parser in Scheme, with a Parsec-style parser combinator library. It can parse a large portion of C++ and I use it to do structural comparison between ASTs. I made some macros so that the parser combinators look like the grammar itself.
Its code is at:
http://github.com/yinwang0/ydiff/blob/master/parse-cpp.ss
A demo of the parse tree based comparison tool is at:
http://www.cs.indiana.edu/~yw21/demos/d8-3404-d8-8424.html
The bit of information I can tell you about parsing C++:
- C++'s grammar is not that bad if you see the consistency in it. Parsing a major portion of C++ is not hard. I made the parser in two days. It can parse most of Google's V8 JavaScript compiler code. I just need to fix some corner cases later.
- It is better to delay semantic checks to a later stage. Don't put those into the parser. Parse a larger language first, and then walk the parse tree to eliminate semantically wrong programs.
- Don't try translating from the formal grammar or parser generator files for C++. They contain years of bugs and patches and you will probably be confused looking at them. I wrote the parser just by looking at some example C++ programs.
Cheers,
Yin
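The advice above to parse a larger language first and reject bad programs in a later pass can be sketched in Haskell as follows. This is only an illustration and not a translation of the Scheme code linked above: the toy statement language (`int x;` and `x = 1;`), the `Stmt` type and the `undeclared` check are all hypothetical. The parser, built on `Text.ParserCombinators.ReadP` from `base`, accepts an assignment to any identifier, and a separate walk over the tree reports names that are assigned before being declared.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical toy language: declarations and assignments only.
data Stmt
  = Decl String          -- int x;
  | Assign String Int    -- x = 42;
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

ident :: ReadP String
ident = token ((:) <$> satisfy isAlpha <*> munch isAlphaNum)

number :: ReadP Int
number = token (read <$> munch1 isDigit)

sym :: Char -> ReadP Char
sym = token . char

-- The parser knows nothing about scoping: any identifier may be assigned.
stmt :: ReadP Stmt
stmt = decl <++ assign
  where
    decl = do
      kw <- ident
      if kw == "int" then Decl <$> ident <* sym ';' else pfail
    assign = Assign <$> ident <* sym '=' <*> number <* sym ';'

program :: ReadP [Stmt]
program = skipSpaces *> many stmt <* eof

parseProgram :: String -> Maybe [Stmt]
parseProgram s =
  case [p | (p, "") <- readP_to_S program s] of
    (p : _) -> Just p
    []      -> Nothing

-- Later pass: walk the tree and collect names assigned before declaration.
undeclared :: [Stmt] -> [String]
undeclared = go []
  where
    go _ [] = []
    go seen (Decl x : rest) = go (x : seen) rest
    go seen (Assign x _ : rest)
      | x `elem` seen = go seen rest
      | otherwise     = x : go seen rest

main :: IO ()
main =
  case parseProgram "int x; x = 1; y = 2;" of
    Nothing    -> putStrLn "syntax error"
    Just stmts -> do
      print stmts
      print (undeclared stmts)
```

Here the input parses successfully even though `y` was never declared, and the second pass reports `["y"]`. Keeping the check out of the grammar means the parser stays small and the error can be reported with whatever context the later pass has collected.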

On Wed, Feb 1, 2012 at 12:42 PM, Yin Wang wrote:
I have written a C++ parser in Scheme, with a Parsec-style parser combinator library. It can parse a large portion of C++ and I use it to do structural comparison between ASTs. I made some macros so that the parser combinators look like the grammar itself.
Its code is at:
http://github.com/yinwang0/ydiff/blob/master/parse-cpp.ss
A demo of the parse tree based comparison tool is at:
http://www.cs.indiana.edu/~yw21/demos/d8-3404-d8-8424.html
The bit of information I can tell you about parsing C++:
Thank you for the interesting response and example code (that I haven't had a chance to look at yet). How much support do you have for templates? Jason

I haven't dealt explicitly with templates. I treat them as type parameters (element $type-parameter). I don't check that they have been declared at all. As explained, these are semantic checks and should be deferred until the type checking stage ;-)
Cheers,
Yin
participants (8)
- Antoine Latter
- Christopher Brown
- David Laing
- Hans Aberg
- Jason Dagit
- Nathan Howell
- Stephen Tetley
- Yin Wang