
On Wed, May 6, 2015 at 6:08 AM, Herbert Valerio Riedel
Hello *,
As you may be aware, GHC's `{-# LANGUAGE CPP #-}` language extension currently relies on the system's C-compiler bundled `cpp` program to provide a "traditional mode" c-preprocessor.
This has caused several problems in the past, since parsing Haskell code with a preprocessor mode designed for use with C's tokenizer has caused already quite some problems[1] in the past. I'd like to see GHC 7.12 adopt an implemntation of `-XCPP` that does not rely on the shaky system-`cpp` foundation. To this end I've created a wiki page
https://ghc.haskell.org/trac/ghc/wiki/Proposal/NativeCpp
to describe the actual problems in more detail, and a couple of possible ways forward. Ideally, we'd simply integrate `cpphs` into GHC (i.e. "plan 2"). However, due to `cpp`s non-BSD3 license this should be discussed and debated since affects the overall-license of the GHC code-base, which may or may not be a problem to GHC's user-base (and that's what I hope this discussion will help to find out).
So please go ahead and read the Wiki page... and then speak your mind!
Thanks for writing this up, btw! It's nice to put the mumblings we've had for a while down 'on paper'.
Thanks, HVR
[1]: ...does anybody remember the issues Haskell packages (& GHC) encountered when Apple switched to the Clang tool-chain, thereby causing code using `-XCPP` to suddenly break due to subtly different `cpp`-semantics?
There are two (major) differences I can list, although I can only provide some specific examples OTTOMH: 1) Clang is more strict wrt language specifications. For example, GCC is lenient and allows a space between a macro identifier and the parenthesis denoting a parameter list; so saying 'FOO (x, y)' is valid with GCC (where FOO is a macro), but not with Clang. Sometimes this trips up existing code, but I've mostly seen it in GHC itself. 2) The lexing rules for C and Haskell simply are not the same in general. For example, what should "FOO(a' + b')" parse to? Well, in Haskell, 'prime' is a valid component from an identifier and in this case the parse should be "a prime + b prime", but in C the ' character is identified as beginning the start of a single-character literal, and a strict preprocessor like Clang's will reject that. In practice, I think people have mostly just avoided arcane lexer behaviors that don't work, and the only reason this was never a problem was because GCC or some variant was always the 'standard' C compiler GHC could rely on.
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/