
Hi there,

Just a few thoughts on preprocessing issues. First I should say that I have not really kept up with Cabal development, so apologies if I say something totally obvious or something that just wouldn't work with Cabal.

Simon Marlow wrote:
Malcolm Wallace wrote:
Isaac Jones wrote:
Since Cabal is pretty new, this won't break any existing Cabal packages, and when converting non-Cabal packages to Cabal, there is some work to do anyway, so why not just adopt this as one extra rule to follow? This is just a suggestion - I'm in two minds whether it is a good idea myself, but it is at least worth considering the possibility.
And I suppose the literate version would be .lcpphs? (unlit first, then cpp, then Haskell). It would be more consistent and arguably correct, but I'm not sure that we should do it.
While arguably correct, such suffixes look pretty awkward to me. In the Yampa build system we simply adopted the suffix ".cpp" to indicate that C preprocessing was necessary. That would give ".hs.cpp" and ".lhs.cpp" respectively. One could argue about the correctness of that, but it is at least simple, compositional, and plays well with other suffixes.

The literate convention is also specified in the Haskell 98 specification, whereas CPP and other preprocessing is not, as far as I can remember. From that perspective, the ".cpp" convention is not totally unreasonable either.

Either way, personally I mostly see advantages in adopting suffixes to indicate the need for preprocessing. For example:

* It is clear to anyone looking at the sources which files need to be preprocessed. This is particularly important for CPP processing, since CPP does not really understand Haskell, and there are thus traps for the unwary.

* Suffixes make it very easy to preprocess only selected files, which again is a particularly good idea when CPP is involved.

Of course, there are other ways of doing preprocessing selectively. Maybe Cabal has such mechanisms, making this (mostly) a non-issue? (Indeed, the Yampa build system did provide an alternative way as well.) For example, it is sometimes necessary to pass specific flags to the compiler for specific source files only, and if Cabal already supports that, then I guess passing "-E" selectively would just be a special case.
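To make the stacked-suffix idea concrete, here is a minimal Haskell sketch of how a build tool might read suffixes such as ".hs.cpp" and ".lhs.cpp". The names `Step`, `splitDots` and `plan` are hypothetical and not part of Cabal or the Yampa build system. The sketch also assumes that suffixes are peeled from the outside in, so ".lhs.cpp" means "cpp first, then unlit"; that ordering is one plausible reading of the convention, not something settled in this thread.

```haskell
import Data.List (intercalate)

-- Hypothetical preprocessing steps; the names are illustrative only.
data Step = Cpp | Unlit deriving (Eq, Show)

-- Split "Foo.lhs.cpp" into ["Foo", "lhs", "cpp"].
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

-- Peel suffixes from the outside in and return the base name together
-- with the steps to run, in order. Assumption: the outermost suffix
-- names the step that runs first. Returns Nothing for names that do not
-- end in a recognised Haskell suffix once ".cpp" has been removed.
plan :: FilePath -> Maybe (String, [Step])
plan path = go (reverse (splitDots path)) []
  where
    go ("cpp" : rest) acc         = go rest (acc ++ [Cpp])
    go ("lhs" : base@(_ : _)) acc = Just (rejoin base, acc ++ [Unlit])
    go ("hs"  : base@(_ : _)) acc = Just (rejoin base, acc)
    go _ _                        = Nothing

    -- The remaining components are still reversed; restore the base name.
    rejoin = intercalate "." . reverse
```

Under these assumptions, `plan "Foo.hs"` yields the base name "Foo" with no steps, `plan "Foo.hs.cpp"` yields a single `Cpp` step, and `plan "Foo.lhs.cpp"` yields `Cpp` followed by `Unlit`. A file named just "Foo.cpp" is rejected, since nothing says what it becomes after preprocessing.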
Another solution is to adopt a new extension for plain Haskell, say .phs. The conversion from .hs to .phs is either via CPP or just 'cat', depending on some setting somewhere. Also, I recommend that we use the compiler itself for preprocessing:
    ghc -E foo.hs -o foo.phs

because only the compiler knows what the values for the preprocessor symbols __HASKELL__, __GLASGOW_HASKELL__, i386_TARGET_ARCH etc. should be. Otherwise we'll have to run the compiler during ./setup configure to find out the values of these symbols (isn't that what hmake does? What about when a new compiler comes along?).
Yes, that's probably true.

Malcolm Wallace wrote:
You are right that the compiler is best placed to define pp symbols, so this is all very well, but neither nhc98 nor Hugs currently have the -E option to stop immediately after pp. And come to think of it, the only real reason to have cpp done separately at all is because Hugs does not have a preprocessor call builtin, like ghc and nhc98 do. So maybe the best solution is to ship Hugs with -F"cpphs.hugs" enabled by default? Then no separate extension would be required, and Cabal could just defer all cpp-ing to the compiler.
In the Yampa build system we took the approach that installing a library for use by Hugs meant running all the preprocessing at installation time and thus installing preprocessed sources. I think that was the right approach. It simplifies things for the end user, in particular when several kinds of preprocessing are involved. E.g. they don't need to pass the right flags to Hugs, and they don't need to worry about having the preprocessors in their paths etc. (The installation of a library could be system-wide, i.e. the person doing the installation might not be the same as the one actually using it later.) Additionally, there is a performance benefit, which could potentially be significant depending on which preprocessors are involved. Similar arguments would apply if one for some reason wanted to install libraries for GHCi in source form.
Another thought occurs to me. Does anyone use cpp markings in conjunction with any other preprocessors? For instance, cpp + Happy, cpp + DRiFT? What ordering applies there? I'm inclined to think that it would nearly always be cpp first, other preprocessors second, but perhaps not? After all, the cpp markings would probably still be conditioned on the end compiler, not on the intermediate pp?
If one adopts a convention that indicates the preprocessing to be done by a simple suffix, then I think that would allow the programmer to control the ordering if necessary, avoiding building speculative assumptions into Cabal?

Speaking of suffixes and preprocessing, I've encountered another problem in the context of Yampa that might be worth raising. Originally (well, still, actually), we used Ross Paterson's arrow preprocessor for the arrow syntactic sugar. We then adopted the convention that the suffix ".as" was for "arrowized Haskell source", and ".las" for "literate arrowized Haskell source". I don't think this choice of suffixes was particularly brilliant, but this does not really matter.

However, we now have the situation that GHC supports the arrow syntax directly. This raises the question of how to arrange things if one wants to distribute arrowized code that should also work for other compilers/interpreters, since preprocessing would still be necessary for those other systems. In particular, which suffix should one use for the arrowized files in question?

While I guess one could stick to ".hs" and then resort to various build-system trickery to get the preprocessing done when necessary, it seems to me that a more straightforward solution might be to agree on a suffix that indicates that the arrow syntax is used (say ".arr"). Systems that do support the arrow syntax could then accept e.g. ".hs.arr" as a synonym for ".hs", or, if necessary, could look at the suffix to decide whether to enable the syntactic extension.

This solution is not without its problems, though, and I'm not sure what the best approach would be. But the issue is similar to some systems having built-in CPP support and others not, and it might make sense to adopt a similar solution. Of course, if arrow support is in the works for the other compilers, this last problem might not be so much of an issue.
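As a rough illustration of the ".arr" idea, the following Haskell sketch decides which of the steps implied by a file's suffixes still have to be run externally for a given system. Everything here is hypothetical: the `System` record, the `Step` type and the capability values are assumptions made for the example and do not describe any real Cabal interface.

```haskell
-- Hypothetical description of what a Haskell system handles by itself.
data System = System
  { sysName      :: String
  , nativeArrows :: Bool  -- accepts arrow syntax without a preprocessor
  , nativeCpp    :: Bool  -- can invoke cpp by itself
  }

-- Steps that suffixes such as ".arr" and ".cpp" would stand for.
data Step = ArrowPP | Cpp deriving (Eq, Show)

-- Keep only the steps that the target system cannot do natively;
-- the rest are simply left to the compiler or interpreter.
externalSteps :: System -> [Step] -> [Step]
externalSteps sys = filter needed
  where
    needed ArrowPP = not (nativeArrows sys)
    needed Cpp     = not (nativeCpp sys)

-- Illustrative capability values only, loosely following this thread.
ghcLike, hugsLike :: System
ghcLike  = System { sysName = "ghc",  nativeArrows = True,  nativeCpp = True }
hugsLike = System { sysName = "hugs", nativeArrows = False, nativeCpp = False }
```

With these values, `externalSteps ghcLike [Cpp, ArrowPP]` is empty, so "Foo.hs.arr" is effectively treated as a synonym for "Foo.hs", while `externalSteps hugsLike [Cpp, ArrowPP]` keeps both steps, to be run at build or installation time.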
Best regards,

/Henrik

--
Henrik Nilsson
School of Computer Science and Information Technology
The University of Nottingham
nhn@cs.nott.ac.uk