making c2hs understand line pragmas

Manuel,
I've got a patch to c2hs to make it do something with C-style line pragmas in .chs files, e.g.:
# 1 "gtk/Graphics/UI/Gtk/TreeList/TreeStore.chs.pp"
These are produced by the C preprocessor. Currently c2hs chokes on these, so people have to use the -P option to cpp to suppress them. They are actually rather useful if the code does in fact need preprocessing (as most of gtk2hs's .chs files do) because they point to the original file name and source locations. For example, it means that ghc's errors will report accurate locations in the .chs.pp file rather than reporting locations in the .chs file.
c2hs already produces accurate Haskell line pragmas {-# LINE ... #-} in the .hs files it produces for exactly that reason. I want to extend that to the case where the .chs file itself has had a preprocessor used on it.
One other reason to preserve the original file name is that haddock can now include links to the source files, and it uses the line pragmas to find the original source file. It doesn't do much good, however, if haddock links to a non-existent .chs file when the real original file was .chs.pp.
So here's an example: the Gtk2Hs docs with source code links to our darcs repository:
http://haskell.org/gtk2hs/docs/devel/
For example, the link at the top of this page:
http://haskell.org/gtk2hs/docs/devel/Graphics-UI-Gtk-Abstract-Widget.html
points to:
http://darcs.haskell.org/gtk2hs/gtk/Graphics/UI/Gtk/Abstract/Widget.chs.pp
which is right. Of course, without the patch it'd point to the non-existent file Widget.chs.
The way my patch works is to make the lexer recognise the line directives and update the current position. However, to get the line directives emitted correctly we also have to insert a special token into the token stream. When it comes to finally printing the token stream, this special token puts the printer into a state in which it will add a Haskell {-# LINE ... #-} pragma before the next Haskell source fragment.
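To make the mapping from a cpp line marker to a Haskell LINE pragma concrete, here is a minimal standalone sketch. It is not the patch itself and does not use c2hs's lexer combinators or token types; the names parseLineDirective and linePragma are hypothetical, and the parser only handles the simple `# <number> "<file>"` form shown above (trailing cpp flags are ignored, and escapes inside the file name are not interpreted).

```haskell
import Data.Char (isDigit, isSpace)

-- | Hypothetical helper (not part of c2hs): parse a cpp line marker of
-- the form
--
-- >   # 1 "gtk/Graphics/UI/Gtk/TreeList/TreeStore.chs.pp"
--
-- into a line number and a file name. Anything after the closing quote
-- (such as cpp's trailing flags) is ignored.
parseLineDirective :: String -> Maybe (Int, FilePath)
parseLineDirective ('#' : rest) =
  case span isDigit (dropWhile isSpace rest) of
    (digits@(_ : _), afterNum) ->
      case dropWhile isSpace afterNum of
        '"' : quoted ->
          case break (== '"') quoted of
            (file, '"' : _) -> Just (read digits, file)
            _               -> Nothing   -- unterminated file name
        _ -> Nothing                     -- no quoted file name
    _ -> Nothing                         -- no line number after '#'
parseLineDirective _ = Nothing           -- line does not start with '#'

-- | Render the Haskell pragma that points a compiler or haddock back at
-- the given line of the original file.
linePragma :: Int -> FilePath -> String
linePragma n file = "{-# LINE " ++ show n ++ " " ++ show file ++ " #-}"
```

In this sketch the two steps are separate functions on purpose: recognising the directive corresponds to what the lexer does, and rendering the pragma corresponds to what the printer does when it meets the special token.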
The only thing that's wrong is that c2hs doesn't recognise cpp directives as the first line in a .chs file. You can see why this is so from the code below:
    cpp :: CHSLexer
    cpp = directive
      where
        directive =
          string "\n#" +> alt ('\t':inlineSet)`star` epsilon
          `lexmeta`
             \(_:_:dir) pos s ->        -- strip off the "\n#"
               case dir of
                 ... etc
It requires a cpp directive to start with a newline followed by a '#' character.
I'm not sufficiently familiar with the style of c2hs's chs lexer to figure out how to fix this. Perhaps it can be done by checking if we're at the beginning of a line in a different way. Perhaps it can be done by checking the current column rather than looking for a '\n' character.
--------------
So in summary: Manuel, unless you object to how this change works, I'll commit it in the next few days. And my question is how to fix this issue with recognising cpp directives on the first line.
Duncan
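The idea of checking the current column instead of looking for a '\n' can be illustrated with a toy scanner. This is only a sketch under stated assumptions: Pos, advance and directiveLines are hypothetical names, the position is a plain (line, column) pair rather than c2hs's own position type, and the scanner walks a String directly instead of going through the lexer combinators.

```haskell
-- | Hypothetical position type: (line, column), both starting at 1.
type Pos = (Int, Int)

-- | Advance a position over one character. A newline moves to column 1
-- of the next line; every other character (tabs included) counts as
-- one column.
advance :: Pos -> Char -> Pos
advance (l, _) '\n' = (l + 1, 1)
advance (l, c) _    = (l, c + 1)

-- | Line numbers on which a '#' appears in column 1. The start of a
-- line is detected from the tracked column, so a directive on the very
-- first line of the input is found as well.
directiveLines :: String -> [Int]
directiveLines = go (1, 1)
  where
    go _ [] = []
    go pos@(l, c) (ch : rest)
      | ch == '#' && c == 1 = l : go (advance pos ch) rest
      | otherwise           = go (advance pos ch) rest
```

For an input whose first and third lines begin with '#', and whose second line has a '#' only in the middle, directiveLines reports lines 1 and 3.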

Duncan, I think this is still an outstanding issue.
> I've got a patch to c2hs to make it do something with C-style line pragmas in .chs files, e.g.:
> # 1 "gtk/Graphics/UI/Gtk/TreeList/TreeStore.chs.pp"
> These are produced by the C preprocessor. Currently c2hs chokes on these, so people have to use the -P option to cpp to suppress them. They are actually rather useful if the code does in fact need preprocessing (as most of gtk2hs's .chs files do) because they point to the original file name and source locations. For example, it means that ghc's errors will report accurate locations in the .chs.pp file rather than reporting locations in the .chs file.
> c2hs already produces accurate Haskell line pragmas {-# LINE ... #-} in the .hs files it produces for exactly that reason. I want to extend that to the case where the .chs file itself has had a preprocessor used on it.
> One other reason to preserve the original file name is that haddock can now include links to the source files, and it uses the line pragmas to find the original source file. It doesn't do much good, however, if haddock links to a non-existent .chs file when the real original file was .chs.pp.
Ok, I see how this would be useful. Please push your patch.
> The only thing that's wrong is that c2hs doesn't recognise cpp directives as the first line in a .chs file. You can see why this is so from the code below:
>     cpp :: CHSLexer
>     cpp = directive
>       where
>         directive =
>           string "\n#" +> alt ('\t':inlineSet)`star` epsilon
>           `lexmeta`
>              \(_:_:dir) pos s ->        -- strip off the "\n#"
>                case dir of
>                  ... etc
> It requires a cpp directive to start with a newline followed by a '#' character.
> I'm not sufficiently familiar with the style of c2hs's chs lexer to figure out how to fix this. Perhaps it can be done by checking if we're at the beginning of a line in a different way. Perhaps it can be done by checking the current column rather than looking for a '\n' character.
Yes, that's an awkward bit in the code that has bothered me before. The lexer combinators have no neat way to check for characters appearing in a particular column. I see only two ways to proceed:
* We could match on # alone and then check in the action what column we are in and do different things depending on that. I don't like this, as it messes up the longest match rule and might be fragile.
* We can prepend a '\n' character to the source file before starting the lexing process by changing the triple passed to execLexer in the function lexCHS (and we must then also adjust the initial value of `pos' to still get accurate line numbers). This is a bit of a kludge, but it seems to be the more robust solution to me.
Manuel
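The second option, prepending a '\n' and starting the line count one line earlier, can be sketched with a toy stand-in for the "\n#" rule. This is not c2hs code: directivesAfterNewline and directivesWithPrefix are hypothetical names, the real change would go through execLexer in lexCHS as described above, and the Int start line here only stands in for the initial value of `pos'.

```haskell
-- | Toy stand-in for the "\n#" rule: report the line number of every
-- '#' that directly follows a newline. The first argument is the line
-- number assigned to the first character of the input.
directivesAfterNewline :: Int -> String -> [Int]
directivesAfterNewline = go
  where
    go l ('\n' : '#' : rest) = (l + 1) : go (l + 1) rest
    go l ('\n' : rest)       = go (l + 1) rest
    go l (_ : rest)          = go l rest
    go _ []                  = []

-- | Sketch of the kludge: prepend a newline so that a directive on the
-- first line also follows a '\n', and start counting at line 0 so that
-- the reported line numbers still match the original file.
directivesWithPrefix :: String -> [Int]
directivesWithPrefix src = directivesAfterNewline 0 ('\n' : src)
```

With an input whose first and third lines begin with '#', directivesAfterNewline 1 only reports line 3, because nothing precedes the first '#'. directivesWithPrefix reports lines 1 and 3, and the numbering is unchanged for every later line.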

On Wed, 2006-05-24 at 12:02 -0400, Manuel M T Chakravarty wrote:
> I think this is still an outstanding issue.
> > I've got a patch to c2hs to make it do something with C-style line pragmas in .chs files, e.g.:
> > # 1 "gtk/Graphics/UI/Gtk/TreeList/TreeStore.chs.pp"
> Ok, I see how this would be useful. Please push your patch.
Ok, will do.
> > The only thing that's wrong is that c2hs doesn't recognise cpp directives as the first line in a .chs file. You can see why this is so from the code below:
> >     cpp :: CHSLexer
> >     cpp = directive
> >       where
> >         directive =
> >           string "\n#" +> alt ('\t':inlineSet)`star` epsilon
> >           `lexmeta`
> >              \(_:_:dir) pos s ->        -- strip off the "\n#"
> >                case dir of
> >                  ... etc
> > It requires a cpp directive to start with a newline followed by a '#' character.
> > I'm not sufficiently familiar with the style of c2hs's chs lexer to figure out how to fix this. Perhaps it can be done by checking if we're at the beginning of a line in a different way. Perhaps it can be done by checking the current column rather than looking for a '\n' character.
> Yes, that's an awkward bit in the code that has bothered me before. The lexer combinators have no neat way to check for characters appearing in a particular column. I see only two ways to proceed:
> * We could match on # alone and then check in the action what column we are in and do different things depending on that. I don't like this, as it messes up the longest match rule and might be fragile.
> * We can prepend a '\n' character to the source file before starting the lexing process by changing the triple passed to execLexer in the function lexCHS (and we must then also adjust the initial value of `pos' to still get accurate line numbers). This is a bit of a kludge, but it seems to be the more robust solution to me.
Ok, I'll push my current patch and then we can look at one of these potential solutions.
Duncan
participants (2)
- Duncan Coutts
- Manuel M T Chakravarty