Retrieving Haddock comments with haskell-src-exts

I'm writing a small tool to help analyse Haddock comments in Haskell source files, so that I can spot potential breakage to the documentation in existing source files.

Currently I'm doing the parsing with GHC's ‘parser’ function with Opt_Haddock set, and I filter out everything I don't need. There are problems with this approach: knowing which extensions and options a file uses is vital for parsing it correctly, and a large number of source files fail to parse outright without this information.

Fortunately, haskell-src-exts exists and can deal with all (or most) of this for me. Unfortunately, there doesn't seem to be any way to get it to recognise Haddock comments: the only options available are all comments or no comments at all. It's not easy to stitch these together without analysing the whole parse result.

Currently I'm thinking of parsing out the extensions and pragmas with haskell-src-exts and then feeding those to GHC, effectively parsing each file a second time. Is there a way to avoid this?

There's ‘lexTokenStream’, but I believe it has similar problems to ‘parser’: it needs to know the extensions beforehand.

-- Mateusz K.
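The first pass of that two-pass plan is to discover which extensions a file enables before the real parse. The sketch below shows roughly what that step involves by scanning a module header for `{-# LANGUAGE ... #-}` pragmas, using only `base`. It is a deliberately naive stand-in and does not reflect how haskell-src-exts does it. `languageExtensions` is a hypothetical helper. It ignores multi-line pragmas, `OPTIONS_GHC -X...` flags, CPP, and extensions enabled from a `.cabal` file, and those cases are the reason a real parser is worth having.

```haskell
import Data.Char (isSpace, toUpper)
import Data.List (dropWhileEnd, isPrefixOf)

-- Naive sketch: collect the extension names given in {-# LANGUAGE ... #-}
-- pragmas at the top of a module, i.e. before the first non-blank line
-- that is neither a pragma nor a line comment. Assumes every pragma fits
-- on one line and has spaces around its "{-#" and "#-}" delimiters.
languageExtensions :: String -> [String]
languageExtensions =
  concatMap pragmaExts . takeWhile isHeader . filter (not . null) . map trim . lines
  where
    isHeader l = "{-#" `isPrefixOf` l || "--" `isPrefixOf` l
    -- Commas separate extension names, so treat them as whitespace;
    -- the pragma keyword itself is compared case-insensitively.
    pragmaExts l = case words (map commaToSpace l) of
      ("{-#" : kw : rest)
        | map toUpper kw == "LANGUAGE", not (null rest), last rest == "#-}" -> init rest
      _ -> []
    commaToSpace c = if c == ',' then ' ' else c
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

The resulting names would then still have to be mapped to GHC's own extension flags before calling its parser.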

Hi Mateusz,
haskell-src-exts is not haddock-aware I'm afraid, so I don't have any real
solution for you. The one you mention, i.e. going through the whole parse
result and stitching things together manually seems like the best bet if you
want to use haskell-src-exts throughout.
In the longer run, it would be nice to have haddock support in
haskell-src-exts, so ideas regarding what kind of interface you would like
to see are most welcome. :-)
Cheers, Niklas
On Wed, Aug 14, 2013 at 4:57 PM, Mateusz Kowalczyk wrote:
> I'm writing a small tool to help to analyse Haddock comments in Haskell source files ...
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 14/08/13 19:02, Niklas Broberg wrote:
> Hi Mateusz,
> haskell-src-exts is not haddock-aware I'm afraid, so I don't have any real solution for you. The one you mention, i.e. going through the whole parse result and stitching things together manually seems like the best bet if you want to use haskell-src-exts throughout.

The main problem with this approach is that we get the comments (and their SrcLoc) as a separate list. While it's trivial to check whether a comment starts with ‘|’ or ‘^’, and it's not even hard to find any lines immediately following it, Haddock comments can appear in weird positions and trail until a line of code is encountered. Right now, it's rather hard to tell

    -- | Foo
    -- bar

and

    -- | Foo
    someCode = undefined -- bar

apart. I think this is the only option at the moment if I'm to use haskell-src-exts.

> In the longer run, it would be nice to have haddock support in haskell-src-exts, so ideas regarding what kind of interface you would like to see are most welcome. :-)

For my use, I do not care about anything but Haddock comments. I don't care where or why they appear, what they are attached to, &c. With my current method, I just throw all this information away (it's given by GHC). For general use, I imagine it would be useful for the program to combine multi-line Haddock comments into a single one and attach it to whatever it is documenting. I'm not sure how haskell-src-exts is implemented, but I don't think this is feasible without going out to GHC and asking it for the information. Incidentally, this is precisely the problem I'm trying to solve with the help of haskell-src-exts.

> Cheers, Niklas

Thanks!
-- Mateusz K.
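The two `Foo`/`bar` cases can be told apart if, besides the separate comment list, we also know which lines contain code. In a real tool that information would come from the source spans in the haskell-src-exts AST. The sketch below works under those assumptions and uses only `base`. `LineComment`, `DocComment`, `haddockComments` and the `codeLines` argument are hypothetical names and are not part of haskell-src-exts. The continuation rule is also a simplification of what GHC's lexer does: a continuation line must be the very next line, must hold no code, and must not itself open a new `|`/`^` comment. Block comments and the other Haddock markers such as `*` and `$` are ignored.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, sortOn)
import Data.Maybe (isNothing)

-- Hypothetical stand-in for one entry of a parser's separate comment
-- list: the line it starts on and its text after the leading "--".
data LineComment = LineComment
  { cLine :: Int
  , cText :: String
  } deriving (Show, Eq)

-- A Haddock comment reassembled from one or more line comments.
data DocComment = DocComment
  { docLine :: Int       -- line of the opening "-- |" or "-- ^"
  , docMark :: Char      -- '|' or '^'
  , docBody :: [String]  -- one trimmed entry per source line
  } deriving (Show, Eq)

-- 'codeLines' lists the lines that contain code; in a real tool this
-- would be derived from the AST's source spans (assumption).
haddockComments :: [Int] -> [LineComment] -> [DocComment]
haddockComments codeLines = go . sortOn cLine
  where
    -- A comment opens a Haddock block if its text starts with '|' or '^'.
    marker c = case dropWhile isSpace (cText c) of
      (m : rest) | m `elem` "|^" -> Just (m, rest)
      _ -> Nothing

    go [] = []
    go (c : cs) = case marker c of
      Just (m, rest) ->
        let (body, remaining) = continue (cLine c) cs
        in DocComment (cLine c) m (trim rest : body) : go remaining
      Nothing -> go cs

    -- A following comment continues the block only if it is on the very
    -- next line, that line holds no code, and it is not a new '|'/'^'.
    continue prev (c : cs)
      | cLine c == prev + 1
      , cLine c `notElem` codeLines
      , isNothing (marker c)
      = let (body, rest) = continue (cLine c) cs in (trim (cText c) : body, rest)
    continue _ cs = ([], cs)

    trim = dropWhileEnd isSpace . dropWhile isSpace
```

With no code on line 2, `-- bar` is folded into the `Foo` comment. When line 2 is `someCode = undefined -- bar`, the same comment text is left out because its line contains code.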

Hi again,
Hmm. I see the difficulty here, and eventually I would want to have support
for this, but alas, not yet. If you come up with any solution that doesn't
involve GHC (or only marginally so), I'd love to hear it.
Cheers, Niklas
On Wed, Aug 14, 2013 at 8:57 PM, Mateusz Kowalczyk wrote:
> ... The main problem with this approach is that we get comments (and their SrcLoc) as a separate list. ...

Niklas Broberg writes:
> Hmm. I see the difficulty here, ...
>> On Wed, Aug 14, 2013 at 8:57 PM, Mateusz Kowalczyk wrote:
>> ... The main problem with this approach is that we get comments (and their SrcLoc) as a separate list.
Hi Niklas, Mateusz,

It seems that haskell-src-exts has done quite a lot of the hard work (impressive!). The function parseFileWithComments returns the AST annotated with a SrcLoc for each major node, together with a separate list of comments annotated with SrcSpan. So it should be possible to reconstitute the source and intersperse the comments between the nodes(?), or to detect where a comment spans code. (I'm also looking at the discussion about comments on Niklas's announcement of 1.14.0.)

Could there be a style of prettyPrint that carries the list of comments (as a continuation) alongside walking the AST, and consumes/outputs each comment whose SrcSpan falls between the nodes' SrcLocs? Would this need too much lookahead?

(And thank you to Adam for introducing me to the joys of source-munging: http://www.haskell.org/pipermail/haskell-cafe/2013-August/108426.html)

AntC
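AntC's idea can be illustrated without the real haskell-src-exts types. The sketch below is a simplification that rests on several assumptions. The AST is pre-flattened into a list of nodes in source order, and positions are plain (line, column) pairs. `Pos`, `Item` and `interleave` are made-up names and are not provided by haskell-src-exts. The sketch suggests that for sorted inputs, threading the comment list through the walk needs very little lookahead. It does not handle a comment that sits inside a multi-line node: such a comment is emitted after the whole node, so a real tree walk would have to recurse into the node's children to place it.

```haskell
import Data.List (sortOn)

-- (line, column); tuples compare lexicographically, like source positions.
type Pos = (Int, Int)

-- Output of the walk: either a node's rendering or a comment's text.
data Item = Node Pos String | Cmt Pos String
  deriving (Show, Eq)

-- Hypothetical flattened view of the AST: each node is reduced to its start
-- position and the text a pretty-printer would emit for it. The comment
-- list is threaded through the walk. Before a node is emitted, every
-- comment that starts earlier is flushed. 'span' stops at the first
-- comment starting at or after the node, so with both lists sorted
-- nothing beyond that comment is inspected.
interleave :: [(Pos, String)] -> [(Pos, String)] -> [Item]
interleave nodes comments = go (sortOn fst nodes) (sortOn fst comments)
  where
    go [] cs = [Cmt p t | (p, t) <- cs]  -- comments after the last node
    go ((np, nt) : ns) cs =
      let (before, after) = span ((< np) . fst) cs
      in [Cmt p t | (p, t) <- before] ++ Node np nt : go ns after
```

A trailing comment such as `x = 1 -- trailing` starts after its node's start position, so it is emitted right after that node and before the next one.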
participants (3)
- AntC
- Mateusz Kowalczyk
- Niklas Broberg