
One proposal was to allow both something like
{--- -}
and
-- -- -- ... --
This is OK, but maybe a little too subtle. E.g. one might be forgiven for wondering why one cannot write ---- or {- --.
On the other hand, generalizing Simon's tagged proposal seems to work well:
{- @DOC ... -}
and
-- @DOC -- .. --
Please - if we're going to adopt this kind of style, don't make it needlessly divergent from existing conventions. There's no reason to use {- @DOC -} instead of {-# DOC #-}. The end-of-line version could be:
--# DOC
-- Documentation goes here...
--
But some people clearly prefer a style which is easier on the eye, in which case the {--- -} and --- are fine by me.
Cheers, Simon

On Tue, 13 Feb 2001, Simon Marlow wrote:
--# DOC
This is not a legal comment but an operator. I like ---
One problem with documentation comments: sometimes they are before the definition of the commented thing, but sometimes after. They are before functions (which are long) but might come after type definitions (which are short). We should probably require them to come before. This sometimes has the effect that the comment repeats the name of the commented thing.
-- Marcin 'Qrczak' Kowalczyk

One problem with documentation comments: sometimes they are before the definition of the commented thing, but sometimes after. They are before functions (which are long) but might come after type definitions (which are short). We should probably require them to come before.
I don't see much difficulty in allowing both before and after positions, leaving it to the tools to re-format appropriately. Why introduce a restriction? Regards, Malcolm

On Tue, 13 Feb 2001 malcolm-hs@cs.york.ac.uk wrote:
I don't see much difficulty in allowing both before and after positions, leaving it to the tools to re-format appropriately. Why introduce a restriction?
Because it's not clear to which entity the comment refers. Do we want something like this?
----------------------------------
type Entity1 = Foo

--- Comment for Entity2
type Entity2 = Bar
----------------------------------
type Entity1 = Foo
--- Comment for Entity1

type Entity2 = Bar
----------------------------------
type Entity1 = Foo
--- Ambiguous comment
type Entity2 = Bar
----------------------------------
type Entity1 = Foo

--- Ambiguous comment

type Entity2 = Bar
----------------------------------
-- Marcin 'Qrczak' Kowalczyk

On Tue, 13 Feb 2001 malcolm-hs@cs.york.ac.uk wrote:
One problem with documentation comments: sometimes they are before the definition of the commented thing, but sometimes after. They are before functions (which are long) but might come after type definitions (which are short). We should probably require them to come before.
I don't see much difficulty in allowing both before and after positions, leaving it to the tools to re-format appropriately. Why introduce a restriction?
I guess tools will get confused by
data A = A Int
---
-- bla
--
type B = String
---
-- blub
newtype C = C Bool
Are "bla" and "blub" in 'before' or 'after' position? I don't like restrictions like disallowing empty lines before/after the comments to help the tool find the right comments.
I'd prefer a convention I've suggested before: association of comments and source is done by either placing the comment before (or somewhere at the beginning of) the thing to be documented, or - alternatively - by explicitly mentioning the thing to document in the comment (like "@DOC A" or something similar). This combination allows
- comments next to the source
- separate documentation in different files
Putting comments _after_ the definition is then - of course - possible by giving a "@DOC".
Maybe I'm missing something here. So, what documentation scenarios are not met at all, or not met appropriately, by this dual approach?
Here's an example for my suggestion:
---
-- documentation with end-of-line comments
type B = ...
{--- For those who like the other style. -}
newtype C = ...
f :: Int -> Bool -> String
f ... = ...
...
g :: A -> B -> C
g a b
  ---
  -- documentation in Jan's style
  = ....
-----------
--- @DOC f
-- Here's the very long documentation for @f@ which I didn't
-- want to put into the source file....
Regards, Armin
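The dual convention in the message above (attach a comment to the thing that follows it unless the comment names its target with "@DOC") can be sketched roughly as follows. This is only an illustration: the names `Target` and `target` are hypothetical, and the sketch assumes the comment marks have already been removed from each line.

```haskell
-- | Where a documentation comment attaches (hypothetical type).
data Target = Named String | NextDeclaration
  deriving (Eq, Show)

-- | Decide the target from the first line of the comment text.
-- An explicit "@DOC name" wins; anything else falls back to position.
target :: [String] -> Target
target (firstLine : _)
  | ("@DOC" : name : _) <- words firstLine = Named name
target _ = NextDeclaration
```

With this reading, the comment starting "@DOC f" in the example documents `f` wherever it sits, while the others document the declaration they precede.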

On Tue, 13 Feb 2001, Armin Groesslinger wrote:
I guess tools will get confused by
data A = A Int
---
-- bla
--
type B = String
---
-- blub
newtype C = C Bool
Are "bla" and "blub" in 'before' or 'after' position? I don't like restrictions like disallowing empty lines before/after the comments to help the tool find the right comments.
Right. The above examples indicate that not only the tools but also human beings will be confused by the above sequence. You did not give a clue about the meaning of "bla" and "blub", and hence I am none the wiser about what they refer to. Only when "blub" contextually refers to either entity "B" or entity "C" can a human reader make such an association. But that proves the point that your example has a low level of readability: any additional time the reader spends sorting these things out is wasted.
So this is not only a matter of "helping the tool find the right comments" but also a matter of readability of the sources. And that means that blank lines serve their purpose too and should be used with great care. No ambiguities arise when the comments are tightly coupled with the entities they describe - whether they are before, after, or just at the beginning of the entities they belong to.
By the way, I sometimes use "Jan's style" commenting even for datatype definitions, especially when I have something important to say there. This helps with very tight binding of the two. And when I move the things around the comments are not lost by some accident.
newtype C
  -- Some description ..
  = C Bool
I suggest that both of your example comments "bla" and "blub" should be treated as floating comments: either as decorations - which should be ignored altogether - or as some sort of headers that have some global, important meaning for the writer of the module. Look at the Hugs Prelude for example. There are plenty of such separators, grouping together the things that belong to numbers, to lists, etc. They are usually one-liners, but on occasion you can also find a longer global description, as in module Random.
Personally I use such separators to indicate categories of functions, which is helpful when the number of functions in the module exceeds some readability level. It helps to find the things and gives extra meaning to the things inside the category. Those who viewed the samples I posted a few days ago should have a clear idea how it works. Currently the extractor relies on an extra cue (at the moment it is --: Blah, but I was reminded that this is illegal according to the Report. Easy to change though).
Categories are helpful things, unless one uses submodules instead. They exist in Smalltalk, Objective C, Eiffel. ISE Eiffel goes even one step further and insists on standard category names all across their libraries.
But the usage of headers in Hugs' Prelude goes beyond categories of functions; they also separate groups of methods, or combinations of datatypes and the functions, etc.
Should the documentation standard respect the module authors' wishes and provide at least some sort of support for 'categories'? I think it should.
Jan
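One way to read the "tightly coupled" rule from the message above is as a check on the lines immediately around a comment. The sketch below is an assumption-laden illustration, not anyone's actual extractor: `Line`, `Attachment` and `attach` are hypothetical names, and the input is assumed to be pre-classified by an earlier pass.

```haskell
-- | The line directly before or after a documentation comment
-- (hypothetical, already classified).
data Line = Blank | Decl String
  deriving (Eq, Show)

-- | What the comment ends up describing.
data Attachment = Describes String | Floating | Ambiguous
  deriving (Eq, Show)

-- | Attach a comment by adjacency: first argument is the line before
-- the comment, second argument is the line after it.
attach :: Line -> Line -> Attachment
attach (Decl _)    (Decl _)    = Ambiguous        -- touches both neighbours
attach _           (Decl next) = Describes next   -- sits directly above a declaration
attach (Decl prev) _           = Describes prev   -- sits directly below a declaration
attach _           _           = Floating         -- blank lines on both sides
```

Under this reading a comment with blank lines on both sides is a floating header or decoration, which is where a 'category' title would naturally live.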

On Tue, 13 Feb 2001, Jan Skibinski wrote:
By the way, I sometimes use "Jan's style" commenting even for datatype definitions, especially when I have something important to say there. This helps with very tight binding of the two. And when I move the things around the comments are not lost by some accident.
newtype C
  -- Some description ..
  = C Bool
Sounds reasonable to me :-)
Categories are helpful things, unless one uses submodules instead. They exist in Smalltalk, Objective C, Eiffel. ISE Eiffel goes even one step further and insists on standard category names all across their libraries.
But the usage of headers in Hugs' Prelude goes beyond categories of functions; they also separate groups of methods, or combinations of datatypes and the functions, etc.
Should the documentation standard respect the module authors' wishes and provide at least some sort of support for 'categories'? I think it should.
So do I. Especially when generating external documentation, the user doesn't see the whole module structure and "internal" modules will appear as part of wrapper modules, so there's definitely a need for more structuring than the grouping in modules can give.
Regards, Armin

Tue, 13 Feb 2001 05:58:49 -0500 (EST), Jan Skibinski wrote:
Should the documentation standard respect the module authors' wishes and provide at least some sort of support for 'categories'? I think it should.
I agree. Sections of a module could be separated by "lots of dashes" with a section title together with them (either on the same line or on the next).
We can allow a tool to distinguish chapters/sections/subsections by decreasing numbers of dashes. Don't define which number corresponds to which level - just anything above a threshold (e.g. >= 8) marks a part, and different lengths can mark different levels (if the tool supports different levels).
Let's say that comments marked by "--" are ignored by the documentation tool, unless it's a pretty printer of the whole source. Comments marked by "---" are basic descriptions of entities: each comment refers to an entity if it's adjacent to it, or is a standalone comment if it isn't. Comments marked by eight dashes or more are section headers. (Or four?)
These rules (after precise definitions of the details) are enough to have a concrete tool which produces a summary of a module (extracting type signatures etc. from the source), formatted in any format (HTML, LaTeX, text). Anything more specific will be compatible with this simple view: we can add keywords for more precise control, but comments with keywords can be emitted as is by a simple tool.
Keywords don't require special punctuation. They can be spelled in ALL CAPS so that pure English text is not taken as a keyword by mistake. For example EXPORTED (so that we can distinguish what should go into the interface documentation and what should go only to browsable internals), and custom keywords to be specified with an invocation of a sophisticated doc tool to have project-specific subsetting of the documentation.
A remaining thing is hyperlinks to different entities or modules (or fully qualified entities). It's easy to have SEE as a keyword, but we also want to mark an identifier used as a part of a sentence, e.g.
To clarify, @doesDirectoryExist@ returns True if a file system object exist, and it's a directory. @doesFileExist@ returns True if the file system object exist, but it's not a directory (i.e., for every other file system object that is not a directory.)
(this is a real quotation - I don't suggest using @ in particular). The same concept can be uniformly applied to quoted chunks of code. In this case a tool can make hyperlinks to names it recognizes if it wishes.
Let me stop at this point, and clarify one detail about basic recognition of kinds of comments. For a block of comment lines the form of the first line matters, so it can be equivalently written
---- Overview of the foo process
----
---- Blah blah blah
---- blah blah blah
or
---- Overview of the foo process
--
-- Blah blah blah
-- blah blah blah
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK
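The recognition rule from the message above (the first line of a comment block decides its kind) can be sketched as below. The names `CommentKind`, `headerThreshold` and `classify` are hypothetical, the threshold of 8 is just one of the two values floated in the thread, and treating 4 to 7 dashes as an ordinary documentation comment is an assumption the thread does not settle.

```haskell
import Data.Char (isSpace)

-- | The three kinds of comment in the scheme (hypothetical type).
data CommentKind = Ignored | DocComment | SectionHeader
  deriving (Eq, Show)

-- | Dash count from which a comment counts as a section header.
-- Assumption: 8, as suggested; the thread also mentions 4.
headerThreshold :: Int
headerThreshold = 8

-- | Classify a block of comment lines by the dashes on its first line only.
classify :: [String] -> CommentKind
classify [] = Ignored
classify (firstLine : _)
  | dashes >= headerThreshold = SectionHeader
  | dashes >= 3               = DocComment
  | otherwise                 = Ignored
  where
    dashes = length (takeWhile (== '-') (dropWhile isSpace firstLine))
```

Because only the first line is inspected, both spellings of the "Overview of the foo process" block fall into the same class.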

On 13 Feb 2001, Marcin 'Qrczak' Kowalczyk wrote:
Should the documentation standard respect the module authors' wishes and provide at least some sort of support for 'categories'? I think it should.
I agree. Sections of a module could be separated by "lots of dashes" with a section title together with them (either on the same line or on the next).
Most of what you suggest seems acceptable. A few minor points though:
----------------------------------------------------
---- Subsection blah
----------------------------------------------------
The first and the third line are just visual ASCII hints but they have no real meaning. A formatter will use its own means of presentation, so they could be safely stripped off. My extractor currently arbitrarily ignores any top level sequence of 8 dashes or more.
A remaining thing is hyperlinks to different entities or modules (or fully qualified entities). It's easy to have SEE as a keyword, but we also want to mark an identifier used as a part of a sentence, e.g.
To clarify, @doesDirectoryExist@ returns True if a file system object exist, and it's a directory. @doesFileExist@ returns True if the file system object exist, but it's not a directory (i.e., for every other file system object that is not a directory.)
OK, I see your point about the SEE business. But the second part of my previous suggestion can be rephrased using your caps idea:
To clarify, FUNCTION doesDirectoryExist returns True ...
or
To clarify, FUNCTION 'doesDirectoryExist' returns True ...
The first version carries the semantics only, while the second one has additional formatting information - which can nevertheless be considered redundant.
BTW, I see Malcolm's point about the single quote; but (') just looks lighter in plain ASCII and it may be worthwhile supporting it by careful parsing. Alternatively, a simple rule: "Double the single quote, or escape it (\'), if you want it printed" would do. Smalltalk uses the doubling rule (their strings are in single quotes, and double quotes signify comments), and its users are happy enough with it.
Underscores are fine, but they are also ambiguous since some of the function names use them as well. And I bet that some of us dislike Hungarian notation and use underscores abundantly. But the same doubling or escape rules could be applied here as well.
Jan
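The doubling rule mentioned above could be handled by a small scanner along these lines. This is a sketch only: `Piece` and `pieces` are hypothetical names, the backslash escape is not handled, and identifiers that themselves end in a prime would clash with the doubling rule as written here.

```haskell
-- | A piece of comment text: plain prose or a quoted identifier.
data Piece = Plain String | Ident String
  deriving (Eq, Show)

-- | Split text on single quotes; a doubled quote stands for a literal one.
pieces :: String -> [Piece]
pieces = go ""
  where
    -- 'acc' holds the plain text seen so far, in reverse order.
    go acc []                = flush acc []
    go acc ('\'' : '\'' : r) = go ('\'' : acc) r
    go acc ('\'' : r)        =
      let (name, rest) = break (== '\'') r
      in flush acc (Ident name : go "" (drop 1 rest))
    go acc (c : r)           = go (c : acc) r

    -- Emit the accumulated plain text, if any, before the remaining pieces.
    flush []  ps = ps
    flush acc ps = Plain (reverse acc) : ps
```

A formatter could then turn each `Ident` into a hyperlink or a typewriter-font span and print each `Plain` piece unchanged.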

Tue, 13 Feb 2001 11:04:58 -0500 (EST), Jan Skibinski wrote:
----------------------------------------------------
---- Subsection blah
----------------------------------------------------
The first and the third line are just visual ASCII hints but they have no real meaning. A formatter will use its own means of presentation, so they could be safely stripped off. My extractor currently arbitrarily ignores any top level sequence of 8 dashes or more.
This is exactly how my proposal treats it. A sequence of comment lines is logically a single comment, so after stripping the comment mark from each line we get ["", "Subsection blah", ""]. Empty paragraphs are removed. The first comment mark had more than 8 (or 4?) dashes so it's a section header. Finally we get <H1>Subsection blah</H1>.
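The steps just described (strip the comment mark from each line, drop empty paragraphs, emit a heading) might look like this in Haskell. The function names are hypothetical, and the sketch makes no attempt to escape HTML special characters in the title.

```haskell
import Data.Char (isSpace)

-- | Remove the leading run of dashes and the surrounding blanks from one line.
stripMark :: String -> String
stripMark = trim . dropWhile (== '-') . dropWhile isSpace
  where
    trim = reverse . dropWhile isSpace . reverse . dropWhile isSpace

-- | Strip every line of a comment block and drop the empty results.
paragraphs :: [String] -> [String]
paragraphs = filter (not . null) . map stripMark

-- | Render a section-header comment block as an HTML heading.
renderHeader :: [String] -> String
renderHeader block = "<H1>" ++ unwords (paragraphs block) ++ "</H1>"
```

Applied to the three-line "Subsection blah" block, `map stripMark` gives `["", "Subsection blah", ""]` and `renderHeader` gives `<H1>Subsection blah</H1>`.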
OK, I see your point about the SEE business. But the second part of my previous suggestion can be rephrased using your caps idea:
To clarify, FUNCTION doesDirectoryExist returns True ...
or
To clarify, FUNCTION 'doesDirectoryExist' returns True ...
Hmm, although Manuel's (EXPORTED) looks nice - it's like a graphical icon - emphasizing *that* word in a sentence does not look so nice :-(
When talking about Haskell on mailing lists, I tend to write "defined in class Monad" or "imported from module List", even though "defined in the Monad class" and "imported from the List module" are probably more correct English. But the phrases "class Monad" and "module List" are used in actual Haskell syntax, so the leading word can be treated more like a Haskell keyword than an English word.
So perhaps this way of disambiguating identifiers would do. It works for modules, classes, and types (where type 'Int' is a type constructor and type 'a' is a type variable). There is no keyword for value variables, but this is the obvious default for lowercase identifiers. There is no keyword for value constructors, but we can make it the default for uppercase identifiers, such that if there is both a type called Id and a value constructor called Id, they can be referred to as type 'Id' and 'Id' respectively.
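The disambiguation rule above can be written down as a small lookup. This is a sketch under stated assumptions: `Namespace`, `namespace` and `startsUpper` are hypothetical names, the word before the quoted identifier is assumed to have been extracted already, and constructor operators (which start with a colon) are not treated specially.

```haskell
import Data.Char (isUpper)

-- | The namespaces a quoted name can refer to (hypothetical type).
data Namespace = Module | Class | Type | TypeVariable | Constructor | Value
  deriving (Eq, Show)

-- | Resolve a quoted name using the word that precedes it, if any.
namespace :: Maybe String -> String -> Namespace
namespace (Just "module") _ = Module
namespace (Just "class")  _ = Class
namespace (Just "type") name
  | startsUpper name = Type
  | otherwise        = TypeVariable
namespace _ name
  | startsUpper name = Constructor   -- default for uppercase identifiers
  | otherwise        = Value         -- default for lowercase identifiers

-- | True when the name begins with an uppercase letter.
startsUpper :: String -> Bool
startsUpper (c : _) = isUpper c
startsUpper []      = False
```

So type 'Id' resolves to the type and a bare 'Id' to the value constructor, as proposed.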
BTW, I see Malcolm's point about the single quote; but (') just looks lighter in plain ASCII and it may be worthwhile supporting it by careful parsing.
Technically it can work without escaping: 'class'' is unambiguously a quoted class-prime. An identifier cannot _begin_ with a prime.
Primed identifiers are IMHO rarely used at the toplevel so they are rarely referenced - it should not be a problem in practice that the reference looks confusing. They are most common for local variables and variables in function argument patterns:
    (x:+y) / (x':+y') = (x*x''+y*y'') / d :+ (y*x''-x*y'') / d
      where x'' = scaleFloat k x'
            y'' = scaleFloat k y'
            k   = - max (exponent x') (exponent y')
            d   = x'*x'' + y'*y''
Unfortunately 'x' can be used as a character literal in a comment. We can probably live with that: toplevel definitions are rarely single-letter identifiers.
What about operators? They can be one character long, and we probably talk about punctuation as character constants more often than about letters as character constants, so '^' is really ambiguous. Live with that and treat it as a hyperlink if ^ is defined? Or use (^) instead?
Single quotes also work for short expressions embedded in the text, but not for multiline code snippets. I remember Turbo Pascal's convention for help files: indented text is presented verbatim, and text written at the first column is formatted into lines to the window's width, ignoring the original line breaks. I'm sure it will be a good idea for comments, as in the example above. There is no need for markup yet! This fails only when someone tries to write an enumerated or bulleted list.
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK
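The help-file convention recalled in the message above (indented text verbatim, first-column text reflowed) is easy to prototype. In this sketch `Block` and `blocks` are hypothetical names, the input is the comment text with marks already removed, and a blank line is assumed to end the current block, so a verbatim snippet containing a blank line would come out as two blocks.

```haskell
import Data.Char (isSpace)

-- | A block of comment text: reflowable prose or verbatim code.
data Block = Prose String | Verbatim [String]
  deriving (Eq, Show)

-- | Group comment lines: indented runs are kept as they are, first-column
-- runs are joined into one paragraph, blank lines separate blocks.
blocks :: [String] -> [Block]
blocks [] = []
blocks ls@(l : rest)
  | all isSpace l = blocks rest
  | indented l    = let (code, more) = span indented ls
                    in Verbatim code : blocks more
  | otherwise     = let (text, more) = span isProse ls
                    in Prose (unwords (concatMap words text)) : blocks more
  where
    -- A non-blank line whose first character is whitespace.
    indented s = not (all isSpace s) && any isSpace (take 1 s)
    -- A non-blank line that starts in the first column.
    isProse s  = not (all isSpace s) && not (indented s)
```

A renderer would wrap each `Prose` paragraph to the output width and print each `Verbatim` block unchanged, for instance inside a preformatted element.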

qrczak@knm.org.pl (Marcin 'Qrczak' Kowalczyk) wrote,
Let me stop at this point, and clarify one detail about basic recognition of kinds of comments. For a block of comment lines the form of the first line matters, so it can be equivalently written
---- Overview of the foo process
----
---- Blah blah blah
---- blah blah blah
or
---- Overview of the foo process
--
-- Blah blah blah
-- blah blah blah
Yes, that is useful and not unlike comment conventions of, eg, Elisp. Cheers, Manuel

13 Feb 2001 17:02:52 GMT, Marcin 'Qrczak' Kowalczyk wrote:
---- Overview of the foo process
----
---- Blah blah blah
---- blah blah blah
or
---- Overview of the foo process
--
-- Blah blah blah
-- blah blah blah
I meant triple dashes here (the proposal evolved while it was being written). The form with all comment marks the same is the canonical one.
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK

Please - if we're going to adopt this kind of style, don't make it needlessly divergent from existing conventions. There's no reason to use {- @DOC -} instead of {-# DOC #-}.
I agree, let's not be divergent. {- @DOC -} is just as ugly as {-# DOC #-}. And I would still be extremely reluctant to use either of them. :-( The capital letters and #/@ are just too ... big and black! (Your fonts may differ.)
The end of line version could be:
--# DOC
-- Documentation goes here...
--
Sorry, that breaks Haskell'98. --# is a valid operator symbol.
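To see why, recall that a run of dashes only starts a line comment when it does not form part of a longer legal lexeme. The following small program is a sketch of that point; giving `--#` the meaning of subtraction is purely illustrative.

```haskell
-- "--#" consists only of symbol characters, so it lexes as one operator
-- name rather than as the start of a comment.
infixl 6 --#

-- | Hypothetical meaning for the operator: plain subtraction.
(--#) :: Int -> Int -> Int
a --# b = a - b

-- | Uses the operator infix; nothing after it is treated as a comment.
difference :: Int
difference = 10 --# 3
```

A documentation marker spelled `--# DOC` would therefore be read as an application of this operator, not as a comment.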
But, some people clearly prefer a style which is easier on the eye, in which case the {--- -} and --- are fine by me.
Triple dashes are nice because they
* have low visual density (not too obtrusive)
* look the same for nested/end-of-line varieties without breaking H'98
Regards, Malcolm

Tue, 13 Feb 2001 12:18:13 +0000, malcolm-hs@cs.york.ac.uk wrote:
which case the {--- -} and --- are fine by me.
Triple dashes are nice because they
* have low visual density (not too obtrusive)
* look the same for nested/end-of-line varieties without breaking H'98
For me the analogue of --- is {-- -}, because the analogue of -- is {- -}.
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK

qrczak@knm.org.pl (Marcin 'Qrczak' Kowalczyk) wrote,
Tue, 13 Feb 2001 12:18:13 +0000, malcolm-hs@cs.york.ac.uk wrote:
which case the {--- -} and --- are fine by me.
Triple dashes are nice because they
* have low visual density (not too obtrusive)
* look the same for nested/end-of-line varieties without breaking H'98
For me the analogue of --- is {-- -}, because the analogue of -- is {- -}.
True. Manuel
participants (7)
- Armin Groesslinger
- Jan Skibinski
- malcolm-hs@cs.york.ac.uk
- Manuel M. T. Chakravarty
- Marcin 'Qrczak' Kowalczyk
- qrczak@knm.org.pl
- Simon Marlow