RFC: Properly stated origin of code contributions

Hi,
GHC's Git history has (mostly) a good track record of properly attributed authorship information in the recent past; some time ago I even augmented the .mailmap file to fix up some of the pre-Git metadata, which had mangled author/committer information (try 'git shortlog -sn' if you're curious).
However, I just noticed that
http://git.haskell.org/ghc.git/commitdiff/322810e32cb18d7749e255937437ff2ef9...
landed recently, which changed a significant amount of code, but at the same time the author looks like a pseudonym to me (and apologies if I'm wrong).
Other important projects such as Linux or Samba, just to name two examples, reject contributions without a clearly stated origin, and explicitly reject anonymous/pseudonymous contributions (as part of their "Developer's Certificate of Origin" policy[1], which involves a bit more than merely stating the real name).
I believe the GHC project should consider setting some reasonable ground-rules for contributions to be on the safe side in order to avoid potential copyright (or similar) issues in the future, as well as giving confidence to commercial users that precautions are taken to avoid such issues.
Comments?
Cheers, hvr
[1]: See http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Document...

| I believe the GHC project should consider setting some reasonable
| ground-rules for contributions to be on the safe side in order to
| avoid potential copyright (or similar) issues in the future, as well
| as giving confidence to commercial users that precautions are taken to
| avoid such issues.
I agree with that. We could list the policy on https://ghc.haskell.org/trac/ghc/wiki/WorkingConventions, or a page linked from there.
One possibility would be to add a "Contributors" section to the GHC Team page https://ghc.haskell.org/trac/ghc/wiki/TeamGHC, and ask anyone submitting a patch to add an entry describing themselves (including their real name) to that page. By "contributor" I mean someone who is submitting a patch but is not yet a committer. We could have separate sub-pages for committers and contributors.
That would give a way to celebrate contributors, as well as a way to identify them.
Simon
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Herbert Valerio Riedel
| Sent: 30 October 2014 08:13
| To: ghc-devs
| Subject: RFC: Properly stated origin of code contributions
| [...]
| _______________________________________________
| ghc-devs mailing list
| ghc-devs@haskell.org
| http://www.haskell.org/mailman/listinfo/ghc-devs

Comments?
+1
I believe the GHC project should consider setting some reasonable ground-rules for contributions to be on the safe side in order to avoid potential copyright (or similar) issues in the future, as well as giving confidence to commercial users that precautions are taken to avoid such issues.
Projects like Scala and Clojure require filling in a "Contributor [License] Agreement". I have not bothered to investigate the exact purpose. My guess is that it is supposed to prevent situations like "unauthorized" committing of code into the project. (Meaning: an employee of company M commits code into the project, but then the company says that person was not allowed to do that, because the code is patented or something, and requests that the code be withdrawn or sues the project.) Somehow I feel that introducing such contributor licenses into GHC would scare away some contributors. But then again, doing that could prevent some potential problems.
Janek

On Thu, Oct 30, 2014 at 5:00 AM, Jan Stolarek wrote:
Projects like Scala and Clojure require filling in a "Contributor [License] Agreement". I have not bothered to investigate the exact purpose.
In the absence of a license agreement, the contribution is usually owned by the submitter and not the project (copyright, see Berne convention). This doesn't scale very well. A signed CLA allows the project to demonstrate that the submitter has agreed to transfer ownership of the contribution to the project('s administrators).
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Hi,
On Thursday, 30.10.2014, at 09:04 -0400, Brandon Allbery wrote:
On Thu, Oct 30, 2014 at 5:00 AM, Jan Stolarek wrote:
Projects like Scala and Clojure require filling in a "Contributor [License] Agreement". I have not bothered to investigate the exact purpose.
In the absence of a license agreement, the contribution is usually owned by the submitter and not the project (copyright, see Berne convention). This doesn't scale very well. A signed CLA allows the project to demonstrate that the submitter has agreed to transfer ownership of the contribution to the project('s administrators).
Given that the Linux kernel doesn’t require (paper-signed) CLAs, I do think it scales very well, and does not seem to scare off commercial users.
As long as we can properly assume that contributors license the code to us under the terms of the GHC license (which we seem to do), we have what we need. No need to hold the copyright in a single place. It’s too late for that anyway.
Please avoid introducing unnecessary bureaucracy into the contributing process, especially not due to legal fear caused by FUD and smattering.
Greetings, Joachim
-- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org

Indeed. A CLA is overkill for GHC. Moreover, a good CLA merely documents
that I'm granting a license under BSD-compatible terms; ownership transfer is
inappropriate and abusive.
At MOST, "this work is my own and I grant license for its use in GHC under
the BSD license" is plenty. And even that might be overkill.
I'm happy to ask the IP lawyers in my family for some opinions on this, but
I think what we are doing now is fine.

On Thu, Oct 30, 2014 at 11:25 AM, Carter Schonwald < carter.schonwald@gmail.com> wrote:
I'm happy to ask the IP lawyers in my family for some opinions on this but I think what we are doing now is fine.
As Joachim already noted, it's a bit late to switch course for GHC; you'd have to track down every past contributor. (I've been involved with projects that needed to do that; if at all possible, avoid it.)
The reality, as I understand it (note that I Am Not A Lawyer(tm) but have experience with projects that have had to face the question), is that there are complex interactions between copyright law and contract law (not to mention questions of how contract law affects contributions to an open project). And both have a certain "valid until proven otherwise" aspect, which often makes it wisest to not change what's already working well enough --- especially since even asking a lawyer "on the clock" can potentially have legal implications on the whole project (but only if someone actually challenges in court and brings it up). As a result, the FUD's kinda built into the legal structure. :/
(My earlier response is not incompatible with this; the question I was answering was why a project might go with a CLA. In reality, whether the answer is *relevant* to a project is certainly open to question. One difference between the situation with GHC and the situation with Scala or Perl 6 is that the latter are also defining a language specification, which may have implications if there is a plan to submit it to an official standards body at some point. For GHC, that rests on the language committee, not the GHC developers.)
If it really bothers you, probably best to ask someone like the EFF. Almost certainly do *not* formally ask a lawyer (informal is fine) --- they are going to concentrate on the worst case, mainly because even asking for a formal evaluation suggests that there is a need to worry about the worst case. Otherwise, leave well enough alone.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

tl;dr I think we're fine :)
long version: asking people the first time they contribute to confirm that
their work is their own, and that they can and do grant a BSD license, is about all
that's needed.
I'm not sure how language standards relate to this topic, but I'll ask you
about that out of band I guess ;)

In the absence of a license agreement, the contribution is usually owned by the submitter and not the project (copyright, see Berne convention). This doesn't scale very well. A signed CLA allows the project to demonstrate that the submitter has agreed to transfer ownership of the contribution to the project('s administrators).
I wouldn't want a copyright-assignment system (since that allows the project to re-license when it wants, for example) but an inbound=outbound agreement (that is, an explicit agreement from contributors to have their contributions released under the license of the project) is not an unreasonable thing to do.
-- Stephen Paul Weber, @singpolyma See http://singpolyma.net for how I prefer to be contacted edition right joseph

yup, agreed
-Carter

I hate to spam this with possibly-tangential requests, but I also have a peeve:
Can we get a standardized copyright/comment header across all our
files? It seems as if every single file in the compiler (and RTS) has
different header text, mentioning different people or groups, some
from 10 years ago or more and others just recently added. This is
somewhat related to this RFC but also somewhat not, I feel.
Ideally it would be nice if we could have, say, an AUTHORS.txt file
containing the names of all those people who have committed to GHC
(essentially like 'git shortlog -sn' shows), which we would ask users
to add their name into, and then if we could standardize all the
headers to follow a known convention, and give boilerplate for people
to copy.
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

On 30/10/14 21:55, Austin Seipp wrote:
Can we get a standardized copyright/comment header across all our files? It seems as if every single file in the compiler (and RTS) has different header text, mentioning different people or groups, some from 10 years ago or more and others just recently added. This is somewhat related to this RFC but also somewhat not, I feel.
Ideally it would be nice if we could have, say, an AUTHORS.txt file containing the names of all those people who have committed to GHC (essentially like 'git shortlog -sn' shows), which we would ask users to add their name into, and then if we could standardize all the headers to follow a known convention, and give boilerplate for people to copy.
I posted a thread about this earlier, and the only response I got was that it doesn't matter... For what it's worth I agree with you.
-- Alexander alexander@plaimi.net https://secure.plaimi.net/~alexander

Hi Austin,
On 2014-10-30 at 21:55:14 +0100, Austin Seipp wrote: [...]
Can we get a standardized copyright/comment header across all our files? It seems as if every single file in the compiler (and RTS) has different header text, mentioning different people or groups, some from 10 years ago or more and others just recently added. This is somewhat related to this RFC but also somewhat not, I feel. [...]
Could you draft up a standard-header as a suggestion somewhere, maybe on the Wiki?
The current situation is suboptimal, as it's unclear where the threshold for adding yourself as an author to a module header is (whitespace/indentation cleanups, fixing/writing docs, removing lines, adding a 5-line function in a 500 line module, ...?), and it's a bit unfair to those that have contributed far more to a module but haven't bothered to add themselves to the module header. So I'd welcome a standard approach.
Would there be a single AUTHORS file for the code in ghc.git, or multiple ones (one for the compiler proper, one for base, ghc-prim, template-haskell, integer-*, ...?)
Cheers, hvr
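For illustration only, here is a minimal Python sketch of how an AUTHORS file of the kind discussed above could be derived from 'git shortlog -sn' output. It assumes the usual output shape of that command (a commit count, whitespace, then the author name on each line); the function names parse_shortlog and render_authors and the sample names are hypothetical and not part of any GHC tooling.

```python
import re


def parse_shortlog(text):
    """Parse `git shortlog -sn` style output into (commit_count, name) pairs.

    Assumes each non-empty line looks like "<count><whitespace><name>".
    Lines that do not fit that shape are skipped.
    """
    entries = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


def render_authors(entries):
    """Render one unique name per line, sorted case-insensitively."""
    names = sorted({name for _, name in entries}, key=str.casefold)
    return "\n".join(names) + "\n"


if __name__ == "__main__":
    # Hypothetical sample input; in practice the text would come from
    # running `git shortlog -sn` inside a checkout.
    sample = "   120\tAlice Example\n    15\tBob Example\n"
    print(render_authors(parse_shortlog(sample)), end="")
```

Sorting by name rather than by commit count is a deliberate choice in this sketch: it avoids turning the file into a ranking, which is one of the fairness concerns raised above.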

The current situation is suboptimal, as it's unclear where the threshold for adding yourself as an author to a module header is (whitespace/indentation cleanups, fixing/writing docs, removing lines, adding a 5-line function in a 500 line module, ...?), and it's a bit unfair to those that have contributed far more to a module but haven't bothered to add themselves to the module header.
For these reasons, and due to the power of git and Phab, I think author annotations in module headers no longer fulfill their original purpose particularly well: neither the purpose of stating who is responsible for or interested in a code fragment, nor the purpose of giving people credit. Could we get rid of them?
The annotations already in the code would stay in git history, and any future authors are recorded in git and Phab messages. We just need to make sure to mention original authors in git commit messages if, for whatever reason, the original commit creator's git metadata would be lost (or manually revert the loss of info via git options).
BTW, this relates to the pseudonymous contributions discussion. The git/Phab history only helps to the extent that people identify themselves. One less place with one less version of contributor names/pseudonyms should actually help in accounting for all contributors on the wiki, in .cabal, etc., for the purpose of giving public credit, for legal reasons, etc.
Best, Mikolaj

There are good reasons not to require people's "real" name to participate:
http://geekfeminism.wikia.com/wiki/Who_is_harmed_by_a_%22Real_Names%22_polic...
Simon PJ often advocates knowing people's names as part of creating a friendly community. There are good things about this. As a policy, however, it also helps exclude people with less privilege, whom we have few enough of already.
I like most things about "Developer's Certificate of Origin", though.
-Isaac

On 2014-10-30 at 22:59:45 +0100, Isaac Dupree wrote:
There are good reasons not to require people's "real" name to participate:
http://geekfeminism.wikia.com/wiki/Who_is_harmed_by_a_%22Real_Names%22_polic...
Simon PJ often advocates to know people's name as part of creating a friendly community. There are good things about this. It also helps exclude people with less privilege, whom we have few enough of already, if it is a policy.
I like most things about "Developer's Certificate of Origin", though.
However, if we want to adopt the DCO[1] (as used by Linux kernel development) as a good-faith (and yet lightweight) attempt to track the origin/accountability of contributions, it relies on real names to know who is actually making that assertion. Having the DCO signed off by an obvious pseudonym would defeat the whole point of the DCO imho.
Cheers, hvr
[1]: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Document...
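As a rough illustration of what a lightweight DCO check could look like, the Python sketch below looks for "Signed-off-by: Name <email>" lines in a commit message (the trailer form used by Linux kernel development) and tests whether one of them matches the commit author. The helper names signoffs and signed_off_by_author are hypothetical, and the sketch assumes the commit message and author identity are already available as strings. It cannot tell whether a name is a real name or a pseudonym; that part remains a matter of policy and good faith.

```python
import re

# One sign-off per line: "Signed-off-by: Full Name <address>"
SIGNOFF = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>[^<>\n]+?)[ \t]*<(?P<email>[^<>\s]+)>[ \t]*$",
    re.MULTILINE,
)


def signoffs(message):
    """Return all (name, email) pairs found in Signed-off-by lines."""
    return [(m.group("name"), m.group("email")) for m in SIGNOFF.finditer(message)]


def signed_off_by_author(message, author_name, author_email):
    """True if some sign-off matches the author (email compared case-insensitively)."""
    wanted = (author_name, author_email.lower())
    return any((name, email.lower()) == wanted for name, email in signoffs(message))


if __name__ == "__main__":
    # Hypothetical commit message, using the placeholder identity from the
    # kernel's own documentation.
    msg = (
        "Fix typo in comment\n"
        "\n"
        "Signed-off-by: Random J Developer <random@developer.example.org>\n"
    )
    print(signed_off_by_author(msg, "Random J Developer", "random@developer.example.org"))
```

A check like this could run at review time, but it only confirms that the assertion was made, not who stands behind it.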

I agree with Herbert, and one solution would be to ask those people who
wish to remain pseudonymous to have a named person who's agreed to be
their proxy co-sign the patch or whatever. That, I think, accomplishes the
same goal :)
participants (11)
- Alexander Berntsen
- Austin Seipp
- Brandon Allbery
- Carter Schonwald
- Herbert Valerio Riedel
- Isaac Dupree
- Jan Stolarek
- Joachim Breitner
- Mikolaj Konarski
- Simon Peyton Jones
- Stephen Paul Weber