Making it easier to contribute non-functional changes

newer
Make lines stricter to fix a space...

Johan Tibell

4 Oct 2010 4 Oct '10

12:03 p.m.

Hi all, During the work on the containers library, I felt that it's too difficult to contribute non-functional changes to the libraries managed by the libraries list. If you look at repos of libraries managed by individuals, rather than the libraries list, you see a smattering of small changes that improve: the internal structure, the docs, the tests, and (if you're lucky) the benchmarks of the library. I'd like to see (and make) more non-functional changes to the core libraries, but currently I feel that the overhead of creating a ticket, writing an email, etc., is too large; what should be a 5-minute change stretches to over two weeks. I argue that's the reason we don't see many small, but important, changes to the core libraries. Proposal: Non-functional changes should be allowed without review by the libraries list. Such changes can be commited directly, if the commiter has access to the repo, or attached to a "please merge" ticket on the bug tracker. Only changes that don't affect the API or performance (in non-trivial ways) should be allowed in without a review. Examples of patches that shouldn't require review: addition of tests/benchmarks, smaller documentation improvements, internal reorganization, updates due to compiler changes, etc. It's possible that once a commit that should have been reviewed gets commited. I suggest we adopt an "ask for forgiveness, not for permission" policy in these cases. We can always roll back changes. N.B. If someone (e.g. Ian) commits patches of this nature on someone else's behalf, that person should not be held responsible if the patches are rolled back later. Comments welcome! Cheers, Johan

Show replies by date

Tyson Whitehead

4 Oct 4 Oct

1:35 p.m.

Hey, Aren't you suppose to be on vacation! : ) BTW, I went through you slides while waiting for my connection. You did a nice job covering a lot of stuff. Hope you enjoy the rest of your time on Canada. Cheers! -Tyson

Ross Paterson

1:45 p.m.

On Mon, Oct 04, 2010 at 12:03:22PM -0400, Johan Tibell wrote:

...

Only changes that don't affect the API or performance (in non-trivial ways) should be allowed in without a review. Examples of patches that shouldn't require review: addition of tests/benchmarks, smaller documentation improvements, internal reorganization, updates due to compiler changes, etc.

It's already the case that such changes are applied by people with commit access. I agree that the libraries submission process is too heavyweight in such cases. But contributors without commit access need to persuade someone who does to commit for them, e.g. by posting on the libraries list.

...

It's possible that once a commit that should have been reviewed gets commited. I suggest we adopt an "ask for forgiveness, not for permission" policy in these cases. We can always roll back changes.

N.B. If someone (e.g. Ian) commits patches of this nature on someone else's behalf, that person should not be held responsible if the patches are rolled back later.

There I disagree -- committers need to make a best effort to be sure of what they are doing, that it doesn't break validate, and that it's not an API change. Breaking the GHC build is a big deal. That means someone asking someone else to commit for them has to demonstrate that it's safe and minor.

Johan Tibell

2:09 p.m.

Hi Ross, Thanks for your feedback. On Mon, Oct 4, 2010 at 1:45 PM, Ross Paterson wrote:

...

It's already the case that such changes are applied by people with commit access. I agree that the libraries submission process is too heavyweight in such cases. But contributors without commit access need to persuade someone who does to commit for them, e.g. by posting on the libraries list.

My suggestion is that a contributor without commit access simply creates a ticket and assigns it to someone with commit access (e.g. Ian). Aside: Asking who has commit access in an email to the libraries list will quickly get you a link to the libraries process (I tried!). It's very unclear who has commit access to the core libraries. The answer is somewhere along the lines of Ian, the Simons, and some other old timers.

...

There I disagree -- committers need to make a best effort to be sure of what they are doing, that it doesn't break validate, and that it's not an API change. Breaking the GHC build is a big deal. That means someone asking someone else to commit for them has to demonstrate that it's safe and minor.

I assume that contributor test their patches, just as they would test them before submitting them to a package not maintained by the libraries list. The committer can still review the patch for things that could e.g. break the build. Like you said, a contributor should note in the ticket how he/she made sure that the patch doesn't break things. Note that the library review process today doesn't really look at whether a change will break things so I don't think that skipping the library list review will change anything with respect to build breakages. -- Johan

Simon Peyton-Jones

5 Oct 5 Oct

7:05 a.m.

| > commit access. I agree that the libraries submission process is too | > heavyweight in such cases. But contributors without commit access need | > to persuade someone who does to commit for them, e.g. by posting on | > the libraries list. * I think it's very important to make the barrier to entry as low as possible, to encourage contributions. Having the procedure written down is a big help http://haskell.org/haskellwiki/Library_submissions. * I think it's a good idea to create a ticket, to save the proposal getting lost. (The above link suggests creating the ticket in the GHC Trac for any library, which is fine with me so long as its clear.) But it would be a good idea to have some Trac links on the above page that show all open library proposals, so its easy to see all the pending ones. * I'd suggest adding a note to the above page saying how to find out who the maintainer is. Yes, I know, look at the .cabal file. But that may not be obvious to the uninitiated, and even if you know about the field you may not know that it is regarded as the Actual Truth. * The page says "This page documents the mechanism by which new code for libraries maintained by the community may be proposed". Just which libraries are those? I imagine that we mean "exactly those libraries whose maintainer is "libraries@haskell.org"." But we should say so! * And then what is the procedure for other libraries? Re-reading the page, it's not really clear whether its scope is "all libraries" or just "libraries@haskell.org libraries". * I think it's unsatisfactory having any libraries whose maintainer is solely "libraries@haskell.org". If I was contributing I'd find that much less motivating (like emails that say "contact@gov.uk"; probably a no-op). I'd much prefer a list of names for each library. Otherwise it all devolves to Ian or Simon somehow. Something like "libraries@haskell.org [Simon Marlow, Johan Tibell]". The list isn't immutable, of course, but it'd make it clear which libraries lack love, and who holds the token for progressing a proposal against that library. And who has commit access. I suppose this is the biggest change I'm suggesting. * I'd like the above page to write down the procedure for transferring maintainership, to handle maintainers who have gone AWOL. Something like "if no reply, send an email to libraries@ and haskell@, saying that you propose to take over maintainership. If no reply in 2 weeks, you can take over." Or whatever. Again this suggestion amounts to widening the scope of the libraries-submission page. * This thread started with discussion about "obvious" changes. I suggest it should be up to the maintainer. The submitter should say whether or not an API change is involved (another thing to add to the submissions page); and the maintainer can decide whether to apply immediately or wait for comment. Simon

Ross Paterson

8:55 a.m.

On Tue, Oct 05, 2010 at 11:05:07AM +0000, Simon Peyton-Jones wrote:

...

* I think it's a good idea to create a ticket, to save the proposal getting lost. (The above link suggests creating the ticket in the GHC Trac for any library, which is fine with me so long as its clear.) But it would be a good idea to have some Trac links on the above page that show all open library proposals, so its easy to see all the pending ones.

There are such links, but they're further down the page, so not so easy to find. And perhaps libraries should have a separate Trac from GHC (if only so save Ian from adding "not GHC" to all those tickets).

...

* The page says "This page documents the mechanism by which new code for libraries maintained by the community may be proposed". Just which libraries are those? I imagine that we mean "exactly those libraries whose maintainer is "libraries@haskell.org"." But we should say so!

It used to say that, but was recently changed: http://haskell.org/haskellwiki/?title=Library_submissions&diff=36838&oldid=36644 At the same time the scope of the process was expanded from proposing changes to library interfaces to proposing new code. That change wasn't discussed here, and I don't agree with it. The changed wording does make it appear that making changes is very cumbersome, but it doesn't match practice: 53 patches were applied to containers in the last 12 months, and only one of them went through the library submissions process. Changes to interfaces are different from non-functional changes, and need a different kind of review. For one thing they are subjective. Also once they get into a release we're stuck with them for at least 12 months and usually much longer. The process needs to apply in the same way to people with and without commit access. If anything that process should be a bit more demanding: I would like to lengthen some of the consideration periods, and move to a "release-candidate" approach. On the other hand we need to make it easier for people without commit access to get minor patches applied. Committers still need to take responsibility for patches they apply, but it's unreasonable to pile all that on Ian, so some restructuring is needed. Maybe a separation between GHC and libraries Tracs would help there too. But mixing these two things will help neither.

Brent Yorgey

4 Oct 4 Oct

4:15 p.m.

On Mon, Oct 04, 2010 at 12:03:22PM -0400, Johan Tibell wrote:

...

Hi all,

During the work on the containers library, I felt that it's too difficult to contribute non-functional changes to the libraries managed by the libraries list. If you look at repos of libraries managed by individuals, rather than the libraries list, you see a smattering of small changes that improve: the internal structure, the docs, the tests, and (if you're lucky) the benchmarks of the library.

I'd like to see (and make) more non-functional changes to the core libraries, but currently I feel that the overhead of creating a ticket, writing an email, etc., is too large; what should be a 5-minute change stretches to over two weeks. I argue that's the reason we don't see many small, but important, changes to the core libraries.

Proposal:

Non-functional changes should be allowed without review by the libraries list. Such changes can be commited directly, if the commiter has access to the repo, or attached to a "please merge" ticket on the bug tracker.

+1 from me. I know I'd probably submit some minor documentation enhancements and the like if the process was a bit easier for such things. -Brent

Sterling Clover

4:34 p.m.

On Oct 4, 2010, at 4:15 PM, Brent Yorgey wrote:

...

On Mon, Oct 04, 2010 at 12:03:22PM -0400, Johan Tibell wrote:

...
Hi all,

Proposal:

Non-functional changes should be allowed without review by the libraries list. Such changes can be commited directly, if the commiter has access to the repo, or attached to a "please merge" ticket on the bug tracker.

+1 from me. I know I'd probably submit some minor documentation enhancements and the like if the process was a bit easier for such things.

Perhaps a slightly less drastic version of this could satisfy everybody. If the patch is committed directly, a note containing it and describing it, if only briefly, could be sent to the libraries list. Likewise, if a ticket is created, a short note can be sent to the libraries list pointing to the ticket. Still no need for discussion, formal approval or the like, but a subscriber to libraries can always speak up if they have a concern, and libraries subscribers can feel confident that they will remain at least as appraised of changes as before. Cheers, Sterl.

Johan Tibell

5:15 p.m.

On Mon, Oct 4, 2010 at 4:34 PM, Sterling Clover wrote:

...

Perhaps a slightly less drastic version of this could satisfy everybody. If the patch is committed directly, a note containing it and describing it, if only briefly, could be sent to the libraries list. Likewise, if a ticket is created, a short note can be sent to the libraries list pointing to the ticket. Still no need for discussion, formal approval or the like, but a subscriber to libraries can always speak up if they have a concern, and libraries subscribers can feel confident that they will remain at least as appraised of changes as before.

A mailing list where all the commits go sounds like a good idea.

Ross Paterson

5:28 p.m.

On Mon, Oct 04, 2010 at 05:15:28PM -0400, Johan Tibell wrote:

...

A mailing list where all the commits go sounds like a good idea.

http://www.haskell.org/pipermail/cvs-libraries/

Ian Lynagh

4:49 p.m.

On Mon, Oct 04, 2010 at 12:03:22PM -0400, Johan Tibell wrote:

...

I argue that's the reason we don't see many small, but important, changes to the core libraries.

FWIW, I graphed the number of patches per year in unix and base here: http://lambda.haskell.org/~igloo/lib_patches.png (Library_submissions started in Oct 2006) but I'm not sure it really tells us much, as it doesn't say how many patches were "maintenance" rather than "development".

...

Non-functional changes should be allowed without review by the libraries list.

As Ross said, simple, obvious improvements are applied without going through the process. A common example is patches that just add haddock docs to previously doc-less functions. Note that some patches are obvious when being created, but are hard to sanity check afterwards. For example, when making a patch that just moves functions around, it's easy to be confident it's safe when creating the patch, but when someone else sees it it just looks like a load of code being added and removed. It's even worse if other changes are made in the same patch. In general, it makes life easier for the committers if the libraries process is used, so it is not down to us to review the patches.

...

or attached to a "please merge" ticket on the bug tracker.

That's not what 'merge' is for. I improved the description a couple of weeks ago, so it's hopefully clearer now: fixed in HEAD; please merge to STABLE Next status will be 'merge' The right state is 'patch': please review Next status will be 'patch'

...

Only changes that don't affect the API or performance (in non-trivial ways) should be allowed in without a review.

But you need some amount of review in order to know what a patch does. For example, it was only when reading a recent patch Sun Aug 29 13:02:45 BST 2010 * Performance improvements to Data.Map that I discovered that it added DEPRECATED pragmas for two functions.

...

It's possible that once a commit that should have been reviewed gets commited. I suggest we adopt an "ask for forgiveness, not for permission" policy in these cases. We can always roll back changes.

IIRC, part of the motivation for the library process was that changes were being made in the HEAD, getting into release, and then people discovered that they caused problems.

...

someone (e.g. Ian)

You mentioned my name a few times; I'm not too keen on proposals that mean more work for me!

...

My suggestion is that a contributor without commit access simply creates a ticket and assigns it to someone with commit access (e.g. Ian).

If you do have a patch that doesn't need to go through the libraries process, the best thing to do is to file a ticket and set its status to 'patch'. Assigning it to me is unlikely to get it applied faster, but means no-one else is likely to look at it.

...

Aside: Asking who has commit access in an email to the libraries list will quickly get you a link to the libraries process (I tried!).

Well, if you want to get a patch committed, then that's normally what you need, so that seems like a reasonable response to me.

...

I assume that contributor test their patches, just as they would test them before submitting them to a package not maintained by the libraries list.

FWIW, it's quite common for submitted patches to break the build (e.g. platform-specific breakage, or new code has warnings, or even just silly mistakes that couldn't possibly compile (e.g. there was one in the past week which added an unmatched "endif" to a makefile)). Thanks Ian

Johan Tibell

5:27 p.m.

On Mon, Oct 4, 2010 at 4:49 PM, Ian Lynagh wrote:

...

FWIW, I graphed the number of patches per year in unix and base here: http://lambda.haskell.org/~igloo/lib_patches.png (Library_submissions started in Oct 2006) but I'm not sure it really tells us much, as it doesn't say how many patches were "maintenance" rather than "development".

Thanks for putting together the graph. Do you have a feeling for how many patches are created by people without commit access? Quite a few patches to these libraries are made by e.g. Simon M.

...

...
Non-functional changes should be allowed without review by the libraries list.

As Ross said, simple, obvious improvements are applied without going through the process. A common example is patches that just add haddock docs to previously doc-less functions.

Does this apply to "obvious" improvements by the general Haskell community or only by a set of core contributes (e.g. GHC developers). If one believes one has such an improvement, is there a description of how to get it committed?

...

Note that some patches are obvious when being created, but are hard to sanity check afterwards. For example, when making a patch that just moves functions around, it's easy to be confident it's safe when creating the patch, but when someone else sees it it just looks like a load of code being added and removed. It's even worse if other changes are made in the same patch.

...

From my experience these things are not being looked at by people on libraries@ so I don't think having a public review of non-functional patches decreases the likelihood of breakages. A contributor could of course ask for a review if he/she thinks it's necessary. The commiter could also have a look to make sure things don't break.

...

In general, it makes life easier for the committers if the libraries process is used, so it is not down to us to review the patches.

I understand that, but it also seems to prevent us from cleaning up our base libraries.

...

...
or attached to a "please merge" ticket on the bug tracker.

That's not what 'merge' is for. I improved the description a couple of weeks ago, so it's hopefully clearer now: fixed in HEAD; please merge to STABLE Next status will be 'merge'

The right state is 'patch': please review Next status will be 'patch'

Got it. Thanks for the clarification.

...

...
Only changes that don't affect the API or performance (in non-trivial ways) should be allowed in without a review.

But you need some amount of review in order to know what a patch does. For example, it was only when reading a recent patch

Sun Aug 29 13:02:45 BST 2010 * Performance improvements to Data.Map

But that patch was sent to the libraries list for review, as it should. I was very much aware what changes the patch made!

...

...
It's possible that once a commit that should have been reviewed gets commited. I suggest we adopt an "ask for forgiveness, not for permission" policy in these cases. We can always roll back changes.

IIRC, part of the motivation for the library process was that changes were being made in the HEAD, getting into release, and then people discovered that they caused problems.

Perhaps we could have a commit mailing list like Sterling suggested?

...

...
someone (e.g. Ian)

You mentioned my name a few times; I'm not too keen on proposals that mean more work for me!

I didn't intend it to! Read that as <whoever actually has commit access and commits stuff>.

...

If you do have a patch that doesn't need to go through the libraries process, the best thing to do is to file a ticket and set its status to 'patch'. Assigning it to me is unlikely to get it applied faster, but means no-one else is likely to look at it.

Sure. That sounds reasonable. As part of this proposal it would be nice if we could clarify who does have commit access to our core libraries. I have no idea and I doubt most people have.

...

...
Aside: Asking who has commit access in an email to the libraries list will quickly get you a link to the libraries process (I tried!).

Well, if you want to get a patch committed, then that's normally what you need, so that seems like a reasonable response to me.

But it will also make sure that people don't bother with smaller patches.

...

...
I assume that contributor test their patches, just as they would test them before submitting them to a package not maintained by the libraries list.

FWIW, it's quite common for submitted patches to break the build (e.g. platform-specific breakage, or new code has warnings, or even just silly mistakes that couldn't possibly compile (e.g. there was one in the past week which added an unmatched "endif" to a makefile)).

libraries@ review doesn't change that, as people there don't normally check for these kind of problems (i.e. they don't actually compile and run the code). We use a process for submitting code that's applicable to well polished libraries, but the libraries the process governs are anything but. For example, there are several of use who would like to make containers into a nice polished library, but I doubt we will actually do so if each patch needs to go through the libraries process. Thanks, Johan

Bas van Dijk

6:20 p.m.

On Mon, Oct 4, 2010 at 11:27 PM, Johan Tibell wrote:

...

Thanks for putting together the graph. Do you have a feeling for how many patches are created by people without commit access? Quite a few patches to these libraries are made by e.g. Simon M.

'darcs show authors' is a nice starting point. Bas

Ian Lynagh

6:32 p.m.

On Mon, Oct 04, 2010 at 05:27:43PM -0400, Johan Tibell wrote:

...

On Mon, Oct 4, 2010 at 4:49 PM, Ian Lynagh wrote:

...
FWIW, I graphed the number of patches per year in unix and base here: http://lambda.haskell.org/~igloo/lib_patches.png (Library_submissions started in Oct 2006) but I'm not sure it really tells us much, as it doesn't say how many patches were "maintenance" rather than "development".

Thanks for putting together the graph. Do you have a feeling for how many patches are created by people without commit access?

No. But anyway, to get any meaningful information you'd need to compare the proportions before and after Oct 2006. And there may be other factors affecting the numbers too, of course.

...

...
...
Non-functional changes should be allowed without review by the libraries list.

As Ross said, simple, obvious improvements are applied without going through the process. A common example is patches that just add haddock docs to previously doc-less functions.

Does this apply to "obvious" improvements by the general Haskell community

Yes; I was thinking of haddock patches submitted by general Haskell community members.

...

...
But you need some amount of review in order to know what a patch does. For example, it was only when reading a recent patch

Sun Aug 29 13:02:45 BST 2010 * Performance improvements to Data.Map

But that patch was sent to the libraries list for review, as it should. I was very much aware what changes the patch made!

I think you've missed my point. Just because someone submits a patch that e.g. "Just adds some haddock docs" doesn't mean they didn't accidentally include a small code change or something too.

...

...
...
Aside: Asking who has commit access in an email to the libraries list will quickly get you a link to the libraries process (I tried!).

Well, if you want to get a patch committed, then that's normally what you need, so that seems like a reasonable response to me.

But it will also make sure that people don't bother with smaller patches.

Well, trivial patches would be sent to libraries@ or put in the GHC trac, so you still don't need to know who has commit access.

...

...
...
I assume that contributor test their patches, just as they would test them before submitting them to a package not maintained by the libraries list.

FWIW, it's quite common for submitted patches to break the build (e.g. platform-specific breakage, or new code has warnings, or even just silly mistakes that couldn't possibly compile (e.g. there was one in the past week which added an unmatched "endif" to a makefile)).

libraries@ review doesn't change that, as people there don't normally check for these kind of problems (i.e. they don't actually compile and run the code).

I spotted the unmatched "endif" without compiling it. I also recently found a segfault in the text package by looking at the source, rather than by compiling/running it with random values. It's probably true that most people don't review as carefully as I do, though. But that point wasn't really about whether review would find problems; rather, it was about the general quality of patches. (I don't mean to put contributors off by pointing this out; just explaining why review is important).

...

We use a process for submitting code that's applicable to well polished libraries, but the libraries the process governs are anything but. For example, there are several of use who would like to make containers into a nice polished library, but I doubt we will actually do so if each patch needs to go through the libraries process.

But on the flipside, there was some anger recently when some containers patches were pushed without review. Thanks Ian

Simon Marlow

5 Oct 5 Oct

8:56 a.m.

On 04/10/2010 22:27, Johan Tibell wrote:

...

Sure. That sounds reasonable. As part of this proposal it would be nice if we could clarify who does have commit access to our core libraries. I have no idea and I doubt most people have.

People who have accounts on darcs.haskell.org, and are in the group 'darcs'. Basically right now this means GHC developers and a few others. We rarely add new accounts on darcs.haskell.org these days because we don't need to: code.haskell.org has taken most of the load off, the "extra libraries" are no longer hosted on darcs.haskell.org, and for the rest 'darcs send' means that patch submissions are possible without commit access. Having said that, we should add accounts for active developers and maintainers of core libraries when it makes sense to do so. In the GHC team we haven't been proactive in asking people whether they would like accounts, but don't take that as a sign that we want to keep commit access for ourselves and prevent anyone else from modifying the code! If you think the world would be a better place if you had commit access, feel free to ask. (ultimately accounts are handed out by the Galois admins, since it's their machine). Cheers, Simon

Ian Lynagh

1:25 p.m.

On Tue, Oct 05, 2010 at 01:56:22PM +0100, Simon Marlow wrote:

...

feel free to ask. (ultimately accounts are handed out by the Galois admins, since it's their machine).

Actually, SoC money was used to buy the current server, but Galois do kindly donate hosting, bandwidth, and sysadmin time. Thanks Ian

5506

Age (days ago)

5507

Last active (days ago)

List overview

Download

15 comments

9 participants

participants (9)

Bas van Dijk
Brent Yorgey
Ian Lynagh
Johan Tibell
Ross Paterson
Simon Marlow
Simon Peyton-Jones
Sterling Clover
Tyson Whitehead