
Bryan Richter writes:
On 7/7/19 7:53 PM, Sven Panne wrote:
On Sun, 7 July 2019 at 17:06, Bryan Richter wrote:
How does the scaling argument reconcile with the massive scope of the Linux kernel, the project for which git was created? I can find some middle ground with the more specific points you made in your email, but I have yet to understand how the scaling argument holds water when Linux trucks along with "4000 developers, 450 different companies, and 200 new developers each release"[1]. What makes Linux special in this regard? Is there some second inflection point?
Well, somehow I saw that example coming... :-D I think the main reason why things work for Linux is IMHO the amount of highly specialized high-quality maintainers, i.e. the people who pick the patches into (parts of) the releases they maintain, and who do it as their main (sole?) job. In addition they have a brutal review system plus an army of people continuously testing *and* they have Linus.
:D
I would add to your argument that they appear to use git primarily to *keep a record of merges*. Incoming patches have no history whatsoever; they're just individual patches. I guess that could be considered a simpler-to-use version of the fast-forward-only strategy! Perhaps Linux isn't such a great counterexample after all....
Once they have committed patches to some particular history, though, they don't rebase, since that would rewrite important audit history.
I would very much like to turn the question around: I never fully understood why some people like merge-based workflows so much. OK, you can see that e.g. commits A, B, and C together implement feature X, but to be honest: After the feature X landed, probably nobody really cares about the feature's history anymore, you normally care much more about: Which commit broke feature Y? Which commit slowed down things? Which commit introduced a space leak/race condition?
What I *don't* like is rewriting history, for all the reasons I don't like mutable state. As you say, what you're generally interested in is commits. When references to commits (in emails etc.) get invalidated, it adds confusion and extra work. Seeing this happen is what led me to wonder why people even prefer this strategy.
I would reiterate this. In my experience when I'm looking back at GHC's history I'm probably doing so for one of a few possible reasons:

 * I want to know which patch broke something
 * I want to know which patch made something slower
 * I want to know which patch added something

In all of these cases I (personally) find a linear history makes reasoning about the progression of changes much easier. Bisection, blame, and performance analysis tools are all much easier when you have only one "past" to worry about.
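To make the bisection point concrete, here is a minimal Python sketch of why a single linear "past" is convenient: with one ordered list of commits, finding the first bad one is a plain binary search. This is only an illustration, not how `git bisect` is implemented and not part of any GHC tooling; the commit names and the `is_bad` predicate are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(history: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit in a linear history.

    Assumptions (illustrative only): `history` is ordered oldest to newest,
    the first commit is good, the last commit is bad, and every commit after
    the first bad one is also bad.
    """
    lo, hi = 0, len(history) - 1  # history[lo] is known good, history[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return history[hi]


# Hypothetical linear history in which "c5" introduced the regression.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken_since = history.index("c5")
culprit = first_bad_commit(history, lambda c: history.index(c) >= broken_since)
print(culprit)
```

With a merge-heavy history there is no single ordered list to halve, so the search has to walk a graph with several candidate parents at each merge instead.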
On top of that, many of the problems people have with merges actually seem to be problems with bad commits, as you yourself hinted. Other concerns seem to be based in unfamiliarity with git's features, or an irrational desire for "pure history". (Merges *are* history!)
One final thing I like about merges is conflict resolution. Resolving conflicts via rebase is something I get wrong 40% of the time. It's hard. Even resolving a conflict during a merge is hard, but it's easier.
I strongly disagree here. In my experience, resolving conflicts via rebase is much easier than doing so via merge (which is one of the reasons why I personally use a rebase workflow even outside of GHC). The difference is that during a rebase workflow I can reason about the changes made by each commit individually. I can look at the diff of the original commit (which is generally small, if history was constructed well), refer to the relevant subset of changes from the new commits I'm rebasing on top of, and adapt my changes needing only this "local" state.

By contrast, during a merge I need to keep in my head both the entirety of my branch and every new commit on the branch I'm merging into. Not only is this often plain infeasible (e.g. I can't imagine trying to do this with the recent concurrent GC patches), but you end up with a result that is incoherent, since changes that were likely relevant to your feature branch commits end up recorded in the merge commit.
Plus, the eventual merge commit keeps a record of the resolution! (I only learned this recently, since `git log` doesn't show it by default.) Keeping a public record of how a conflict was resolved seems like a huge benefit.
I'm not sure I see the value in this. To me it seems like the merge resolution is just another step in the *development* of the patch. We generally don't preserve such steps in history. We only care about the fully-consistent state of the patch when it is merged.

Cheers,

- Ben