weblog.masukomi.org


Handling and Avoiding Conflicts in Git

July 12, 2008

John Kelvie said:

[To] me the fundamental challenge with existing version control systems is the difficulty of merging change sets from multiple developers across the same set of code. To me, this issue comes down to the diffing/merging functionality provided by the software, and I haven't seen or heard of anything that really improves the state of the art. How does GIT address this? How does it make it easier to do? Are there specific branching and merging tools it provides? Is through the use of more atomic commits (which I could see helping to an extent, but only so far as it allows for changes to be small enough that there is no overlap, thus sidestepping the problem).

There are two concepts that you must understand to really take advantage of git. The first is the index, also called the staging area. A full description would require its own post, but for this discussion you can think of it as a temporary branch where you put everything before committing it. You can diff between it and the last commit, between it and the working directory, etc. The second is that a git repo can have (and usually does have) multiple branches in the same location on disk.
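If the index still feels abstract, here's a minimal sketch you can paste into a shell. Everything in it is made up for illustration (the temp directory, notes.txt, the demo identity); it just stages one change, leaves a second one unstaged, and captures the two diffs described above.

```shell
set -eu

# Throwaway repo; the directory, file name and identity are all hypothetical.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m "first commit"

printf 'two\n' >> notes.txt     # change the file, then stage it
git add notes.txt
printf 'three\n' >> notes.txt   # change it again, but do NOT stage this one

# Index vs. last commit: only the staged line shows up as added.
staged=$(git diff --cached)
# Working directory vs. index: only the unstaged line shows up as added.
unstaged=$(git diff)

printf '%s\n' "--- staged ---" "$staged" "--- unstaged ---" "$unstaged"
```

The staged diff should show only the "two" line as added, and the unstaged diff only the "three" line, which is the "temporary branch" idea in miniature.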

The idea of multiple branches in the same folder is critical to understand when trying to address conflicts. In most version control systems you have one checkout for every branch, each in a different location on your disk. That's an option in git too, but you can also have multiple branches in the same folder, just like the server side of most version control systems.

So, we're going to need an example. Let's assume I've got a git repo with foo.rb. In the master branch foo.rb looks like this:

if (value == "test")
puts "I'm in!"
end

On branch_one we'll change that test to

if (value == TEST_CONSTANT)

and on branch_two we'll make it

if (value == "test" || value == "test2")

Now, if master, branch_one, and branch_two are all in the same repo we could stay in the same directory and "checkout" whichever branch we wanted to be working on. All the files would change in place without you ever leaving the directory. So, if you check out master and cat foo.rb you'll get the first example. Check out branch_two and cat foo.rb (without changing directories) and you'll see the last example. I'm sorry if that was obvious, but it's critical that you understand that, and some newbs just aren't aware of it yet.
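Here's a rough sketch of that setup, in case you'd like to watch the file change under your feet. It builds the three branches in one throwaway repo (the temp directory, the demo identity and the commit messages are my own placeholders), then checks out each branch in turn and prints the first line of foo.rb without ever changing directories.

```shell
set -eu

# One throwaway repo holding all three branches (path and identity are placeholders).
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Demo User"
git config user.email "demo@example.com"

cat > foo.rb <<'EOF'
if (value == "test")
puts "I'm in!"
end
EOF
git add foo.rb
git commit -q -m "initial foo.rb"

# branch_one: compare against a constant instead
git checkout -q -b branch_one
sed 's/value == "test"/value == TEST_CONSTANT/' foo.rb > foo.tmp
mv foo.tmp foo.rb
git commit -q -am "switch test to TEST_CONSTANT on branch_one"

# branch_two: starts from master again and accepts two values
git checkout -q master
git checkout -q -b branch_two
sed 's/value == "test"/value == "test" || value == "test2"/' foo.rb > foo.tmp
mv foo.tmp foo.rb
git commit -q -am "test for test or test2 on branch_two"

# Same directory every time; only the checked-out branch changes.
for branch in master branch_one branch_two; do
  git checkout -q "$branch"
  echo "== $branch (in $(pwd)) =="
  head -n 1 foo.rb
done
```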

So, back to John's questions. How does git make branching and merging easier to do, and does it provide any specific tools for managing it? Well, first off the commands are freaking simple:

  • to create a branch in the same repo: git branch branch_name or, more commonly, git checkout -b branch_name. The latter creates the branch, then checks it out so you can immediately start working on it.
  • to merge in a branch in the same repo: git merge branch_name
  • to merge in a branch from a remote repo you've got git fetch and git pull. More on these two in a minute, because they dramatically change how you manage conflicts.

As for specific tools for managing it: yes and no. "...if you rather use GUI tools to merge files instead of editing file with conflict markers (like shown in example), you can use git-mergetool, which would call GUI tool of your choice; currently supported out of the box are: kdiff3, tkdiff, meld, xxdiff, emerge, vimdiff, gvimdiff, ecmerge, and opendiff." BUT, regardless of GUI tools, the ability to have multiple branches in the same local repo changes everything.

Here's an example. Same files as above, except I'm using a different repo for each instead of different branches in the same repo, because that's the way it would be when pulling from other people. And since we're dealing with people, we'll say branch_one is Mary's branch and branch_two is Bob's (pretend that the. The first thing you'd do if you were pulling from these guys regularly is add them to your repo's list of remote repos:

$ git remote add marys_branch ../branch_one/
$ git remote add bobs_branch ../branch_two/
$ git remote show bobs_branch marys_branch

And here's where the difference between pull and fetch comes into play. fetch tells git, "hey, go get this remote data and shove it into a 'remote' branch in my repo." pull says, "fetch it, and also merge it with my current branch." Using fetch we can suck in data from two sources, knowing full well it would conflict if we tried to merge it, and still have no problems.

$ git remote update
Updating marys_branch
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../branch_one
* [new branch] master -> marys_branch/master
Updating bobs_branch
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../branch_two
* [new branch] master -> bobs_branch/master

$ cat foo.rb
if (value == "test")
puts "I'm in!"
end

So, now we've fetched two people's conflicting changes but not applied either: foo.rb is unchanged. Let's assume I don't suspect a conflict (hopefully they're not normal occurrences for you).

$ git merge marys_branch/master
Updating 7d6f564..5713885
Fast forward
foo.rb | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

$ git merge bobs_branch/master
Auto-merged foo.rb
CONFLICT (content): Merge conflict in foo.rb
Automatic merge failed; fix conflicts and then commit the result.

Damn, a conflict! Who would have guessed? Now, I could do what git suggests: fix the problem and commit it. But I'm neither the one who broke it nor the one who knows the most about it. So I'm going to kick this back to Bob or Mary, but I may want to see what's going on first, to help choose who to kick it to.

One thing I can do is see specifically which commits are conflicting:

$ git log --merge
commit f068f89e88e9174b50a0bc5622875dd4d8e21bc8
Author: Kay Rhodes
Date: Sat Jul 12 09:26:44 2008 -0400

switched to testing for test or test2 in branch_two

commit 5713885c9adc5689b6d8222b2650d4b3ad0dbc42
Author: Kay Rhodes
Date: Sat Jul 12 09:25:52 2008 -0400

switch test to TEST_CONSTANT on branch_one

If the commit messages aren't enough info I can actually diff the two branches that conflicted totally separately from anything in MY branch.

$ git diff marys_branch/master bobs_branch/master
diff --git a/foo.rb b/foo.rb
index dac3b0f..220011b 100644
--- a/foo.rb
+++ b/foo.rb
@@ -1,3 +1,3 @@
-if (value == TEST_CONSTANT)
+if (value == "test" || value == "test2")
 puts "I'm in!"
 end

I could have also diffed the specific commits in conflict if I'd wanted, instead of the entire branches. In this case it would give the same result. But, not knowing what TEST_CONSTANT is (maybe it's a closure, maybe it's a predefined variable), I have two choices. I can make an educated guess that, whatever TEST_CONSTANT is, it's probably the better choice from a maintenance perspective, and tell Bob to go fix his stuff. Or I could just send an e-mail to both of them and let them figure it out.

It's also important to point out that the "index in git contains on conflict *all* versions of a file: 'ours' i.e. the version in the branch you merge into, 'theirs' i.e. the version in the branch you are merging, and 'base' i.e. the version in the common ancestor of the branches, and also version with conflict markers."

Another option is to just fix it myself. The conflicting file currently looks like this:

<<<<<<< HEAD:foo.rb
if (value == TEST_CONSTANT)
=======
if (value == "test" || value == "test2")
>>>>>>> bobs_branch/master:foo.rb
puts "I'm in!"
end

HEAD is git slang for "the branch you're currently working on" which, in this case, has Mary's changes successfully merged in (actually, by default it's pointing to the last commit on the current branch). So, I decide that all three tests need to be there and make the edit, and then update the index. You need to update the index because git has already added Bob's changes to the index but knows that there was a conflict that needs resolving.

$ git status
foo.rb: needs merge
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# unmerged: foo.rb
#
no changes added to commit (use "git add" and/or "git commit -a")

Updating the index is simply a matter of getting the file how you want it and saying:

$ git add foo.rb

at which point committing would get you a commit message template that started like this:

Merge commit 'bobs_branch/master'

Conflicts:
    foo.rb
#
# It looks like you may be committing a MERGE.
# If this is not correct, please remove the file
#   /Users/krhodes/temp/trunk/.git/MERGE_HEAD
# and try again.
#
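To try the whole conflict-and-resolve cycle end to end, here's a sketch under a few loud assumptions: two local branches named mary and bob stand in for Mary's and Bob's repos, the identity and commit messages are placeholders, and the "keep all three tests" line is my guess at a sensible resolution, not necessarily the exact edit made above.

```shell
set -eu

# Throwaway repo; branches "mary" and "bob" stand in for the two remote repos.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

cat > foo.rb <<'EOF'
if (value == "test")
puts "I'm in!"
end
EOF
git add foo.rb
git commit -q -m "initial foo.rb"

git checkout -q -b mary
sed 's/value == "test"/value == TEST_CONSTANT/' foo.rb > foo.tmp
mv foo.tmp foo.rb
git commit -q -am "switch test to TEST_CONSTANT"

git checkout -q master
git checkout -q -b bob
sed 's/value == "test"/value == "test" || value == "test2"/' foo.rb > foo.tmp
mv foo.tmp foo.rb
git commit -q -am "test for test or test2"

git checkout -q master
git merge -q mary                                    # fast-forward, no conflict
git merge bob || echo "merge stopped on a conflict, as expected"

# List the files git still considers unmerged.
unmerged=$(git diff --name-only --diff-filter=U)
echo "unmerged: $unmerged"

# Resolve by hand (assumed resolution: keep all three tests), then update the index.
cat > foo.rb <<'EOF'
if (value == TEST_CONSTANT || value == "test" || value == "test2")
puts "I'm in!"
end
EOF
git add foo.rb
git commit -q -m "Merge branch 'bob', keeping all three tests"

still_unmerged=$(git ls-files --unmerged)
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "merge commit has $parents parents"
```

Passing -m skips the commit message template; leave it off if you want to see what git pre-fills for a merge commit.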

The basic take-away from all this fetch / pull stuff is that it's generally a good idea to fetch first when dealing with other people's work, because having a local copy of their branch in your repo gives you a number of additional options if things go wrong.
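As a rough illustration of "fetch first", the sketch below sets up a trunk repo and one clone standing in for Mary's repo, fetches her work, and looks at it before merging anything. The directory layout, identities and commit messages are assumptions for the demo; the commands doing the inspecting are plain git log and git diff against the remote-tracking branch.

```shell
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# "trunk" is my repo; "branch_one" is a clone standing in for Mary's repo.
git init -q trunk
cd trunk
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
cat > foo.rb <<'EOF'
if (value == "test")
puts "I'm in!"
end
EOF
git add foo.rb
git commit -q -m "initial foo.rb"
cd ..

git clone -q trunk branch_one
cd branch_one
git config user.name "Mary"
git config user.email "mary@example.com"
sed 's/value == "test"/value == TEST_CONSTANT/' foo.rb > foo.tmp
mv foo.tmp foo.rb
git commit -q -am "switch test to TEST_CONSTANT on branch_one"
cd ../trunk

# Fetch only: Mary's commit lands in marys_branch/master, my files are untouched.
git remote add marys_branch ../branch_one
git fetch -q marys_branch

incoming=$(git log --oneline HEAD..marys_branch/master)   # commits I don't have yet
preview=$(git diff HEAD...marys_branch/master)            # what those commits change
before=$(head -n 1 foo.rb)

echo "incoming: $incoming"
echo "$preview"
echo "working copy still starts with: $before"

# Only now, having looked, do the merge.
git merge -q marys_branch/master
after=$(head -n 1 foo.rb)
echo "after merge: $after"
```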

And yes, atomic commits go a huge way towards avoiding conflicts, and towards handling them when they crop up. They help by keeping your commits on topic: if one commit is for bug X you generally don't have to worry about it conflicting with a commit for bug Y. And when you discover that two people you're pulling from both have patches for bug X that conflict, you have many more options. Obviously this applies to centralized systems too.

Another key technique for avoiding conflicts is to merge constantly. Every single day you should be pulling any work that's been committed upstream. That way, when things do conflict, you only have to resolve a little tiny thing (if the commits are atomic then it's even tinier), and if you choose to kick the issue back to someone else it's still fresh in their heads. This, of course, applies to centralized systems too.

Another way that git (and every other distributed system) helps to avoid conflicts is a social change that comes out of the distributed nature of the repos. Because you're constantly pulling in from multiple sources, having atomic commits becomes a requirement for participation. If Bob is a twit and only makes massive sprawling commits you're simply not going to pull from him. It's not like a centralized system where you're forced to take all the changes out there. Whoever is in charge of the main trunk would be stupid to accept Bob's changes, because if there was a problem in one of them it'd be a pain in the butt to extract the bits that fixed bug A from the bits that didn't really fix bug B. No, they'd kick it back to Bob, telling him to clean up his stuff and make two separate patches.

This becomes even more important when you're working on a project that manages patches via e-mail. Just check out the git mailing list. People's patches are incredibly focused, enabling other git developers to take just the bits and pieces they choose. Because the patches are uniquely identifiable in git, it doesn't matter if some people take a change and some don't and you, as the maintainer, pull from all of them. Git won't think they're two different but identical changes. It'll know, "Oh, that's change x. I've got that already." Git also doesn't care how a patch got there. It could be from a commit you pulled over the LAN, or something someone e-mailed. Because it's guaranteed cryptographically unique, it's the same thing.

[Update 2] Some additional tips from Jakub Narebski

If you want to see what some file looks like on another branch (or at a specified point in time, for example at a tagged revision marking some released version), you don't need to use "git checkout <branch> && cat <file>"; you can use "git show <branch>:<file>" (see the "Examples" section in the git-show(1) manpage, and the "Specifying revisions" section in the git-rev-parse(1) manpage).
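A small sketch of that tip, assuming a throwaway repo with a branch named topic and a tag named v1.0 (both names, along with foo.txt and its contents, are invented for the example). It reads the file as it exists on the other branch and at the tag, while the checked-out branch stays put.

```shell
set -eu

# Throwaway repo; "topic", "v1.0" and foo.txt are hypothetical names.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'version 1\n' > foo.txt
git add foo.txt
git commit -q -m "v1"
git tag v1.0                      # mark the "released" version

git checkout -q -b topic
printf 'topic version\n' > foo.txt
git commit -q -am "topic change"

git checkout -q master
printf 'version 2\n' > foo.txt
git commit -q -am "v2"

# Read the file from elsewhere without checking anything out.
on_topic=$(git show topic:foo.txt)
at_tag=$(git show v1.0:foo.txt)
here=$(cat foo.txt)
current_branch=$(git rev-parse --abbrev-ref HEAD)

echo "topic: $on_topic"
echo "v1.0:  $at_tag"
echo "$current_branch (checked out): $here"
```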

Below is an example of what the index looks like during a conflicted merge. The example uses yet another way of specifying object names, i.e. ":<stage>:<file>" (we could alternatively use the SHA-1 or shortened SHA-1 of the object shown in the git-ls-files output, e.g. "git show 2f096cc").

$ git ls-files --abbrev --unmerged
100644 2f096cc 1 foo.rb
100644 89f36fe 2 foo.rb
100644 d3ea75d 3 foo.rb
$ git show :1:foo.rb
if (value == "test")
puts "I'm in!"
end
$ git show :2:foo.rb
if (value == TEST_CONSTANT)
puts "I'm in!"
end
$ git show :3:foo.rb
if (value == "test" || value == "test2")
puts "I'm in!"
end

Try it yourself!
I find that reading how to do things like this is much enhanced by actually having something to test it on, especially when there are so many variables. So, I've uploaded the temp dir with the three repos I used in writing this. If you go into the trunk repo you'll see that the other ones have already been configured as remote repos. And, if you go into branch_one or branch_two you'll see that they're clones of trunk and thus both already know that their upstream is trunk and how to fetch from it without having to configure anything. You can also see what the difference would be in pulling and fetching from one of them.

Please let me know if there's still something that's a little unclear about dealing with conflicts in git and I'll add it in.


If you found this useful you might be interested in some of the other git posts here.
