Eddie Izzard - Turney Button Things - Glorious

By: MaineGemCutter


Uploaded on 05/25/2008

I almost died laughing at this.

Comments (4):

By anonymous    2017-09-20

TL;DR: it's not actually the fast forward itself

Your question comes down to: "why isn't Git obeying my custom merge direction?" In fact, this problem can occur with any merge, and any custom merge driver. The fact that this merge can be done as a fast forward operation merely guarantees that you (with your particular case) will hit the problem.

The reason boils down to the fact that any custom .gitattributes merge driver, including merge=ours, is invoked only when Git believes there is "something to merge". This does not seem so bad until you realize what it takes for Git to have such a belief.

Sidebar: merge strategies

It's worth mentioning here, as a side-bar, Git's -s strategy argument to git merge. These strategies take over the whole process, including the "find the merge base" step—plus everything after that—and hence can do their own thing, which includes ignoring .gitattributes entirely. Obviously if a strategy ignores your .gitattributes, setting a custom merge driver or mode there won't help.

Therefore, we're looking only at the -s strategies that do use a merge base and two of what Git calls heads (which we'll label "ours" and "theirs"), and do use .gitattributes. There are three of those built in to Git—recursive, resolve, and subtree—but they all work the same here, with respect to what gets merged and what happens with custom merge drivers. (The other two built-in merge strategies, ours and octopus, either don't bother with a merge base and a "theirs" at all, or—for octopus—have more than two heads, so that there is no clear notion of "ours" and "theirs".)

One merge base and two heads

So, now that we have settled on the built in merges that have one merge base commit and two head commits, we can look at what it means for Git to think, in its tiny little pre-programmed Gitty way, that there is something to merge.

The two heads are easier to define. One of them, the one we call "ours", is just HEAD itself. The other is whatever argument we pass to git merge:

git merge A

means "ours" is HEAD and "theirs" is the commit identified by A.

Here is your git log --all --decorate --oneline --graph output again (thanks, by the way, for including that—it's critical for most merges!):

* da6a750 (A) Further in A, okay for merging back into master
*   bf27b58 Merge branch 'master' into A
|\
| * 86294d1 (HEAD -> master) Development on master
* | abe6b8a Welcome to branch A
|/
* 589517c First commit

so we can say that the two heads are commit 86294d1 (HEAD or master or just "ours") and commit da6a750 (A or just "theirs").

The merge base is whatever commit they first share in terms of their graph history, i.e., starting from both heads, work backwards in history if needed until you find a commit that they have in common, that you can reach from both heads. So we start from da6a750, work backwards one step to bf27b58, then work backwards one more step to both 86294d1 and abe6b8a. Meanwhile, we start from 86294d1 and ... oh look we've hit a common commit already! :-)
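The backwards walk just described can be sketched in a few lines of Python. This is only a toy model, not how Git implements it: the PARENTS table is read by hand off the log output above, and the function names are made up for illustration.

```python
# Parent links read by hand off the `git log --graph` output above.
PARENTS = {
    "589517c": [],
    "abe6b8a": ["589517c"],
    "86294d1": ["589517c"],
    "bf27b58": ["abe6b8a", "86294d1"],
    "da6a750": ["bf27b58"],
}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(ours, theirs, parents):
    """Common ancestors that are not themselves ancestors of another common one."""
    common = ancestors(ours, parents) & ancestors(theirs, parents)
    return {
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    }


base = merge_bases("86294d1", "da6a750", PARENTS)
print(base)
```

With this graph the only candidate left is 86294d1, i.e., the "ours" head itself; 589517c is also shared, but it is an ancestor of 86294d1, so it loses.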

Since the merge base is one of the two heads, normally we'd either get a fast forward, or a complaint that there is nothing to merge. Since the merge base is the "ours" head, of those two options, Git would pick the fast forward operation. Using --no-ff tells Git: don't pick that, go ahead and do a full blown merge after all.
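As a rough sketch of that decision (the function name and the returned labels are invented for illustration, and real Git has more options, such as --ff-only, that are ignored here):

```python
def merge_kind(base, ours, theirs, no_ff=False):
    """Classify a two-head merge from its merge base (simplified model)."""
    if base == theirs:
        # Their head is already part of our history: nothing to merge.
        return "already up to date"
    if base == ours and not no_ff:
        # Our head is an ancestor of theirs: just slide our branch forward.
        return "fast-forward"
    # Either the histories diverged, or --no-ff forced a real merge commit.
    return "true merge"


print(merge_kind("86294d1", "86294d1", "da6a750"))
print(merge_kind("86294d1", "86294d1", "da6a750", no_ff=True))
```

The first call models a plain git merge A in the situation above, the second models git merge --no-ff A.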

Now, the fact that the merge base is the "ours" commit guarantees we will have your problem, but in fact, we could have your problem even if the merge base were not the "ours" commit. Let's take a look at what's inside a commit, at the next level down of what Git needs and does when it works on both git diff and git merge—but first, let's think about what git merge is supposed to do.

The goal of a merge is to combine work

As a general rule, the idea when running git merge is that we want to take two sets of work—things we did on our branch in our commits, and things "they", whoever they are, did on their branch in their commits—and produce a new commit that is the best of both worlds: that takes any good stuff we did, plus any good stuff they did.

If we draw the graph horizontally instead of vertically, with older commits at the left and newer ones at the right, we can draw this:

          o--o--o--...--H   <-- ours
         /
...--o--B
         \
          o-----...-----T   <-- theirs

where each o is a commit, and so are B, H, and T. Commit B is the merge base, where the two forks in this graph rejoin in the "past" (leftward) direction. H is our (HEAD) commit and T is the head / tip commit of their branch. How, then, can we combine our work with their work?

Git's answer is to run two git diffs:

git diff B H     # find out what we did
git diff B T     # find out what they did

Then it can combine these two diffs:

  • Wherever we added something—some lines of text—to some files, Git should make the final result have those added lines in those files. Wherever we deleted some lines of text in some files, it should make the final result have those lines deleted.

    Because git diff expresses the differences as "delete this and add that" (even for differences that change this to that), that covers everything git diff says.

  • Likewise, wherever they added lines, Git should make the final result have the added lines. Wherever they deleted lines, Git should make the final result have those same deletions.

  • To take care of a very common case, if we and they made the exact same change—deleting the same original lines, and/or adding the same replacements—Git takes only one copy of this.

  • And of course, if there's a place where we both touched the same lines, but in different ways, Git just throws up its metaphorical hands, exclaims "Oy vey!", and declares a merge conflict.

    (It's these merge conflicts that give us the most headaches, so most of the twisty knobs Git gives us are designed for dealing with those conflicts in some way. That's mostly true of .gitattributes merge attributes, too—though that's not directly relevant to our problem here.)
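The four rules above can be sketched as follows. This is a deliberately simplified model: it assumes the three versions have already been lined up into corresponding regions (which is the hard part that a real diff does), and the function name and sample data are invented.

```python
def merge_regions(base, ours, theirs):
    """Combine three aligned lists of regions; return (merged, conflicts)."""
    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)       # same change on both sides, or no change at all
        elif o == b:
            merged.append(t)       # only they touched this region
        elif t == b:
            merged.append(o)       # only we touched this region
        else:
            merged.append(None)    # both touched it, differently
            conflicts.append(index)
    return merged, conflicts


base = ["intro", "body", "outro", "footer"]
ours = ["intro v2", "body", "outro fixed", "our footer"]
theirs = ["intro", "body plus", "outro fixed", "their footer"]

merged, conflicts = merge_regions(base, ours, theirs)
print(merged)
print(conflicts)
```

Here the first three regions combine cleanly and only the last one, changed differently on both sides, is reported as a conflict.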

Now, all this combining is a lot of work, so to make Git go fast, there's a short-cut.

What's inside a raw commit for git merge to git diff

We can look at any commit object, or indeed any Git object at all, with git cat-file -p:

$ git cat-file -p HEAD
tree 5bc304073b94505cd3f6716829c4cec5a7474762
parent 29257c2c82dca881c4cc65765392a32e46264fbe
author Chris Torek <chris.torek@gmail.com> 1490287144 -0700
committer Chris Torek <chris.torek@gmail.com> 1490297185 -0700

insert early footnote on Git branch creation

In the "about version control" chapter section that introduces

(I snipped the rest off here).

The more interesting part here is actually the tree, so let's view some of that:

$ git cat-file -p 5bc304073b94505cd3f6716829c4cec5a7474762
100644 blob 8d1519c435c4da5a65228785fa7ba7033fe011ff    .gitignore
100644 blob 66c9d22a735ee9d8da7f7ed49599583aa642842f    Makefile
100644 blob c9c824fa6668e45976c4fe8a10e4d5c25e272f0c    about.tex
100644 blob 1757109f5aa921ecf9a8051180c25f09e1496c07    aboutvc.tex

(again I snipped things off here).

Each of those raw hash IDs for each blob object—i.e., stored file version—tells Git which version goes with this commit. (More precisely, that's the file version for this tree object, but this tree goes with this commit, so it amounts to the same thing.)

Git can, and in fact has to, extract these blob hash IDs for each of the three commits: the merge base, "ours", and "theirs". The hash IDs are how it will be able to diff the old and new versions of files like aboutvc.tex (in my case) or specific (in yours). But there is an interesting thing about these hash IDs: they're based entirely on the contents of the object.[1] If two files in two different commits are exactly, completely, 100% bit-for-bit identical, they have the same hash and are stored in the repository just once. This means that no matter how many commits have a copy of that particular version of that file, there's only one copy stored in the database.

[1] In fact, they are cryptographic hashes of the object contents, including the little type-and-size header Git sticks on the front of each object. That header is why the now-famous SHA-1 hash collision is not an immediate problem for Git.

Same hash => problem

This fast hash comparison—the fact that the same hash means "same version of that file"—means that git diff and git merge can immediately and easily tell that there's no change to some file, from base to ours, or base to theirs ... and this is precisely where merge=ours goes wrong. Git looks at base-vs-ours, and base-vs-theirs. One pair has the same hash. One pair has a different hash.

At this point, Git simply assumes that the right answer, regardless of merge strategy or turney-knob setting in .gitattributes, is to take the file from whichever head has a different hash. For most files, in most cases, that's the right answer. But if we have defined a custom merge driver, or set merge=ours, it might be the wrong answer.

When the one head that's different is "theirs", and the custom merge direction is "keep ours", it's the wrong answer. That's true no matter what commit is chosen as the merge base, but when the merge base is HEAD—is our commit—then all the hashes, in the diff from base to ours, are the same, and the result is always "their version of the file".

That, in fact, is why a fast forward is possible in the first place: the final merged tree is always just their tree. Git, in effect, ignores all the custom directions in .gitattributes. That remains true even if you force a real merge rather than a fast-forward-non-merge "merge".

Perhaps Git should check for custom merge drivers or merge=ours directives, and disable this short-cut, at least for real (non-fast-forward) merges. But it doesn't, and therefore you will have this problem. You will also have this problem for other cases, where there's a real merge to be done, but the file is modified only in the base-to-theirs comparison.
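The short-cut can be modeled in a few lines of Python. This is a sketch of the behavior described in this section, not Git's actual code: merge_file, keep_ours, and the three-letter hash IDs are all invented for illustration.

```python
def merge_file(base_id, ours_id, theirs_id, driver):
    """Pick the blob ID for one path; `driver` runs only if both sides changed."""
    if ours_id == theirs_id:
        return ours_id        # both heads agree: nothing to decide
    if base_id == ours_id:
        return theirs_id      # only they changed: take theirs, driver never asked
    if base_id == theirs_id:
        return ours_id        # only we changed: take ours, driver never asked
    return driver(base_id, ours_id, theirs_id)


calls = []


def keep_ours(base_id, ours_id, theirs_id):
    """Stand-in for merge=ours: record that it was consulted, then keep ours."""
    calls.append((base_id, ours_id, theirs_id))
    return ours_id


# Merge base is HEAD, so base and ours match for every file.
only_theirs_changed = merge_file("aaa", "aaa", "bbb", keep_ours)
calls_after_first = len(calls)

# Both sides changed the file: now the driver finally gets a say.
both_changed = merge_file("aaa", "ccc", "bbb", keep_ours)
calls_after_second = len(calls)

print(only_theirs_changed, calls_after_first)
print(both_changed, calls_after_second)
```

In the first case the result is their version and the "keep ours" driver is never called, which is exactly the surprise in the question.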

One last sidebar: don't do this for configuration files

People often want to use this merge=ours to make sure that configuration files stored on a branch are kept the way they are on that branch. This is nearly always the wrong overall strategy: instead, configuration files should be omitted entirely from version control, or at least from the version control of this particular repository. Instead of committing, e.g., config.ini or config.php, commit a config.ini.sample or config.default.php or some such. Copy this configuration to the "real config", or read it as a secondary strategy if the "real" configuration is missing or incomplete.

This gives you a way to version configurations (sample and/or default ones) in general, without versioning the specific run-time configuration of someone using this repository as the place from which they run the software / app itself. Should the user wish to version-control her particular configuration, she can store that in a separate repository, and replace config.ini with (e.g.) a symbolic link to ../myconfigs/fooapp.ini, which is where she has her configurations versioned.

(A similar trick is to get the configuration from $HOME/.gitconfig or /usr/local/etc/fooapp.ini. That is, store the configuration separately in the first place. Again, if you want or need some sort of default configuration, you can keep that versioned with the software, but the user's own configuration is separate, and not under your own version control at all.)
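The "versioned sample plus unversioned real config" idea can be sketched with Python's standard configparser. The file names follow the examples above; the [server] section, its keys, and the load_config helper are hypothetical, and the temporary directory only exists to make the demonstration self-contained.

```python
import configparser
import os
import tempfile


def load_config(sample_path, real_path):
    """Read the versioned sample first, then let the real config override it."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist; later files win.
    loaded = parser.read([sample_path, real_path])
    return parser, loaded


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "config.ini.sample")   # committed to the repository
    real = os.path.join(tmp, "config.ini")            # never committed

    with open(sample, "w") as f:
        f.write("[server]\nhost = localhost\nport = 8080\n")

    # No real config yet: the sample values are used as-is.
    cfg, loaded = load_config(sample, real)
    port_default = cfg["server"]["port"]
    files_first = len(loaded)

    # The user creates a partial config: only the port is overridden.
    with open(real, "w") as f:
        f.write("[server]\nport = 9000\n")
    cfg, loaded = load_config(sample, real)
    port_override = cfg["server"]["port"]
    host_fallback = cfg["server"]["host"]

print(port_default, port_override, host_fallback)
```

Because config.ini never enters the repository, no merge can overwrite it, and there is nothing left for merge=ours to protect.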

Original Thread

By anonymous    2017-09-20

If I directly put the origin url, the remote tracking branches are not getting updated ...

The reason for this is trivially simple: Git is stupid. :-)

More seriously, with one command, you are saying to Git: Use the name Fred, or Remote1234, or—wait, I know, this is the best name ever: origin! Anyway, as I was saying, use that name, fetch some stuff, and remember it for me.

With the other command, you are saying: Go to this URL, fetch some stuff, and remember it for me.

Under what name shall Git remember these things?

When you say "using the name origin", Git has a really good name to use. It sticks origin/ in front of each name:

   1dd995c..32a2ef5  branchA/somename -> origin/branchA/somename 
 * [new branch]      branchB/somename -> origin/branchB/somename 

When you give Git just a plain URL, it has no good name, so it falls back on the method it used in its earliest days, back before "remotes" were invented: it shoves all the information in a file named .git/FETCH_HEAD. This is why it says:

 * branch            HEAD       -> FETCH_HEAD

(You can stop here if you like. The section below is not necessary for the simple answer. The rest is more about how Git achieves this, than what the general idea is. The how part has a bunch of knock-on effects if you start fiddling with all of Git's little turney knobs.)

That's a nice, memorable explanation, but it hides a deeper truth

There is an important, yet somewhat obscure, difference that your question exposes. You've shown it above, and I have quoted it: the fetch using origin updated two remote-tracking branches, yet the fetch using a raw URL updated or created only one entry in the FETCH_HEAD file.

The reason for this is buried in the git fetch documentation, under the "configured remote-tracking branches" section:

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*

When git fetch is run without specifying what branches and/or tags to fetch on the command line, e.g. git fetch origin or git fetch, remote.<repository>.fetch values are used as the refspecs—they specify which refs to fetch and which local refs to update. The example above will fetch all branches that exist in the origin (i.e. any ref that matches the left-hand side of the value, refs/heads/*) and update the corresponding remote-tracking branches in the refs/remotes/origin/* hierarchy.

That is, your Git determines which names to fetch (and consequently which commits to obtain from the other Git) using remote.origin.fetch, which you can show by running:

git config --get-all remote.origin.fetch

(we need --get-all as there may be more than one such configuration line; we want all of them, not just the last one, which plain --get would show us). Hence, giving Git the name Fred or remote1234 or, more commonly, origin, tells Git what to fetch by default, as well as how to rename the result (i.e., to stick origin/ in front). Changing the remote.origin.fetch line, or adding additional lines, changes the default set of "what to fetch" and/or the "how to rename the result".
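The "how to rename the result" half of a refspec can be illustrated with a small Python sketch. It handles only the simple shape shown above (an optional leading +, and at most one * on each side); the function name is invented, and real Git accepts more forms than this.

```python
def apply_refspec(refspec, remote_ref):
    """Map a ref name on the other Git to a local name, or None if no match."""
    if refspec.startswith("+"):
        refspec = refspec[1:]     # "+" only means "allow forced updates"
    src, _, dst = refspec.partition(":")
    if "*" not in src:
        return dst if remote_ref == src else None
    prefix, suffix = src.split("*", 1)
    long_enough = len(remote_ref) >= len(prefix) + len(suffix)
    if not (long_enough and remote_ref.startswith(prefix)
            and remote_ref.endswith(suffix)):
        return None
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return dst.replace("*", matched, 1)


default = "+refs/heads/*:refs/remotes/origin/*"
print(apply_refspec(default, "refs/heads/branchA/somename"))
print(apply_refspec(default, "refs/tags/v1.0"))
```

The first call gives the full name behind origin/branchA/somename; the second gives None, because a tag does not match the refs/heads/* pattern on the left-hand side.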

These are less relevant, but not entirely irrelevant, if you supply refspecs (such as branch names) on the command line:

git fetch origin foobranch 'refs/notes/*:refs/notes/origin/*'

for instance. Here, you have explicitly told Git what to fetch, overriding remote.origin.fetch. But if you do not tell Git what to fetch, it looks for the named-remote's remote.origin.fetch setting—and if you use a raw URL, instead of a remote name like origin, there is no place to look, so you get yet another historical backup: it just brings over whatever it finds under the other Git's HEAD.
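Put as a decision procedure, the fallback chain looks roughly like this. It is a simplified Python model of the description above, not of Git's code: it ignores --refmap, tag following, and the way a named remote's fetch setting can still update remote-tracking names when you give refspecs on the command line. The function name, the REMOTES table, and the example URL are all hypothetical.

```python
def refspecs_for_fetch(repository, cli_refspecs, remotes):
    """Decide what to ask the other Git for (simplified model)."""
    if cli_refspecs:
        return list(cli_refspecs)            # you said exactly what to fetch
    if repository in remotes:
        return list(remotes[repository])     # remote.<name>.fetch settings
    return ["HEAD"]                          # raw URL: just their HEAD, into FETCH_HEAD


# Stand-in for the remote.origin.fetch lines in .git/config.
REMOTES = {"origin": ["+refs/heads/*:refs/remotes/origin/*"]}

by_name = refspecs_for_fetch("origin", [], REMOTES)
by_url = refspecs_for_fetch("https://example.com/repo.git", [], REMOTES)
explicit = refspecs_for_fetch("origin", ["foobranch"], REMOTES)

print(by_name)
print(by_url)
print(explicit)
```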

(There is more in the documentation, such as the description of --refmap. Study it for additional useless arcane Git knowledge. :-) )

Original Thread
