How does Git store resolved conflicts after merges, and their author history?



Today I finished reading the book chapter about the internals of git and I think I got a good overview of how git works.

But what I still don't understand is how and where git stores the author information of each line (resolved conflict) after a conflict got resolved.

As far as I understood, commits just create 4 elements, and none of them covers a line <==> author relationship:

This line (1) got accepted from branch foo by author foo
This line (2) got accepted from branch bar by author bar


Score: 7






Git doesn't store that at all. Git recomputes this kind of information, every time you ask about it.

Each commit stores a full snapshot of every file. So file F in commit a123456 is "authored" by whoever committed a123456. That's not very interesting, at least, not on its own.

But each commit also stores, in its metadata, the hash ID of some set of parent commits. Most commits have exactly one parent: perhaps the commit before a123456 is 3141592, for instance. File F is probably in this earlier commit as well. If we compare the content of file F in commit 3141592 with that of file F in a123456, maybe some lines are different. If that's the case, we can claim that whoever made a123456 really did write those particular lines, and whoever made 3141592 wrote the earlier lines.

But wait! 3141592 also has a parent, such as 2147483. That commit probably has file F too. If so, we repeat the comparing process: did the author of 3141592 change some lines, or simply carry the file through from before? Or, if 2147483 does not have F after all, we can deduce that all of these lines were authored by whoever made 3141592.

Note how Git has to start at the end of some linear chain of commits and work backwards. A program like git blame "assigns ownership" of some source-code line of some file when, during this backwards walk, the line changes to read however it does in the final commit. If it doesn't change, we don't yet know who to say wrote the line: we have to keep going back.
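To make the backwards walk concrete, here is a minimal Python sketch of the idea for a purely linear history. It is a toy model, not git blame's actual algorithm: the `Commit` class, the `blame` function, and the use of `difflib` to match lines between snapshots are all assumptions made for illustration, and each toy commit holds only the lines of one file F.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: a snapshot of file F plus metadata."""
    author: str
    lines: Optional[List[str]]          # content of F, or None if F is absent
    parent: Optional["Commit"] = None   # single parent: linear history only


def blame(tip: Commit) -> List[Optional[str]]:
    """For each line of F in `tip`, return the author of the commit that introduced it."""
    owners: List[Optional[str]] = [None] * len(tip.lines)
    # tracked[i] is the index in the tip's file that line i of the
    # commit currently being examined corresponds to (None if untracked).
    tracked: List[Optional[int]] = list(range(len(tip.lines)))
    commit: Optional[Commit] = tip
    while commit is not None:
        parent = commit.parent
        if parent is None or parent.lines is None:
            # No older copy of F exists: all still-unowned lines start here.
            for tip_idx in tracked:
                if tip_idx is not None and owners[tip_idx] is None:
                    owners[tip_idx] = commit.author
            break
        matcher = SequenceMatcher(a=parent.lines, b=commit.lines, autojunk=False)
        next_tracked: List[Optional[int]] = [None] * len(parent.lines)
        carried = set()
        for a, b, size in matcher.get_matching_blocks():
            for k in range(size):
                # The line is unchanged from the parent: keep walking back.
                next_tracked[a + k] = tracked[b + k]
                carried.add(b + k)
        for i, tip_idx in enumerate(tracked):
            if i not in carried and tip_idx is not None and owners[tip_idx] is None:
                # The line first appears in this commit: assign ownership.
                owners[tip_idx] = commit.author
        tracked = next_tracked
        commit = parent
    return owners


# Three toy commits, oldest first; the hash IDs in the comments echo the text above.
c1 = Commit("alice", ["a", "b", "c"])                    # 2147483
c2 = Commit("bob", ["a", "B", "c"], parent=c1)           # 3141592
c3 = Commit("carol", ["a", "B", "c", "d"], parent=c2)    # a123456
print(blame(c3))  # ['alice', 'bob', 'alice', 'carol']
```

Nothing in the toy commits records who wrote which line; the answer is recomputed from the snapshots and parent links on every call.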

What about merges?

A merge commit stores the same snapshot as any non-merge commit, but instead of containing the hash ID of one parent, it has the hash ID of two parent commits. So if we have a merge commit in hand, we can compare it to either parent. Suppose file F is in merge commit M, and M has parents J and L. If F exactly matches both parents' copies, F is probably not changed all the way back to the merge base:

          ...--J
         /      \
...--G--H        M
         \      /
          ...--L

The F in M probably matches file F in H, and was not changed in either set of commits between H and M.

But if the F in M matches the F in J, and doesn't match the F in L, why then, we must have picked the copy in J when merging. So the copy in L probably matches the copy in H, while the one in J is probably different. We should walk from M to J to "assign blame" for changes in F.

This is particularly weird since we follow the parent that didn't change the file. But that's how git log's History Simplification works: when doing history simplification, git log picks some parent in which a file didn't change, and goes down just that one leg of history. [1]

If M's F doesn't match either of its inputs, both lines of commits must have contributed. Was there a merge conflict? We have no idea: we only know that the snapshot in M does not match either of the two snapshots in J and L. The git log command will in this case not simplify away one leg of the merge. What git blame will do is a bit of a mystery as it has never really been documented (and the algorithms for blame have evolved over time).
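The parent-selection rule for a merge can be sketched in a few lines of Python. This is only an illustration of the default simplification described above, restricted to a single file: the function name `parents_to_follow` and the dictionary of parent snapshots are hypothetical, and real Git compares the retained paths of whole trees rather than one list of lines.

```python
from typing import Dict, List, Optional


def parents_to_follow(
    merge_file: Optional[List[str]],
    parent_files: Dict[str, Optional[List[str]]],
) -> List[str]:
    """Return the names of the parents a simplified history walk would visit.

    merge_file   -- content of F in merge commit M (None if F is absent)
    parent_files -- content of F in each parent, keyed by a parent label
    """
    # Parents whose copy of F is identical to the merge result.
    same = [name for name, content in parent_files.items() if content == merge_file]
    if same:
        # M's copy came unchanged from this parent: follow only one such leg.
        return same[:1]
    # M's copy matches no parent: both legs contributed, so follow them all.
    return list(parent_files)


# F in M matches J only: walk from M to J.
print(parents_to_follow(["x"], {"J": ["x"], "L": ["y"]}))        # ['J']
# F in M matches neither parent: keep both legs.
print(parents_to_follow(["x", "y"], {"J": ["x"], "L": ["y"]}))   # ['J', 'L']
```

Note that the second case says nothing about whether there was a conflict; it only says the result differs from both inputs.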

[1] There is a detailed description of how this works in the git log documentation, using the word TREESAME. Git looks not just at one file, but rather at every retained file, with history simplification normally being turned on by mentioning particular pathnames. The pathnames control which paths are retained, and which are stripped, in order to compare the saved snapshots in each commit, to determine TREESAME-ness.

Here, git blame's documentation is a little weak on detail.


Score: 1


One more add-on on top of @torec's answer.

Git has another "hidden option" git rerere.

git rerere

Reuse Recorded Resolution

Once rerere is enabled, Git stores the following for each conflict it records:

  • The conflicted state of the file (the preimage)
  • The resolved file (the postimage)

# Enable rerere so that Git records conflict resolutions
git config --global rerere.enabled true

What rerere leaves in the repository once it has recorded a resolution:

$ tree .git/rr-cache
└── f08b1.... (40 digits SHA-1)
    ├── postimage
    └── preimage
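As a rough mental model of that directory, the Python sketch below keeps a conflicted text (preimage) and its resolution (postimage) under a 40-character key, and replays the resolution when the same conflict shows up again. Everything here is an assumption for illustration: `ResolutionCache` is a hypothetical class, and hashing the whole conflicted text with SHA-1 is a simplification, not how Git actually derives the directory name.

```python
import hashlib
from typing import Dict, Optional


class ResolutionCache:
    """Toy in-memory model of .git/rr-cache (not Git's real format)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, str]] = {}

    @staticmethod
    def conflict_id(preimage: str) -> str:
        # Assumption: key on the SHA-1 of the whole conflicted text.
        return hashlib.sha1(preimage.encode("utf-8")).hexdigest()

    def record(self, preimage: str, postimage: str) -> str:
        """Remember how a conflict was resolved; return its key."""
        key = self.conflict_id(preimage)
        self._entries[key] = {"preimage": preimage, "postimage": postimage}
        return key

    def replay(self, preimage: str) -> Optional[str]:
        """Return the recorded resolution for this conflict, if any."""
        entry = self._entries.get(self.conflict_id(preimage))
        return entry["postimage"] if entry else None


cache = ResolutionCache()
conflict = "<<<<<<< foo\nline from foo\n=======\nline from bar\n>>>>>>> bar\n"
key = cache.record(conflict, "line from foo and bar\n")
print(len(key))                 # 40
print(cache.replay(conflict))   # the recorded resolution
```

The cache holds texts only, with no author attached, so it does not change the point made in the other answer: line authorship is still recomputed from history.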

By the way, if you prefer rerere to auto-stage the files it resolved (I do), you can ask it to by tweaking your configuration like so:

git config --global rerere.autoupdate true


