How does Git store resolved conflicts after merges, and their author history?
Question
Today I finished reading the book chapter on Git internals, and I think I have a good overview of how Git works.
But what I still don't understand is how and where Git stores the author information for each line (for example, a resolved conflict) after a conflict has been resolved.
As far as I understand, a commit just creates 4 elements, and none of them covers a line <==> author relationship.
helloworld.txt
This line (1) got accepted from branch foo by author foo
This line (2) got accepted from branch bar by author bar
Answer 1
Score: 7
Git doesn't store that at all. Git recomputes this kind of information every time you ask about it.
Each commit stores a full snapshot of every file. So file F in commit `a123456` is "authored" by whoever committed `a123456`. That's not very interesting, at least not on its own.
But each commit also stores, in its metadata, the hash ID of some set of parent commits. Most commits have exactly one parent: perhaps the commit before `a123456` is `3141592`, for instance. File F is probably in this earlier commit as well. If we compare the content of file F in commit `3141592` with that of file F in `a123456`, maybe some lines are different. If that's the case, we can claim that whoever made `a123456` really did write those particular lines, and whoever made `3141592` wrote the earlier lines.
But wait! `3141592` also has a parent, such as `2147483`. That commit probably has file F too. If so, we repeat the comparing process: did the author of `3141592` change some lines, or simply carry the file through from before? Or, if `2147483` does not have F after all, we can deduce that all of these lines were authored by whoever made `3141592`.
Note how Git has to start at the end of some linear chain of commits and work backwards. A program like `git blame` "assigns ownership" of some source-code line of some file when, during this backwards walk, the line changes to read however it does in the final commit. If it doesn't change, we don't yet know who to say wrote the line: we have to keep going back.
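The backwards walk described above can be sketched in Python. This is only a toy model under stated assumptions, not how `git blame` is actually implemented: history is a plain list of `(author, lines)` snapshots ordered oldest to newest, every commit has exactly one parent, and lines are matched with the standard library's `difflib` rather than Git's own diff engine. The names `blame` and `example_history` are made up for this illustration.

```python
import difflib


def blame(history):
    """Toy blame over a linear history.

    history: list of (author, lines) snapshots, oldest first (an assumed
    stand-in for a chain of commits). Returns (author, line) pairs for
    the newest snapshot.
    """
    newest_lines = history[-1][1]
    result = [None] * len(newest_lines)
    # Maps a line index in the snapshot being examined to the index of
    # the same line in the newest snapshot. Lines leave this map as soon
    # as they have been blamed on someone.
    pending = {i: i for i in range(len(newest_lines))}

    for k in range(len(history) - 1, -1, -1):
        author, lines = history[k]
        if k == 0:
            # Root commit: no parent, so every line still pending is blamed here.
            for final_idx in pending.values():
                result[final_idx] = author
            break
        parent_lines = history[k - 1][1]
        matcher = difflib.SequenceMatcher(None, parent_lines, lines, autojunk=False)
        carried = {}
        for a_start, b_start, size in matcher.get_matching_blocks():
            for offset in range(size):
                b_idx = b_start + offset
                if b_idx in pending:
                    # Line is unchanged from the parent: keep walking back.
                    carried[a_start + offset] = pending.pop(b_idx)
        # Whatever is left was introduced (or changed) by this commit.
        for final_idx in pending.values():
            result[final_idx] = author
        pending = carried
        if not pending:
            break
    return list(zip(result, newest_lines))


# Hypothetical three-commit history for one file.
example_history = [
    ("alice", ["line one", "line two"]),
    ("bob", ["line one", "line two changed", "line three"]),
    ("carol", ["line zero", "line one", "line two changed", "line three"]),
]

if __name__ == "__main__":
    for author, line in blame(example_history):
        print(f"{author:6} {line}")
```

With this sample history, "line zero" is blamed on carol, "line one" on alice, and the remaining two lines on bob, even though no snapshot stores any per-line author.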
What about merges?
A merge commit stores the same kind of snapshot as any non-merge commit, but instead of containing the hash ID of one parent, it has the hash IDs of two parent commits. So if we have a merge commit in hand, we can compare it to either parent. Suppose file F is in merge commit `M`, and `M` has parents `J` and `L`. If F exactly matches both parents' copies, F has probably not changed all the way back to the merge base:
              I--J
             /    \
    ...--G--H      M
             \    /
              K--L
The F in `M` probably matches file F in `H`, and was not changed in either chain of commits between `H` and `M`.
But if the F in `M` matches the F in `J`, and doesn't match the F in `L`, then we must have picked the copy in `J` when merging. So the copy in `L` probably matches the copy in `H`, while the one in `J` is probably different. We should walk from `M` to `J` to "assign blame" for changes in F.
This may seem odd, since we follow the parent against which the file did not change (the merge's copy is identical to that parent's copy). But that's how `git log`'s History Simplification works: when doing history simplification, `git log` picks some parent in which a file didn't change, and goes down just that one leg of history.<sup>1</sup>
If `M`'s F doesn't match either of its inputs, both legs of history must have contributed. Was there a merge conflict? We have no idea: we only know that the snapshot in `M` does not match either of the two snapshots in `J` and `L`. In this case, `git log` will not simplify away one leg of the merge. What `git blame` will do is a bit of a mystery, as it has never really been documented (and the algorithms for blame have evolved over time).
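The parent-selection rule for merges described above can be modelled for a single file as follows. This is a simplified sketch, assuming each parent is given as a `(name, file_content)` pair; real Git compares whole retained trees, not one string, and the function name `parents_to_follow` is invented for this example.

```python
def parents_to_follow(merge_content, parents):
    """Decide which parents of a merge to walk for one file.

    merge_content: the file's content in the merge commit.
    parents: list of (name, content) pairs, one per parent (assumed input
    format for this sketch).

    If the merge's copy is identical to some parent's copy, follow only
    that parent (the first one, if several match). Otherwise follow all
    parents, because both legs contributed.
    """
    identical = [name for name, content in parents if content == merge_content]
    if identical:
        return identical[:1]
    return [name for name, _ in parents]


if __name__ == "__main__":
    # M took J's copy unchanged: only J's leg is interesting.
    print(parents_to_follow("from foo\n", [("J", "from foo\n"), ("L", "from bar\n")]))
    # M matches neither parent: both legs must be examined.
    print(parents_to_follow("from foo\nfrom bar\n", [("J", "from foo\n"), ("L", "from bar\n")]))
```

Note that the second case looks the same whether the merge was clean or had a conflict that someone resolved by hand: the comparison only sees the three snapshots.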
<sup>1</sup>There is a detailed description of how this works in the `git log` documentation, using the word TREESAME. Git looks not just at one file, but rather at every retained file, with history simplification normally being turned on by mentioning particular pathnames. The pathnames control which paths are retained, and which are stripped, in order to compare the saved snapshots in each commit, to determine TREESAME-ness.
Here, `git blame`'s documentation is a little weak on detail.
Answer 2
Score: 1
One more addition on top of @torec's answer.
Git has another "hidden option": `git rerere`, which stands for "reuse recorded resolution".
Once rerere is enabled, Git will store the following information for each conflict:
- The conflicted state of the file (the preimage)
- The resolved file (the postimage)
# Enable rerere so that conflict resolutions are recorded
git config --global rerere.enabled true
The contents of the rerere cache once it has recorded a resolved conflict:
$ tree .git/rr-cache
.git/rr-cache
└── f08b1.... (40 digits SHA-1)
    ├── postimage
    └── preimage
By the way, if you prefer rerere to auto-stage the files it resolves (I do), you can tweak your configuration like so:
git config --global rerere.autoupdate true
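To make the preimage/postimage idea concrete, here is a deliberately simplified Python model. It is an illustration only and not Git's implementation: the class `ToyRerere` and its methods are hypothetical, the cache is an in-memory dict rather than files under `.git/rr-cache`, and the key is simply the SHA-1 of the raw conflicted text, whereas real Git derives its key from normalized conflict hunks.

```python
import hashlib


class ToyRerere:
    """In-memory stand-in for a rerere-style cache (assumed design)."""

    def __init__(self):
        # conflict id -> {"preimage": conflicted text, "postimage": resolution or None}
        self.cache = {}

    @staticmethod
    def conflict_id(conflicted_text):
        # Assumption for this sketch: hash the raw conflicted text.
        return hashlib.sha1(conflicted_text.encode("utf-8")).hexdigest()

    def record_conflict(self, conflicted_text):
        cid = self.conflict_id(conflicted_text)
        self.cache.setdefault(cid, {"preimage": conflicted_text, "postimage": None})
        return cid

    def record_resolution(self, cid, resolved_text):
        self.cache[cid]["postimage"] = resolved_text

    def reuse(self, conflicted_text):
        """Return the recorded resolution for this exact conflict, or None."""
        entry = self.cache.get(self.conflict_id(conflicted_text))
        return entry["postimage"] if entry else None


if __name__ == "__main__":
    conflict = "<<<<<<< foo\nline from foo\n=======\nline from bar\n>>>>>>> bar\n"
    rr = ToyRerere()
    cid = rr.record_conflict(conflict)
    rr.record_resolution(cid, "line from foo\nline from bar\n")
    print(rr.reuse(conflict))
```

Even in this toy version, the cache holds only the conflicted text and its resolution. It records nothing about which author each line came from, so it does not change the first answer's point that per-line authorship is recomputed on demand.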