Different tree hashes for the same commit hash after `git filter-repo`

Question

<details>
<summary>English:</summary>

I noticed this anomaly after running `git filter-repo` with the `--replace-text` option on a repository, and it turns out to be reproducible in a simpler example.

It seems to violate one of the basic axioms of git: if two commits have the same commit hash, they *are* the same commit, and therefore the tree contents and tree hash should be the same (barring an astronomically unlikely collision, which wouldn't be reproducible using arbitrary data anyway).
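
The intuition behind that axiom can be checked directly: a commit ID is the hash of the commit object's content, and that content names the tree. The following is a minimal sketch, assuming a throwaway repository created with `mktemp -d`; the identity, file content, and commit message are placeholders rather than the ones from the repro.

```shell
#!/bin/sh
set -eu

# Throwaway repository; path, identity and message are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo initial > foo
git add foo
git -c user.name=Example -c user.email=example@example.com commit -q -m "blah"

# The ID git reports for the commit.
actual=$(git rev-parse HEAD)

# Re-hash the raw commit object (tree line, author, committer, message).
# --no-replace-objects makes sure we read the object stored under that ID.
recomputed=$(git --no-replace-objects cat-file commit HEAD \
    | git hash-object -t commit --stdin)

echo "actual:     $actual"
echo "recomputed: $recomputed"
```

Both lines should print the same ID, since changing the `tree` line of a stored commit object would change its hash.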

Steps to repro:
Start with a freshly `init`ed repo (in directory `test_repo`) with the following very short history:

$ git log -p
commit 00fc46c3372592059ff63bc0dfb610d8e583fa3e (HEAD -> master)
Author: REDACTED
Date:   Thu Mar 9 10:04:38 2023 +0000

    second change

diff --git a/foo b/foo
index 8d5d568..243cf01 100644
--- a/foo
+++ b/foo
@@ -1,2 +1,4 @@
 initial

+zzzzz
+

commit 19f6a7da7e8d9ca30a644872167bd7f17c3b5f92
Author: REDACTED
Date:   Thu Mar 9 10:04:23 2023 +0000

    blah

diff --git a/foo b/foo
new file mode 100644
index 0000000..8d5d568
--- /dev/null
+++ b/foo
@@ -0,0 +1,2 @@
+initial
+

Make a text replacement file:
repl_file:

zzz==>yyyy


Clone it and perform the rewrite with git filter-repo:

git clone . ../test_repo_mod
cd ../test_repo_mod
git filter-repo --replace-text ../test_repo/repl_file


This works as expected. The replacements didn't affect the initial commit, so it keeps the same hash, but the second commit has a different hash.

$ git show
commit a2b5518f99f3a42d55931021e4a79c59f0971734 (HEAD -> master)
Author: REDACTED
Date:   Thu Mar 9 10:04:38 2023 +0000

    second change

diff --git a/foo b/foo
index 8d5d568..5d94489 100644
--- a/foo
+++ b/foo
@@ -1,2 +1,4 @@
 initial

+yyyyzz
+
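
Why only the second commit changes can be sketched with `git hash-object`: a blob ID depends solely on file content, so rewriting `zzzzz` changes the blob, and with it the tree and the commit. The sketch assumes `foo` contains exactly the lines shown in the diffs above, and uses `sed` as a stand-in for the `--replace-text` rule; it does not invoke `git filter-repo`.

```shell
#!/bin/sh
set -eu

# File content as of the second commit (assumed from the diff output).
before=$(printf 'initial\n\nzzzzz\n\n' | git hash-object --stdin)

# Same content after a zzz -> yyyy substitution; sed stands in for the
# --replace-text rule here.
after=$(printf 'initial\n\nzzzzz\n\n' | sed 's/zzz/yyyy/' | git hash-object --stdin)

# Content of the first commit, which contains no match for the rule.
first=$(printf 'initial\n\n' | git hash-object --stdin)
first_filtered=$(printf 'initial\n\n' | sed 's/zzz/yyyy/' | git hash-object --stdin)

echo "second commit blob: $before -> $after"
echo "first commit blob:  $first -> $first_filtered"
```

The first commit's blob is untouched by the substitution, so its ID stays the same; the second commit's blob gets a new ID.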




However, things get weird when I want to bring in the old history:

$ git remote add prefilter ../test_repo
$ git fetch prefilter
[output removed]

$ git log --graph --all --pretty=format:'(%D) H:%H - T:%T'
* (master) H:a2b5518f99f3a42d55931021e4a79c59f0971734 - T:0258b53a79efebb3a1a18fae2f5f2b26338bfaf8
| * (HEAD, replaced, prefilter/master) H:00fc46c3372592059ff63bc0dfb610d8e583fa3e - T:0258b53a79efebb3a1a18fae2f5f2b26338bfaf8
|/
* () H:19f6a7da7e8d9ca30a644872167bd7f17c3b5f92 - T:6f3003a0d9bd438abda8b48741b0b0a68925dcfc

$ cd ../test_repo
$ git log --graph --all --pretty=format:'(%D) H:%H - T:%T'
* (HEAD -> master) H:00fc46c3372592059ff63bc0dfb610d8e583fa3e - T:179af78895dc9c453df9caa00ed83b7c5bb9a378
* () H:19f6a7da7e8d9ca30a644872167bd7f17c3b5f92 - T:6f3003a0d9bd438abda8b48741b0b0a68925dcfc

As you can see, the commit with hash `00fc46c3372592059ff63bc0dfb610d8e583fa3e` has a different tree hash (and different contents) depending on which repo I'm viewing it in.

`git fsck` shows no errors, so I don't think it's `git filter-repo` corrupting the data.

ADDED: `git cat-file` output:

I've removed my name+email but that's the same each time.
Original repo:

$ git cat-file -p 00fc46c3372592059ff63bc0dfb610d8e583fa3e
tree 179af78895dc9c453df9caa00ed83b7c5bb9a378
parent 19f6a7da7e8d9ca30a644872167bd7f17c3b5f92
author REDACTED 1678356278 +0000
committer REDACTED 1678356278 +0000

second change


Filtered repo:

$ git cat-file -p 00fc46c3372592059ff63bc0dfb610d8e583fa3e
tree 0258b53a79efebb3a1a18fae2f5f2b26338bfaf8
parent 19f6a7da7e8d9ca30a644872167bd7f17c3b5f92
author REDACTED 1678356278 +0000
committer REDACTED 1678356278 +0000

second change


</details>


# Answer 1
**Score**: 2


`git filter-repo` adds replacement refs for the old IDs. Run `git replace --list` to see them; that is what the "(HEAD, replaced" notation in your log is telling you. With a replace ref in place, git substitutes the rewritten commit whenever the old ID `00fc46c` is looked up, which is why that ID shows the rewritten tree `0258b53` in the filtered repo. This allows text references in places like messages on rewritten commits to resolve to the corresponding commits in the rewritten histories.

You can shut off the replacement lookups with e.g.

```bash
git --no-replace-objects log --graph --all --pretty=format:'(%D) H:%H - T:%T'
```
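
The effect can be reproduced without `git filter-repo` by creating a replace ref by hand. This is a sketch under assumptions: the repository is a throwaway one from `mktemp -d`, the `commit_as` helper and the file contents are made up for illustration, and `git replace` is called directly rather than through whatever `git filter-repo` does internally.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name and file content here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical helper: commit with a fixed identity so the sketch does not
# depend on the caller's git configuration.
commit_as() {
    git -c user.name=Example -c user.email=example@example.com commit -q "$@"
}

printf 'initial\n' > foo
git add foo
commit_as -m "blah"
base=$(git rev-parse HEAD)

# "Old" second commit.
printf 'initial\nzzzzz\n' > foo
commit_as -a -m "second change"
old=$(git rev-parse HEAD)

# "Rewritten" second commit on the same parent, with different content.
git reset -q --hard "$base"
printf 'initial\nyyyyzz\n' > foo
commit_as -a -m "second change"
new=$(git rev-parse HEAD)

# Create the replace ref by hand: lookups of $old now return $new's content.
git replace "$old" "$new"
git replace --list --format=medium

new_tree=$(git log -1 --format=%T "$new")
with_replace=$(git log -1 --format=%T "$old")
without_replace=$(git --no-replace-objects log -1 --format=%T "$old")

echo "tree of new commit:          $new_tree"
echo "tree of old ID (replaced):   $with_replace"
echo "tree of old ID (no replace): $without_replace"
```

With the replace ref active, the old ID reports the new commit's tree; with `--no-replace-objects` it reports its own. If the substitution is not wanted at all, `git replace -d <old-id>` removes a single replace ref.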

huangapple
  • Posted on March 9, 2023, 19:06:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75683743.html