git: fatal: could not find pack 'pack-xxxxxxx.pack'

Question

Having a strange problem. Not sure how I caused it.

$ git gc
Enumerating objects: 625644, done.
Counting objects: 100% (625644/625644), done.
Delta compression using up to 16 threads
Compressing objects: 100% (126399/126399), done.
Writing objects: 100% (625644/625644), done.
Total 625644 (delta 497529), reused 622563 (delta 494488), pack-reused 0
fatal: could not find pack 'pack-6f30656f301f5f88438a5216f1df773bafcdf6d3.pack'
fatal: failed to run repack

I can see the file it's referring to, but I'm not sure whether it's corrupted. It's all binary.

$ ls -lsah .git/objects/pack/pack-6f30656f301f5f88438a5216f1df773bafcdf6d3.pack
1168 -rw-------  1 javier  staff   584K Oct  8  2021 .git/objects/pack/pack-6f30656f301f5f88438a5216f1df773bafcdf6d3.pack

Looking a little closer, this hash doesn't have an accompanying .rev/.idx like the rest of the .pack files have in this directory. Is that the issue?
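
That observation can be checked across the whole pack directory with a small shell loop. This is only a minimal sketch: list_unindexed_packs is a hypothetical helper name (not a Git command), and it assumes the default .git/objects/pack location when no argument is given.

```shell
# list_unindexed_packs: hypothetical helper, not part of Git.
# Prints every *.pack in the given directory that has no sibling *.idx.
list_unindexed_packs() {
    pack_dir=${1:-.git/objects/pack}
    for pack in "$pack_dir"/*.pack; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$pack" ] || continue
        # Swap the ".pack" suffix for ".idx" and test for the index file.
        idx="${pack%.pack}.idx"
        [ -e "$idx" ] || printf '%s\n' "$pack"
    done
}
```

Running list_unindexed_packs .git/objects/pack in the affected repository should print the pack named in the error message if it really has no index.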

Answer 1

Score: 2

This appears to be addressed in Git 2.42 (Q3 2023), which avoids breakage of "git pack-objects --cruft" caused by inconsistencies in how the code enumerates packfiles in the repository.

See commit 73320e4 (07 Jun 2023) by Taylor Blau (ttaylorr).
<sup>(Merged by Junio C Hamano -- gitster -- in commit e224f26, 26 Jun 2023)</sup>

> ## builtin/repack.c: only collect fully-formed packs
> <sup>Reported-by: Michael Haggerty</sup>
> <sup>Signed-off-by: Taylor Blau</sup>

> To partition the set of packs based on which ones are "kept" (either they have a .keep file, or were otherwise marked via the --keep-pack option) and "non-kept" ones (anything else), git repack uses its collect_pack_filenames() function.
>
> Ordinarily, we would rely on a convenience function such as get_all_packs() to enumerate and partition the set of packs.
> But collect_pack_filenames() uses readdir() directly to read the contents of the "$GIT_DIR/objects/pack" directory, and adds each entry ending in ".pack" to the appropriate list (either kept, or non-kept as above).
>
> This is subtly racy, since collect_pack_filenames() may see a pack that is not fully staged (i.e., it is missing its ".idx" file).
> Ordinarily, this doesn't cause a problem.
> But it can cause issues when generating a cruft pack.
>
> This is because git repack feeds (among other things) the list of existing kept packs down to git pack-objects --cruft to indicate that any kept packs will not be removed from the repository (so that the cruft pack machinery can avoid packing objects that appear in those packs as cruft).
>
> But read_cruft_objects() lists packfiles by calling get_all_packs().
> So if a ".pack" file exists (necessary to get that pack to appear to collect_pack_filenames()), but doesn't have a corresponding ".idx" file (necessary to get that pack to appear via get_all_packs()), we'll complain with:
>
> fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack'
>
> Fix the above by teaching collect_pack_filenames() to only collect packs with their corresponding *.idx files in place, indicating that those packs have been fully staged.
>
> There are a couple of things worth noting:
>
> - Since each entry in the extra_keep list (which contains the
> --keep-pack names) has a *.pack suffix, we'll have to swap the
> suffix from ".pack" to ".idx", and compare that instead.
>
> - Since we use the fname_kept_list to figure out which packs to delete (with git repack -d), we would have previously deleted a
> *.pack with no index (since the existence of a ".pack" file is
> necessary and sufficient to include that pack in the list of
> existing non-kept packs).
>
> Now we will leave it alone (since that pack won't appear in the
> list). This is far more correct behavior, since we don't want
> to race with a pack being staged. Deleting a partially staged pack
> is unlikely, however, since the window of time between staging a
> pack and moving its .idx file into place is minuscule.
>
> Note that this window does not include the time it takes to
> receive and index the pack, since the incoming data goes into
> "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack"
> and is thus ignored by collect_pack_filenames().
>
> In the future, this function should probably be rewritten as a callback to for_each_file_in_pack_dir(), but this is the simplest change we could do in the short-term.
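
On a Git older than 2.42, one possible workaround is to give the orphaned .pack an index, so that both ways of listing packs agree again. The commit message does not prescribe this; it is an assumption on my part, and it only works if the pack itself is intact and self-contained. The sketch below wraps the real git index-pack command, which writes the .idx next to the .pack when no -o option is given. reindex_pack is a hypothetical helper name.

```shell
# reindex_pack: hypothetical helper, not part of Git.
# Rebuilds the index for a pack that already sits in the pack directory.
# Arguments: <repository> <absolute path to the .pack file>
reindex_pack() {
    repo=$1
    pack=$2
    # Without -o, git index-pack derives the index name by replacing
    # ".pack" with ".idx", so the new index lands beside the pack.
    git -C "$repo" index-pack "$pack"
}
```

If git index-pack reports an error instead, the pack is likely damaged or incomplete, and moving it out of the pack directory (rather than deleting it outright) is the more cautious option.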


Also in Git 2.42 (Q3 2023): Git creates the .pack file first and then the .idx file, and it treats only packfiles that have an .idx as usable (those with only a .pack are not ready yet). For consistency, it now removes the .idx before removing the .pack.

See commit 0dd1324 (20 Jun 2023) by Derrick Stolee (derrickstolee).
<sup>(Merged by Junio C Hamano -- gitster -- in commit b2166b0, 29 Jun 2023)</sup>

> ## packfile: delete .idx files before .pack files
> <sup>Signed-off-by: Derrick Stolee</sup>

> When installing a packfile, we place the .pack file before the .idx file.
> The intention is that Git scans for .idx files in the pack directory and then loads the .pack files from that list.
>
> However, when we delete packfiles, we do not do this in the reverse order as we should.
> The unlink_pack_path() method deletes the .pack followed by the .idx.
>
> This creates a window where the process could be interrupted between the .pack deletion and the .idx deletion, leaving the repository in a state that looks strange, but isn't actually too problematic if we assume the pack was safe to delete.
> The .idx without a .pack will cause some overhead, but will not interrupt other Git processes.
>
> This ordering was introduced into the 'git repack' builtin by a1bbc6c ("repack: rewrite the shell script in C", 2013-09-15, Git v1.8.5-rc0 -- merge), though we must be careful to track history through the code move in 8434e85 ("repack: refactor pack deletion for future use", 2019-06-10, Git v2.23.0-rc0 -- merge listed in batch #6) to see that.
>
> This became more important after 73320e4 ("builtin/repack.c: only collect fully-formed packs", 2023-06-07, Git v2.42.0 -- merge listed in batch #5) changed how 'git repack' scanned for packfiles for use in the cruft pack process.
> It previously looked for .pack files, but that was problematic due to the order that packs are installed: repacks between the creation of a .pack and the creation of its .idx would result in hard failures.
>
> There is an independent proposal about what to do in the case of a .idx without a .pack during this 'git repack' scenario, but this change is focused on deleting .pack files more safely.
>
> Modify the order to delete the .idx before the .pack.
> The rest of the modifiers on the .pack should still come after the .pack so we know all of the presumed properties of the packfile as long as it exists in the filesystem, in case we wish to reinstate it by re-indexing the .pack file.
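
The ordering described in that commit message can be sketched in shell. This is an illustration only, not Git's unlink_pack_path(): remove_pack is a hypothetical helper, the list of auxiliary extensions is my assumption, and the refusal to touch a pack that has a .keep file is a simplification.

```shell
# remove_pack: hypothetical helper that mirrors the "index first" ordering.
# Argument: path to a packfile without its extension,
#           e.g. .git/objects/pack/pack-<hash>
remove_pack() {
    base=$1
    # Leave kept packs alone (simplifying assumption for this sketch).
    if [ -e "$base.keep" ]; then
        return 1
    fi
    # 1. Drop the index first, so anything scanning for *.idx stops
    #    seeing this pack before its data disappears.
    rm -f "$base.idx"
    # 2. Then remove the pack data itself.
    rm -f "$base.pack"
    # 3. Finally remove auxiliary files that describe the pack
    #    (extension list assumed for illustration).
    for ext in rev bitmap mtimes promisor; do
        rm -f "$base.$ext"
    done
}
```

If the process is interrupted after step 1, what remains is a .pack without an .idx, which is the state that Git 2.42 ignores when collecting packs for a repack.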


The recent change to "git repack" made it react poorly to a leftover .idx file that no longer has a corresponding .pack file in the repository; that has also been corrected in Git 2.42 (Q3 2023).

See commit def390d (11 Jul 2023) by Taylor Blau (ttaylorr).
See commit 0af0672 (11 Jul 2023) by Derrick Stolee (derrickstolee).
<sup>(Merged by Junio C Hamano -- gitster -- in commit c6a5e1a, 18 Jul 2023)</sup>

> ## builtin/repack.c: avoid dir traversal in collect_pack_filenames()
> <sup>Signed-off-by: Taylor Blau</sup>

> When repacking, the function collect_pack_filenames() is responsible for collecting the set of existing packs in the repository, and partitioning them into "kept" (if the pack has a ".keep" file or was given via --keep-pack) and "nonkept" (otherwise) lists.
>
> This function comes from the original C port of git-repack.sh from back in a1bbc6c ("repack: rewrite the shell script in C", 2013-09-15, Git v1.8.5-rc0 -- merge), where it first appears as get_non_kept_pack_filenames().
> At the time, the implementation was a fairly direct translation from the relevant portion of git-repack.sh, which looped over the results of
>
> find "$PACKDIR" -type f -name '*.pack'
>
> either ignoring the pack as kept, or adding it to the list of existing packs.
>
> So the choice to directly translate this function in terms of readdir() in a1bbc6c made sense.
> At the time, it was possible to refine the C version in terms of packed_git structs, but was never done.
>
> However, manually enumerating a repository's packs via readdir() is confusing and error-prone.
> It leads to frustrating inconsistencies between which packs Git considers to be part of a repository (i.e., could be found in the list of packs from get_all_packs()), and which packs collect_pack_filenames() considers to meet the same criteria.
>
> This bit us in 73320e4 ("builtin/repack.c: only collect fully-formed packs", 2023-06-07, Git v2.42.0 -- merge listed in batch #5), and again in the previous commit.
>
> Prevent these issues from biting us in the future by implementing the collect_pack_filenames() function by looping over an array of pointers to packed_git structs, ensuring that we use the same criteria to determine the set of available packs.
>
> One gotcha here is that we have to ignore non-local packs, since the original version of collect_pack_filenames() only looks at the local pack directory to collect existing packs.
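
To see Git's own view of the pack directory, rather than a raw directory listing, git count-objects -v can help: its "garbage:" line counts files in the object database that are neither valid loose objects nor valid packs, and a .pack with no .idx should show up there. The wrapper below is a minimal sketch; pack_garbage_count is a hypothetical helper name.

```shell
# pack_garbage_count: hypothetical helper around a real command.
# Prints the "garbage:" figure from git count-objects -v for the
# repository given as the first argument (default: current directory).
pack_garbage_count() {
    git -C "${1:-.}" count-objects -v 2>/dev/null |
        awk '$1 == "garbage:" { print $2 }'
}
```

A non-zero result is a hint to run git count-objects -v directly and read its warnings, which name the offending files.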

Posted by huangapple on June 14, 2023 at 23:53:04
Source: https://go.coder-hub.com/76475423.html