How to find large objects in the last fetch?
Question
This morning `git fetch` took a little longer than usual because it downloaded 206 MB. (Usually it's less than 1 MB, as I fetch frequently.) I last fetched this repo a couple of days ago, and about 30 branches had been updated since then. I want to know which branch added the commit containing the large files so I can work with the developer to determine whether we should change something before it gets merged into a shared branch (which would lock the large files into the history permanently).
I know we can list large files in a repo, but in this case I'd like to see the list of large files that came in with the most recent fetch. I'm not sure if that's possible after the fetch was already done, but perhaps almost as good would be seeing all large objects that I fetched in the last X days. And if not even that, perhaps I could find all large objects in commits with committer dates in the last X days. (I'm fairly certain the last option is possible with some scripting, though it isn't quite as nice since it's possible someone recently pushed an old commit for the first time.)
Side note: in this case I glanced at the list of branch names and was able to guess correctly which branch it was. It turns out the developer had accidentally added a commit with many image files and then, realizing the mistake, added another commit which deleted them all. They had already planned to squash those two commits before completing the PR, and simply didn't realize they should have squashed them before even pushing. My immediate need is solved for today, but the next time it happens I'd like to do better than my current answer of just guessing and checking the branches manually.
Answer 1
Score: 2
To find what has been updated, run

git reflog --remotes --date=short

(see note ¹). Then you can diff the updated ref with and without the reflog selector. For example, if the most recent `origin/main` entry looks like it could use some examining,

git diff --name-status --diff-filter=A origin/main@{2023-02-25}..origin/main

will show you all the files added by last week's surprise Saturday pull to `origin/main`; tune as needed. `git log` over that range shows all the commits added by the pull, and so forth.

¹ Random note: `git reflog` without a subcommand defaults to `git reflog show`. Its docs merely hint at this, but that is interpreted as `git log -g --oneline`, aka `git log --walk-reflogs --oneline`, so you can extract whatever information you like about the commits with all of `git log`'s formatting machinery. The reflog-selector format symbol is `%gd`.
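The diff above lists added file names but not their sizes. As a rough sketch of how the range could be ranked by size instead, the following pipes `git rev-list --objects` into `git cat-file --batch-check`. The function names `largest_blobs` and `large_objects_in_range` and the 1 MiB default threshold are assumptions made for illustration, not part of Git. Note that `%(objectsize)` is the uncompressed object size, so it will not match the number of bytes transferred by the fetch.

```shell
# Keep only blobs of at least MIN bytes (default 1 MiB), largest first.
# Expects lines of the form "<type> <object-id> <size> <path>" on stdin.
largest_blobs() {
  awk -v min="${1:-1048576}" '$1 == "blob" && $3 >= min' | sort -k3,3nr
}

# Hypothetical helper: list large blobs reachable from NEW but not from OLD.
# Usage: large_objects_in_range OLD NEW [MIN_BYTES]
large_objects_in_range() {
  git rev-list --objects "$1..$2" |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    largest_blobs "${3:-1048576}"
}
```

For instance, `large_objects_in_range 'origin/main@{2023-02-25}' origin/main` would print the blobs of 1 MiB or more that arrived in that range, with the path each was first seen under.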
Answer 2
Score: 1
One way could be:
- guess the branches of "the last fetch",
- use the reflog to scan the range `<previous>..<now>` for each of these remote branches.

The tricky part is the first point:
- if you still have the output of your last `git fetch` command, you can get the list of the last updated references and feed that into a loop which scans the reflog:

    # say ref_names.txt contains names like 'origin/master', 'origin/feature1' ...
    while read -r ref; do
        git rev-list --objects "$ref@{1}..$ref"
    done < ref_names.txt

- you may bluntly iterate over all `origin/*` references and scan `$ref@{1}..$ref`. This may get you too many commits to scan through, but you would be 100% sure to scan all the branches updated by your latest `git fetch`,
- you may use an API on your central server to spot the actions that updated a branch, say in the last two days, and scan those branches only,
- you may dig into the log files themselves. A reflog line looks like:

    $ tail -1 .git/logs/refs/remotes/origin/master
    454dfcbddf9624c129fa7600b3c774b99e36cb43 d15644fe0226af7ffc874572d968598564a230dd User Name <user@email.com> 1678166909 +0400 fetch: fast-forward

  The timestamp mentioned after the email is the time that ref was updated in your repo, so it roughly matches the timestamp of the last `git fetch` which updated this particular remote branch.

  Oddly, I haven't found a way to print that value in a formatted way with `git log` format flags, not in the documentation at least.

  You may still use that information (e.g. go through all log files, look for lines mentioning `fetch` or `pull`, and keep the highest timestamp) to guess after the fact when your last `git fetch` occurred, and filter the branches that got updated based on this information.
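The log-file approach described above could be sketched roughly as follows. This assumes the default files-based ref storage (reflogs kept as plain files under the Git directory) and that the reflog message is separated from the rest of the line by a tab, as Git writes it. The function names `last_fetch_time` and `refs_fetched_since` are hypothetical, chosen only for this illustration.

```shell
# Read reflog lines on stdin and print the highest Unix timestamp among
# entries whose message starts with "fetch" or "pull" (nothing if none).
# Line layout: <old-id> <new-id> <name> <email> <unix-time> <tz>TAB<message>
last_fetch_time() {
  awk -F '\t' '
    $2 ~ /^(fetch|pull)/ {
      n = split($1, f, " ")
      ts = f[n - 1]                      # field just before the timezone
      if (ts + 0 > max + 0) max = ts
    }
    END { if (max != "") print max }
  '
}

# Hypothetical helper: print remote-tracking refs (e.g. origin/master) whose
# reflog contains a fetch/pull entry at or after the Unix time SINCE.
# Usage: refs_fetched_since SINCE
refs_fetched_since() {
  logs_dir=$(git rev-parse --git-path logs/refs/remotes)
  find "$logs_dir" -type f | while IFS= read -r log; do
    newest=$(last_fetch_time < "$log")
    if [ -n "$newest" ] && [ "$newest" -ge "$1" ]; then
      printf '%s\n' "${log#"$logs_dir"/}"
    fi
  done
}
```

One way to use it: feed every file under `.git/logs/refs/remotes` through `last_fetch_time` to estimate when the last fetch happened, call `refs_fetched_since` with that value minus a small margin, and pass the resulting names to the `$ref@{1}..$ref` loop. As an aside, `git reflog --date=iso` prints each entry's time inside the reflog selector, which may be enough when only a quick look is needed.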