How to find large objects in the last fetch?


This morning git fetch took a little longer than usual because it downloaded 206 MB. (Usually it's less than 1 MB, as I fetch frequently.) I last fetched this repo a couple of days ago, and about 30 branches have been updated since then. I want to know which branch added the commit containing the large files, so I can work with the developer to decide whether we should change something before it gets merged into a shared branch (which would lock the large files into the history permanently).

I know we can list large files in a repo, but in this case I'd like to see the list of large files that came in with the most recent fetch. I'm not sure if that's possible after the fetch was already done, but perhaps almost as good would be seeing all large objects that I fetched in the last X days. And if not even that, perhaps I could find all large objects in commits with committer dates in the last X days. (I'm fairly certain the last option is possible with some scripting, though it isn't quite as nice since it's possible someone recently pushed an old commit for the first time.)

Side note: in this case I glanced at the list of branch names and was able to guess correctly which branch it was. It turns out the developer had accidentally added a commit with many image files and then, realizing the mistake, added another commit that deleted them all. They had already planned to squash those two commits before completing the PR, and simply didn't realize they should have squashed them before even pushing. My immediate need is solved for today, but the next time it happens I'd like to do better than my current answer of guessing and checking the branches manually.


To find what's been updated, you can run git reflog --remotes --date=short¹

Then diff the updated ref with and without the reflog selector. For example, if the most recent origin/main entry looks like it deserves a closer look, running

git diff --name-status --diff-filter=A origin/main@{2023-02-25}..origin/main

shows all the files added by last week's surprise Saturday pull to origin/main; tune as needed. Running git log over the same range shows all the commits that pull brought in, and so forth.
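To go from "files added" to "large objects added", the same range can be fed to git rev-list --objects and then to git cat-file --batch-check, which reports each object's type and size. A minimal sketch; the function name large_blobs_in_range and the 1 MiB default threshold are my own illustrative choices, not anything git provides:

```shell
# large_blobs_in_range is a hypothetical helper, not a git command.
# Usage: large_blobs_in_range OLD NEW [MIN_BYTES]
# Prints "size type object-id path" for every blob of at least MIN_BYTES
# (default 1 MiB, an arbitrary threshold) that is reachable from NEW but
# not from OLD, largest first.
large_blobs_in_range() {
    old=$1
    new=$2
    min=${3:-1048576}
    # --objects lists the commits, trees and blobs in the range; trees and
    # blobs are followed by the path they were found at.
    git rev-list --objects "$old..$new" |
        # Annotate each object with size and type; %(rest) echoes the path.
        git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
        # Keep only blobs at or above the threshold.
        awk -v min="$min" '$2 == "blob" && $1 >= min' |
        sort -rn
}
```

For the example above that would be large_blobs_in_range 'origin/main@{2023-02-25}' origin/main. The range is roughly "objects new to this update": anything already reachable from the old tip is skipped, which is exactly what you want when hunting for a surprise 200 MB.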

¹ Random note: git reflog without a subcommand defaults to git reflog show, which its docs describe as an alias for git log -g --abbrev-commit --pretty=oneline (i.e. git log --walk-reflogs --oneline). So you can extract whatever information you like about the commits with all of git log's formatting machinery; the reflog-selector format placeholder is %gd.
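As an illustration of that footnote, a small sketch that lists every recorded update of one ref with the reflog selector rendered as a date (%gd), the abbreviated commit (%h) and the reflog message (%gs). The function name ref_updates is made up for the example:

```shell
# ref_updates is a hypothetical helper name; its argument is any ref that
# has a reflog, e.g. origin/main.
ref_updates() {
    # -g walks the reflog instead of commit ancestry. Because --date is
    # given, %gd prints REF@{<date>} rather than REF@{<index>}.
    git log -g --date=iso --format='%gd %h %gs' "$1"
}
```

For example, ref_updates origin/main lists the updates of origin/main newest first; any selector it prints (quoted, since it contains spaces) can serve as the left-hand side of the diff range.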


One way could be:

  • guess the branches of "the last fetch",
  • use the reflog to scan the range <previous>..<now> for each of these remote branches.

The tricky part is the first point:

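  • if you still have the output of your last git fetch command, you can get the list of updated references from it and feed that list into a loop that scans the reflog:

    # say ref_names.txt contains names like 'origin/master', 'origin/feature1' ...
    while IFS= read -r ref; do
        # objects added by the latest update of $ref, with their size and type
        git rev-list --objects "$ref@{1}..$ref" |
            git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)'
    done < ref_names.txt | sort -rn

  • you may bluntly iterate over all origin/* references and scan $ref@{1}..$ref; this may give you too many commits to scan through, but you would be 100% sure to cover all the branches updated by your latest git fetch (a branch created by that fetch has no $ref@{1} entry, so it needs a different range, e.g. against the other remote branches)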

  • you may use an API on your central server to spot the actions that updated a branch (say, in the last two days) and scan only those branches,

  • you may dig into the log files themselves:

a reflog line looks like:

$ tail -1 .git/logs/refs/remotes/origin/master
454dfcbddf9624c129fa7600b3c774b99e36cb43 d15644fe0226af7ffc874572d968598564a230dd User Name <user@email.com> 1678166909 +0400	fetch: fast-forward

The timestamp after the email is the time that ref was updated in your repo, so it roughly matches the time of the last git fetch that updated this particular remote branch.

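Oddly, I haven't found a git log format placeholder that prints that raw value directly (not in the documentation, at least). The closest is %gd: when git log -g is given a --date option, the reflog selector shows the entry's timestamp instead of its index, e.g. git log -g --date=unix --format=%gd origin/master shows selectors of the form origin/master@{<unix timestamp>}.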

You can still use that information (e.g. go through all the log files, look for lines mentioning fetch or pull, and keep the highest timestamp) to work out after the fact when your last git fetch occurred, and filter the branches that got updated based on that.
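Putting the "iterate over all origin/* references" and "dig into the log files" ideas together, here is a rough sketch: read the newest entry of every remote-tracking reflog, treat the newest fetch/pull timestamp as "the last fetch", keep the refs updated within a few minutes of it, and scan what each update brought in. The helper name and the WINDOW and MIN_BYTES knobs (with their defaults) are illustrative assumptions. It also assumes the default "files" ref backend; a repository using the reftable backend does not keep reflogs as files under logs/.

```shell
# large_blobs_from_last_fetch is a hypothetical helper, not a git command.
# Assumptions (mine, not git's):
#   * reflogs are plain files under $GIT_COMMON_DIR/logs/refs/remotes;
#   * a ref counts as "updated by the last fetch" if its newest reflog entry
#     is within WINDOW seconds (default 300) of the newest fetch/pull entry;
#   * "large" means at least MIN_BYTES (default 1 MiB).
large_blobs_from_last_fetch() {
    window=${WINDOW:-300}
    min=${MIN_BYTES:-1048576}
    logs="$(git rev-parse --git-common-dir)/logs/refs/remotes"
    [ -d "$logs" ] || return 0

    # "<timestamp> <ref> <message>" for the newest entry of each remote reflog.
    # In a reflog line the timestamp is the second-to-last field before the tab.
    entries=$(find "$logs" -type f | while IFS= read -r f; do
        tail -n 1 "$f" | awk -F'\t' -v ref="${f#"$logs"/}" '{
            n = split($1, parts, " ")
            print parts[n - 1], ref, $2
        }'
    done)

    # Newest timestamp among entries written by fetch or pull.
    last=$(printf '%s\n' "$entries" |
        awk 'BEGIN { max = 0 } $3 ~ /^(fetch|pull)/ && $1 > max { max = $1 } END { print max }')
    [ "$last" -gt 0 ] || return 0

    # Refs updated close to that fetch (the symbolic origin/HEAD is skipped),
    # then the large blobs each update brought in.
    printf '%s\n' "$entries" |
        awk -v t="$last" -v w="$window" '$1 >= t - w && $2 !~ /\/HEAD$/ { print $2 }' |
        while IFS= read -r ref; do
            if git rev-parse -q --verify "$ref@{1}" >/dev/null 2>&1; then
                # Existing branch: only what this update added.
                set -- "$ref@{1}..$ref"
            else
                # Branch created by the fetch: what no other remote ref has.
                set -- "$ref" --not --exclude="$ref" --remotes
            fi
            git rev-list --objects "$@" |
                git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
                awk -v min="$min" -v ref="$ref" '$2 == "blob" && $1 >= min { print ref ": " $0 }'
        done
}
```

Running it right after a fetch prints lines like "origin/some-branch: <size> blob <id> <path>"; setting MIN_BYTES or WINDOW in the environment changes the thresholds. The time window is a heuristic, so a branch that happened to be updated by an unrelated fetch a minute earlier would also be scanned.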

