How can I make 'xargs' execute for all files found by 'find' command in bash script?
Question
I have *.test.* files inside my src folder and I want to find the authors who contributed the most. I came up with the command below:
find ./src/ -name "*.test.*" -print0 | xargs git --no-pager blame -0 -p | grep "^author " | uniq -c
Unfortunately, for some reason it works only for the first file. How can I run that command for all files found by find? Or do I have to somehow "collect" the results of each git command execution?
I've read that xargs has a -L argument, but it didn't help. What am I doing wrong?
The expected result should look like the output below:
2 author Cat Tom
1 author Mouse Jerry
1 author Sponge Bob
Answer 1
Score: 5
find ./src/ -name "*.test.*" -print0 |
xargs -0 -n1 git --no-pager blame -p |
grep "^author " | uniq -c
-0 is an argument for xargs, so it must be passed to xargs, not to git blame.
git blame works on one file at a time, so make xargs pass one file per invocation using -n1.
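To see what -0 and -n1 do without involving git, here is a minimal sketch. The helper name one_per_call and the file names are made up for illustration, and sh -c stands in for any command that, like git blame, handles one file at a time. Note that -print0 and -0 are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/sh
# one_per_call: hypothetical helper, not from the answer above.
# Prints one "call: NAME" line per command invocation, so the number of
# lines shows how many times xargs ran the command.
one_per_call() {
    find "$1" -name '*.test.*' -print0 |
        xargs -0 -n1 sh -c 'printf "call: %s\n" "${1##*/}"' _ |
        sort
}

# Throwaway demo directory; one name contains a space on purpose.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.test.js" "$demo_dir/b c.test.js" "$demo_dir/skip.txt"
one_per_call "$demo_dir"
rm -rf "$demo_dir"
```

Each matching file, including the one with a space in its name, arrives as a single argument in its own invocation; skip.txt is never passed because it does not match the pattern.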
Answer 2
Score: 2
I made some modifications:
- added xargs -n1 so that git blame runs once per file
- dropped --no-pager, since git does not start a pager when its output is piped
- removed the "author " prefix with grep -Po "(?<=^author ).*"
- added sort before uniq, because uniq only counts consecutive identical lines
- added sort -nr after uniq to list contributors in descending order
- added line numbers with nl to show each author's rank
- finally, with 400 files it took 8 seconds, so I added xargs -P0 to run git for each file in parallel, and it took 1 second
$ time find ./src/ -name "*.twig" -print0 | xargs -0 -n1 -P0 git blame -p | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
1: 543 Twist
2: 273 chuzhaikinadv
3: 239 Anton
4: 225 zayceva
5: 204 Natalia Baganova
6: 113 Nastie Deminka
7: 103 Lakhaev Andrey
8: 79 sergey.ivanov
9: 72 alnidok
10: 70 Kalchenko Ilia
11: 41 Andrey Klimenko
12: 38 a.dadaev
13: 30 dilya
14: 20 a.kaledin
15: 17 George
16: 16 Svetoslav Onosov
17: 14 Andrey Smirnov
18: 13 Mustafaeva Dilya
19: 13 George Barlukov
20: 10 andrewsmirnov
21: 7 silentmantra
22: 6 Sergey
23: 6 Alexander Nenashev
24: 5 Задорожний Александр
25: 5 Dilya
26: 4 over_ilaj
27: 4 egoprimary
28: 4 Dilya Mustafaeva
29: 3 Zharova Yaroslava
30: 3 Nastie
31: 3 Irina Demchenko
32: 2 Александра Храпкова
33: 2 Vadim
34: 2 Sharipova
35: 2 LaFut
36: 2 Kozyreva
37: 2 Andrey Lakhaev
38: 1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1
39: 1 DESKTOP-0N4O88L\dev
40: 1 Andrey Kuznetsov
41: 1 Alexander Zadoroznyi
real 0m0.917s
user 0m10.946s
sys 0m3.047s
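The counting stage of that pipeline can be tried on its own with made-up input. This is only a sketch: rank_authors is a hypothetical helper name, the sample names come from the question, and sed is swapped in for grep -Po so that it also runs where grep lacks -P.

```shell
#!/bin/sh
# rank_authors: hypothetical helper, not part of the original answer.
# Reads `git blame -p` style text on stdin and prints "COUNT NAME" lines,
# highest count first.
rank_authors() {
    sed -n 's/^author //p' |   # keep only the name from "author NAME" lines
        sort |                 # group identical names so uniq can count them
        uniq -c |              # prefix each distinct name with its count
        sort -nr |             # highest count first
        sed 's/^ *//'          # drop the padding uniq puts before the count
}

# Made-up sample input: "Cat Tom" does not appear on consecutive lines,
# which is exactly the case where uniq -c without sort miscounts.
sample() {
    printf '%s\n' 'author Cat Tom' 'author Mouse Jerry' 'author Cat Tom' \
        'author-mail <tom@example.com>' 'author Cat Tom' \
        'author Mouse Jerry' 'author Sponge Bob'
}

sample | rank_authors
# prints:
# 3 Cat Tom
# 2 Mouse Jerry
# 1 Sponge Bob
```

Without the first sort, uniq -c would report "Cat Tom" several times with a count of 1 each, because the identical lines are not adjacent.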
UPDATE
git blame -p prints the author header only once per commit, so the pipeline above counts commits by an author, not lines. IMHO that's not a real contribution. Since we are using git blame, I suppose we are interested in line changes. Luckily git blame has --line-porcelain, which repeats the author header for every line and is exactly what we need to count lines changed by an author.
OK, but in my case xargs failed to deliver that much data consistently, and the result was different on every run. Oops, we have a problem even when counting commits! I believe this line from the first example with xargs is wrong:
38: 1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1
I guess the problem is output buffering under xargs -P: the parallel git processes write to the same pipe, so grep can receive broken lines. The problem is described here, and the solution at first glance doesn't seem elegant: https://stackoverflow.com/questions/44569541/xargs-output-buffering-p-parallel
I installed GNU Parallel and ran the command with it; the result is consistent and fast enough!
So first:
Here goes a vote against xargs in favor of GNU Parallel...
And the results. As you can see, Natalia Baganova takes 2nd place, whereas counting commits puts her only 5th. So this is real contribution, as opposed to just counting commits:
$ time find ./src/ -name "*.twig" -print0 | parallel -0 git blame --line-porcelain | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
1: 4996 Twist
2: 4121 Natalia Baganova
3: 2771 zayceva
4: 2405 Anton
5: 2361 chuzhaikinadv
6: 1113 George
7: 1081 Nastie Deminka
8: 750 Kalchenko Ilia
9: 712 alnidok
10: 516 Lakhaev Andrey
11: 383 Andrey Smirnov
12: 365 dilya
13: 325 a.dadaev
14: 301 Andrey Klimenko
15: 291 sergey.ivanov
16: 134 Задорожний Александр
17: 124 George Barlukov
18: 116 Sergey
19: 59 egoprimary
20: 43 a.kaledin
21: 42 Mustafaeva Dilya
22: 41 Svetoslav Onosov
23: 38 Alexander Nenashev
24: 26 andrewsmirnov
25: 25 Nastie
26: 20 silentmantra
27: 18 Zharova Yaroslava
28: 16 Dilya Mustafaeva
29: 15 Andrey Kuznetsov
30: 9 over_ilaj
31: 9 Dilya
32: 6 Sharipova
33: 6 LaFut
34: 3 Александра Храпкова
35: 3 Irina Demchenko
36: 3 Andrey Lakhaev
37: 2 Vadim
38: 2 Kozyreva
39: 2 DESKTOP-0N4O88L\dev
40: 2 Alexander Zadoroznyi
real 0m1.334s
user 0m10.418s
sys 0m4.043s
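If installing GNU Parallel is not an option, one possible workaround is to keep xargs -P but give every job its own output file, which is roughly what GNU Parallel's default output grouping achieves. The sketch below is an assumption-laden illustration rather than a tested replacement: grouped_parallel, the gp_ variable names and the limit of 4 jobs are all made up here, and it assumes at least one file matches.

```shell
#!/bin/sh
# grouped_parallel: hypothetical helper, not from the answer or from xargs.
# Usage: grouped_parallel DIR PATTERN CMD [ARGS...]
# Runs "CMD ARGS FILE" for every matching file, up to 4 jobs at a time.
# Each job appends to a file named after its own PID, and the files are
# concatenated only after xargs has waited for all jobs, so lines from
# different jobs cannot be mixed together.
# Assumes at least one file matches.
grouped_parallel() {
    gp_dir=$1
    gp_pattern=$2
    shift 2
    gp_out=$(mktemp -d) || return 1
    find "$gp_dir" -name "$gp_pattern" -print0 |
        GP_OUT="$gp_out" xargs -0 -n1 -P4 \
            sh -c '"$@" >> "$GP_OUT/$$.out"' _ "$@"
    cat "$gp_out"/*.out
    rm -rf "$gp_out"
}

# Example with git (needs a repository, so it is left commented out):
# grouped_parallel ./src '*.twig' git blame --line-porcelain |
#     grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
```

The order in which files appear in the combined output is not defined, which does not matter here because the result is sorted and counted afterwards.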