How can I make 'xargs' execute for all files found by 'find' command in bash script?
Question
I have *.test.* files inside my src folder and I want to find the authors who contributed the most. I came up with the command below:
find ./src/ -name "*.test.*" -print0 | xargs git --no-pager blame -0 -p | grep "^author " | uniq -c
Unfortunately, for some reason it works only for the first file. How can I run that command for all files found by find? Or do I have to somehow "collect" the results of each git invocation?
I've read that xargs has an -L option, but it didn't help. What am I doing wrong?
The expected result should look like the output below:
2 author Cat Tom
1 author Mouse Jerry
1 author Sponge Bob
Answer 1
Score: 5
find ./src/ -name "*.test.*" -print0 |
    xargs -0 -n1 git --no-pager blame -p |
    grep "^author " | uniq -c
-0 is an option for xargs, so it must be passed to xargs, not to git blame.
git blame works on one file at a time, so use -n1 to make xargs pass one file per invocation.
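To see what -0 and -n1 do without needing a repository, here is a minimal sketch that swaps git blame for a stand-in command which just prints its argument. The file names are made up for illustration; the NUL-separated input imitates what find -print0 emits.

```shell
# Stand-in for `git blame`: report the argument each invocation receives.
# The three NUL-separated names are hypothetical; one contains a space.
out=$(printf 'a.test.js\0b c.test.js\0d.test.js\0' |
    xargs -0 -n1 sh -c 'echo "invoked with: $1"' sh)
printf '%s\n' "$out"
# invoked with: a.test.js
# invoked with: b c.test.js
# invoked with: d.test.js
```

Each name arrives as its own invocation, and the name with a space stays in one piece because xargs splits on NUL rather than on whitespace.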
Answer 2
Score: 2
I made some modifications:
- use xargs -n1 to run git once per file
- we don't need --no-pager, since git knows that its output is piped
- strip the "author " prefix with grep -Po "(?<=^author ).*"
- add sort before uniq, because uniq counts only consecutive identical lines
- add sort -nr after uniq to list contributors in descending order
- add line numbers with nl to show each author's rank
- finally, with 400 files it took 8 seconds, so I added xargs -P0 to run git for each file in parallel, and it took 1 second
$ time find ./src/ -name "*.twig" -print0 | xargs -0 -n1 -P0 git blame -p | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
     1:     543 Twist
     2:     273 chuzhaikinadv
     3:     239 Anton
     4:     225 zayceva
     5:     204 Natalia Baganova
     6:     113 Nastie Deminka
     7:     103 Lakhaev Andrey
     8:      79 sergey.ivanov
     9:      72 alnidok
    10:      70 Kalchenko Ilia
    11:      41 Andrey Klimenko
    12:      38 a.dadaev
    13:      30 dilya
    14:      20 a.kaledin
    15:      17 George
    16:      16 Svetoslav Onosov
    17:      14 Andrey Smirnov
    18:      13 Mustafaeva Dilya
    19:      13 George Barlukov
    20:      10 andrewsmirnov
    21:       7 silentmantra
    22:       6 Sergey
    23:       6 Alexander Nenashev
    24:       5 Задорожний Александр
    25:       5 Dilya
    26:       4 over_ilaj
    27:       4 egoprimary
    28:       4 Dilya Mustafaeva
    29:       3 Zharova Yaroslava
    30:       3 Nastie
    31:       3 Irina Demchenko
    32:       2 Александра Храпкова
    33:       2 Vadim
    34:       2 Sharipova
    35:       2 LaFut
    36:       2 Kozyreva
    37:       2 Andrey Lakhaev
    38:       1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1
    39:       1 DESKTOP-0N4O88L\dev
    40:       1 Andrey Kuznetsov
    41:       1 Alexander Zadoroznyi
real    0m0.917s
user    0m10.946s
sys     0m3.047s
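As a small illustration of why sort has to come before uniq -c, here is a sketch on made-up author names (borrowed from the question's expected output) rather than real git blame output:

```shell
# Hypothetical author list in which 'Cat Tom' appears twice, non-adjacently.
authors=$(printf '%s\n' 'Cat Tom' 'Mouse Jerry' 'Cat Tom' 'Sponge Bob')

# Without sort, uniq -c only merges consecutive duplicates: four lines, each counted once.
unsorted=$(printf '%s\n' "$authors" | uniq -c)

# With sort first, duplicates become adjacent; sort -nr then ranks by count.
ranked=$(printf '%s\n' "$authors" | sort | uniq -c | sort -nr)

printf 'uniq only:\n%s\n\nsort | uniq -c | sort -nr:\n%s\n' "$unsorted" "$ranked"
```

In the ranked output 'Cat Tom' comes first with a count of 2, while the unsorted variant reports him twice with a count of 1 each.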
UPDATE
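With git blame -p, the author header is printed only once per commit in each file, so the pipeline above counts commits per author, not lines. IMHO that's not a real measure of contribution. Since we are using git blame, I suppose we are interested in line changes. Luckily, git blame has --line-porcelain, which repeats the header for every line and is exactly what we need to count lines per author.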
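The difference can be checked on a throwaway repository. This is a sketch under assumptions: the repository, file name, and author identity below are invented for the demo, and git must be installed.

```shell
# Create a scratch repository with one three-line file and a single commit.
repo=$(mktemp -d)
git init -q "$repo"
printf 'one\ntwo\nthree\n' > "$repo/demo.txt"
git -C "$repo" add demo.txt
git -C "$repo" -c user.name='Cat Tom' -c user.email='tom@example.com' \
    commit -q -m 'add demo'

# -p emits the author header once per commit; --line-porcelain once per line.
per_commit=$(git -C "$repo" blame -p demo.txt | grep -c '^author ')
per_line=$(git -C "$repo" blame --line-porcelain demo.txt | grep -c '^author ')
echo "-p: $per_commit, --line-porcelain: $per_line"
# -p: 1, --line-porcelain: 3
```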
OK, but in my case xargs failed to deliver that much data consistently: every run produced a different result. In fact, the problem already shows up when counting commits. I believe this line from the first xargs example is wrong:
38:       1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1
I guess the problem is xargs' output buffering: with -P, the parallel git processes all write to the same pipe, so grep can receive broken, interleaved lines. The problem is described here, and the solution doesn't seem elegant at first glance: https://stackoverflow.com/questions/44569541/xargs-output-buffering-p-parallel
I installed GNU Parallel and ran the command with it; the results are consistent and it is fast enough!
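If GNU Parallel is not available, one possible workaround is to let every parallel job write to its own file and concatenate the files afterwards. This is only a sketch of that idea, not something taken from the linked question; the job names and the printf stand-in for git blame are hypothetical.

```shell
# Each job writes to a private file, so output from parallel jobs cannot interleave.
tmp=$(mktemp -d)
printf '%s\0' a b c d |
    xargs -0 -n1 -P4 sh -c \
        'printf "job %s line 1\njob %s line 2\n" "$1" "$1" > "$0/$1.out"' "$tmp"

# Concatenate only after all jobs have finished (xargs waits for its children).
combined=$(cat "$tmp"/*.out)
printf '%s\n' "$combined"
rm -rf "$tmp"
```

The scratch directory is passed as $0 to the inner shell and the job name as $1. With real files, the output name would need to be derived from something safe, such as a counter, rather than from the path itself.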
So first:
Here goes a vote against xargs in favor of gnu parallel...
And the results. As you can see, Natalia Baganova takes 2nd place, whereas counting commits alone puts her only 5th. So this reflects real contribution, as opposed to just counting commits:
$ time find ./src/ -name "*.twig" -print0 | parallel -0 git blame --line-porcelain | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
     1:    4996 Twist
     2:    4121 Natalia Baganova
     3:    2771 zayceva
     4:    2405 Anton
     5:    2361 chuzhaikinadv
     6:    1113 George
     7:    1081 Nastie Deminka
     8:     750 Kalchenko Ilia
     9:     712 alnidok
    10:     516 Lakhaev Andrey
    11:     383 Andrey Smirnov
    12:     365 dilya
    13:     325 a.dadaev
    14:     301 Andrey Klimenko
    15:     291 sergey.ivanov
    16:     134 Задорожний Александр
    17:     124 George Barlukov
    18:     116 Sergey
    19:      59 egoprimary
    20:      43 a.kaledin
    21:      42 Mustafaeva Dilya
    22:      41 Svetoslav Onosov
    23:      38 Alexander Nenashev
    24:      26 andrewsmirnov
    25:      25 Nastie
    26:      20 silentmantra
    27:      18 Zharova Yaroslava
    28:      16 Dilya Mustafaeva
    29:      15 Andrey Kuznetsov
    30:       9 over_ilaj
    31:       9 Dilya
    32:       6 Sharipova
    33:       6 LaFut
    34:       3 Александра Храпкова
    35:       3 Irina Demchenko
    36:       3 Andrey Lakhaev
    37:       2 Vadim
    38:       2 Kozyreva
    39:       2 DESKTOP-0N4O88L\dev
    40:       2 Alexander Zadoroznyi
real    0m1.334s
user    0m10.418s
sys     0m4.043s