How can I make 'xargs' execute for all files found by 'find' command in bash script?

Question

I have *.test.* files inside my src folder and I want to find the authors who contributed the most. I came up with the command below:

find ./src/ -name "*.test.*" -print0 | xargs git --no-pager blame -0 -p | grep "^author " | uniq -c

Unfortunately, for some reason it works only for the first file. How can I run that command for all files found by find? Or do I have to somehow "collect" the results of each git command execution?

I've read that xargs has an -L option, but it didn't help. What am I doing wrong?

The expected result should look like the output below:

2 author Cat Tom
1 author Mouse Jerry
1 author Sponge Bob

Answer 1

Score: 5

find ./src/ -name "*.test.*" -print0 |
    xargs -0 -n1 git --no-pager blame -p |
    grep "^author " | uniq -c

-0 is an option for xargs, so it must be passed to xargs, not to git blame.

git blame works on one file at a time, so use -n1 to make xargs pass one file per invocation.
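
To see what -0 and -n1 change, here is a minimal sketch that uses echo as a stand-in for git blame and a scratch directory made up for the illustration (the file names are hypothetical). It assumes an xargs that supports -0, such as the GNU or BSD one.

```shell
# Create a scratch directory with three files, one with a space in its name.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.test.js" "$tmpdir/b.test.js" "$tmpdir/c d.test.js"

# With -0 -n1, xargs runs the command once per NUL-terminated file name.
# 'echo' stands in for 'git blame', so each output line is one invocation.
per_file_calls=$(find "$tmpdir" -name "*.test.*" -print0 | xargs -0 -n1 echo | wc -l | tr -d ' ')

# Without -n1, all names are handed to a single invocation.
single_call=$(find "$tmpdir" -name "*.test.*" -print0 | xargs -0 echo | wc -l | tr -d ' ')

echo "invocations with -n1: $per_file_calls, without -n1: $single_call"
rm -rf "$tmpdir"
```

The file name containing a space still counts as one argument because -print0 and -0 agree on NUL as the separator.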

Answer 2

Score: 2


I made some modifications:

  • xargs -n1 to run git once per file
  • --no-pager isn't needed, since git knows its output is piped
  • stripped the "author " prefix with grep -Po "(?<=^author ).*"
  • added sort before uniq, because uniq only counts consecutive lines
  • added sort -nr after uniq to list contributors in descending order
  • added line numbers with nl to show each author's rank
  • finally, with 400 files it took 8 seconds, so I added xargs -P0 to run git for each file in parallel, and it took 1 second

$ time find ./src/ -name "*.twig" -print0 | xargs -0 -n1 -P0 git blame -p | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
     1:     543 Twist
     2:     273 chuzhaikinadv
     3:     239 Anton
     4:     225 zayceva
     5:     204 Natalia Baganova
     6:     113 Nastie Deminka
     7:     103 Lakhaev Andrey
     8:      79 sergey.ivanov
     9:      72 alnidok
    10:      70 Kalchenko Ilia
    11:      41 Andrey Klimenko
    12:      38 a.dadaev
    13:      30 dilya
    14:      20 a.kaledin
    15:      17 George
    16:      16 Svetoslav Onosov
    17:      14 Andrey Smirnov
    18:      13 Mustafaeva Dilya
    19:      13 George Barlukov
    20:      10 andrewsmirnov
    21:       7 silentmantra
    22:       6 Sergey
    23:       6 Alexander Nenashev
    24:       5 Задорожний Александр
    25:       5 Dilya
    26:       4 over_ilaj
    27:       4 egoprimary
    28:       4 Dilya Mustafaeva
    29:       3 Zharova Yaroslava
    30:       3 Nastie
    31:       3 Irina Demchenko
    32:       2 Александра Храпкова
    33:       2 Vadim
    34:       2 Sharipova
    35:       2 LaFut
    36:       2 Kozyreva
    37:       2 Andrey Lakhaev
    38:       1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1
    39:       1 DESKTOP-0N4O88L\dev
    40:       1 Andrey Kuznetsov
    41:       1 Alexander Zadoroznyi

real    0m0.917s
user    0m10.946s
sys     0m3.047s
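
The counting part of the pipeline can be tried on its own. The sketch below feeds a few made-up porcelain-style lines (the names are borrowed from the question's expected output) through the same sort | uniq -c | sort -nr chain. It also uses sed -n 's/^author //p' as a possible substitute for grep -Po, on the assumption that a grep without PCRE support might be in use.

```shell
# Sample input resembling 'git blame -p' output; only the author lines matter.
lines=$(printf '%s\n' "author Cat Tom" "summary x" "author Mouse Jerry" "author Cat Tom" "author Sponge Bob")

# Keep only the author lines and strip the "author " prefix (no PCRE needed).
names=$(printf '%s\n' "$lines" | sed -n 's/^author //p')

# Without sort, the two "Cat Tom" lines are not adjacent, so uniq -c sees four groups.
groups_unsorted=$(printf '%s\n' "$names" | uniq -c | wc -l | tr -d ' ')

# With sort first, duplicates become adjacent and are counted together.
ranking=$(printf '%s\n' "$names" | sort | uniq -c | sort -nr)
groups_sorted=$(printf '%s\n' "$ranking" | wc -l | tr -d ' ')
top_count=$(printf '%s\n' "$ranking" | head -n 1 | awk '{print $1}')

echo "groups without sort: $groups_unsorted, with sort: $groups_sorted, top count: $top_count"
```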

UPDATE
git blame -p effectively counts only commits by an author, because the author header is printed once per commit in each file. IMHO that's not a real measure of contribution. Since we are using git blame, I suppose we are interested in line changes. Luckily, git blame has --line-porcelain, which is exactly what we need to count the lines attributed to each author.
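
The difference between the two formats can be checked in a throwaway repository. This is only a sketch: the repository, the file name sample.txt, and the identity passed with -c are all made up for the illustration, and it assumes git is installed.

```shell
# Build a throwaway repository with a single commit that adds three lines.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'one\ntwo\nthree\n' > "$repo/sample.txt"
git -C "$repo" add sample.txt
git -C "$repo" -c user.name="Cat Tom" -c user.email="cat@example.com" commit -q -m "add sample"

# -p prints the author header once per commit;
# --line-porcelain repeats it for every blamed line.
per_commit=$(git -C "$repo" blame -p sample.txt | grep -c "^author ")
per_line=$(git -C "$repo" blame --line-porcelain sample.txt | grep -c "^author ")

echo "-p: $per_commit author line(s), --line-porcelain: $per_line author line(s)"
rm -rf "$repo"
```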
OK, but in my case xargs failed to deliver that much data consistently, and the result was different on every run. Oops, we have a problem even when counting commits! I believe this line from the first xargs example is wrong:

38:       1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1

I guess the problem is output buffering with xargs -P, which feeds grep broken lines. The problem is described here, and the solution doesn't seem elegant at first glance: https://stackoverflow.com/questions/44569541/xargs-output-buffering-p-parallel
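
If installing another tool is not an option, one possible workaround is to let every parallel job write to its own file and merge the files afterwards, so that jobs never share a pipe. The sketch below is only an illustration of that idea: cat stands in for git blame --line-porcelain, and OUT_DIR and the scratch directory layout are hypothetical names.

```shell
# Scratch input: five small files, each holding one author line.
workdir=$(mktemp -d)
mkdir "$workdir/in" "$workdir/out"
for i in 1 2 3 4 5; do
    printf 'author Person %s\n' "$i" > "$workdir/in/$i.txt"
done

# OUT_DIR is a hypothetical variable name; each job creates its own output file there.
export OUT_DIR="$workdir/out"

# Run up to four jobs in parallel; 'cat' stands in for 'git blame --line-porcelain'.
find "$workdir/in" -type f -print0 |
    xargs -0 -n1 -P4 sh -c 'cat "$1" > "$(mktemp "$OUT_DIR/job.XXXXXX")"' _

# Merge the per-job files only after xargs has finished, then count author lines.
merged_authors=$(cat "$workdir/out"/* | grep -c "^author ")
echo "author lines after merging: $merged_authors"
rm -rf "$workdir"
```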

I installed GNU Parallel and ran the command; the output is consistent and it's fast enough!
So first:

Here goes a vote against xargs in favor of GNU Parallel...

And the results. As you can see, Natalia Baganova takes 2nd place, whereas counting commits only puts her in 5th. So this reflects real contribution, as opposed to just counting commits:

$ time find ./src/ -name "*.twig" -print0 | parallel -0 git blame --line-porcelain | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
     1:    4996 Twist
     2:    4121 Natalia Baganova
     3:    2771 zayceva
     4:    2405 Anton
     5:    2361 chuzhaikinadv
     6:    1113 George
     7:    1081 Nastie Deminka
     8:     750 Kalchenko Ilia
     9:     712 alnidok
    10:     516 Lakhaev Andrey
    11:     383 Andrey Smirnov
    12:     365 dilya
    13:     325 a.dadaev
    14:     301 Andrey Klimenko
    15:     291 sergey.ivanov
    16:     134 Задорожний Александр
    17:     124 George Barlukov
    18:     116 Sergey
    19:      59 egoprimary
    20:      43 a.kaledin
    21:      42 Mustafaeva Dilya
    22:      41 Svetoslav Onosov
    23:      38 Alexander Nenashev
    24:      26 andrewsmirnov
    25:      25 Nastie
    26:      20 silentmantra
    27:      18 Zharova Yaroslava
    28:      16 Dilya Mustafaeva
    29:      15 Andrey Kuznetsov
    30:       9 over_ilaj
    31:       9 Dilya
    32:       6 Sharipova
    33:       6 LaFut
    34:       3 Александра Храпкова
    35:       3 Irina Demchenko
    36:       3 Andrey Lakhaev
    37:       2 Vadim
    38:       2 Kozyreva
    39:       2 DESKTOP-0N4O88L\dev
    40:       2 Alexander Zadoroznyi

real    0m1.334s
user    0m10.418s
sys     0m4.043s

huangapple
  • Posted on May 25, 2023, 18:05:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76331097.html