Bash cannot process git output one-by-one, items stuck together

Question

I'm trying to run a script like below:

Changed_Files=$(git diff --diff-filter=dr --name-only origin/main..."$My_branch" | grep '^\s*foo/') 
  
for file in "${Changed_Files[@]}"; do 
  subdir=$(echo "$file" | cut -d / -f 2)
  if [[ "$subdir" == "bar1" ]]; then
    func1 "$file"
  else 
    echo ("Error")
    exit(1)
  fi
done

Sample Changed_Files:

foo/bar1/file1.json
foo/bar1/file2.json

Expected Result:

func1 gets run on both files.

Actual Result:

The script exits because subdir evaluates to bar1 bar1. Instead of processing the files one by one, the variable file always contains the entire list of results from git diff.
I also tried:

for i in "${!Changed_Files[@]}"; do
  file="${Changed_Files[$i]}"
  ...

But this gives the same result. How can I process items returned from git diff one-by-one instead of the entire output stuck together?

Answer 1

Score: 1

The problem that would cause the symptom described is that you're storing output in a string, not an array.
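
To make the string-versus-array difference concrete, here is a minimal sketch. The sample_output function is a hypothetical stand-in for the git diff pipeline, so the example runs without a repository; the counter variable names are also made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the "git diff ... | grep ..." pipeline.
sample_output() { printf '%s\n' foo/bar1/file1.json foo/bar1/file2.json; }

# $( ) captures everything into ONE string with an embedded newline.
as_string=$(sample_output)
string_iterations=0
for file in "${as_string[@]}"; do   # a scalar expands as a single "element"
  string_iterations=$((string_iterations + 1))
done

# Reading line by line into an array gives one element per file name.
as_array=()
while IFS= read -r line; do
  as_array+=("$line")
done < <(sample_output)
array_iterations=0
for file in "${as_array[@]}"; do
  array_iterations=$((array_iterations + 1))
done

echo "string: $string_iterations iteration(s); array: $array_iterations iteration(s)"
```

With the string, the loop body runs once and file holds both names, which is why cut produced one bar1 per line of input in the question.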

The right way to fix that in bash 4.x and later is:

readarray -t Changed_Files < <(git diff --diff-filter=dr --name-only origin/main..."$My_branch" | grep '^\s*foo/')

But don't use that exactly as given: \s is a PCRE extension, and POSIX-compliant grep implementations aren't guaranteed to support it. Use [[:space:]] instead.
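
Putting both pieces of advice together, a sketch of the bash 4.x approach might look like the following. The list_changed function is a hypothetical stand-in for the real git diff command (shown in the comment), used here only so the example is self-contained.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for:
#   git diff --diff-filter=dr --name-only origin/main..."$My_branch"
list_changed() {
  printf '%s\n' 'foo/bar1/file1.json' 'docs/readme.md' 'foo/bar1/file2.json'
}

# readarray -t stores one line per array element (bash 4.0 or later).
# [[:space:]] is the POSIX character class that replaces \s.
readarray -t Changed_Files < <(list_changed | grep '^[[:space:]]*foo/')

for file in "${Changed_Files[@]}"; do
  printf 'processing: %s\n' "$file"
done
```

Each iteration now sees exactly one file name, so the original cut-based subdir check would receive a single path at a time.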


If you need to support bash 3.x (such as the version shipped with macOS), consider a rewrite entirely in native shell that relies on no external tools other than git diff itself:

# feature needed for +( ) to work; in older bash releases, even needed in PEs
shopt -s extglob

parentDirTgt=foo # the thing you were grepping for in original code

errors_seen=0
while IFS=/ read -r parentDir subDir rest <&3; do # fd 3 has "git diff" output
  parentDir=${parentDir##+([[:space:]])}          # strip leading whitespace
  [[ $parentDir = $parentDirTgt ]] || continue    # skip files not under parent
  [[ $rest ]] || continue       # skip files not under two directory layers
  file=$parentDir/$subDir/$rest # reconstruct full file name
  case $subDir in               # branch on subdirectory layer
    bar1) func1 "$file";;       # for directory foo/bar1, call function func1
    *)    echo "Ignoring change in unrecognized subdirectory $subDir" >&2
          errors_seen=1;;
  esac
done 3< <(git diff --diff-filter=dr --name-only origin/main..."$My_branch")

exit "$errors_seen"

This probably calls for some explanation.

  • The while read looping practice is covered in BashFAQ #1. Instead of reading each line into a single variable, we split on / and read into three variables: parentDir, subDir, and rest (which holds everything after the second /).
  • <( ) is used instead of | to connect git diff to the loop for the reasons described in BashFAQ #24: a pipeline would run the loop in a subshell, so the value assigned to errors_seen would be lost when the loop ends.
  • +([[:space:]]) is an extglob that matches one or more whitespace characters. Used inside the ${var##pattern} parameter expansion, it strips the longest possible match from the beginning of the variable's contents.
  • File descriptor 3 is used so that anything in func1 that reads from stdin consumes the original stdin instead of the git diff output.
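
The splitting and stripping steps described above can be tried in isolation. This is a small sketch on a single made-up input line (the path and its leading spaces are hypothetical), not part of the original answer's code:

```bash
#!/usr/bin/env bash
shopt -s extglob   # needed for the +( ) pattern

# Hypothetical input line with leading whitespace and a nested path.
line='   foo/bar1/nested/file1.json'

# With IFS=/ the line is split on slashes only; the last variable
# receives everything that is left over, further slashes included.
IFS=/ read -r parentDir subDir rest <<<"$line"

# Remove the longest run of leading whitespace from the first field.
parentDir=${parentDir##+([[:space:]])}

printf 'parentDir=%s subDir=%s rest=%s\n' "$parentDir" "$subDir" "$rest"
```

Because IFS contains only /, read does not trim the leading spaces itself, which is why the separate parameter expansion is needed.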

See this running in an online sandbox at https://ideone.com/ZA1Cgn

huangapple
  • Posted on July 6, 2023, 19:45:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76628499.html