Bash cannot process git output one-by-one, items stuck together
Question
I'm trying to run a script like the one below:
```bash
Changed_Files=$(git diff --diff-filter=dr --name-only origin/main..."$My_branch" | grep '^\s*foo/')
for file in "${Changed_Files[@]}"; do
  subdir=$(echo "$file" | cut -d / -f 2)
  if [[ "$subdir" == "bar1" ]]; then
    func1 "$file"
  else
    echo "Error"
    exit 1
  fi
done
```
Sample `Changed_Files`:
```
foo/bar1/file1.json
foo/bar1/file2.json
```
**Expected result:**
`func1` gets run on both files.
**Actual result:**
The script exits because `subdir` evaluates to `bar1 bar1`. Instead of processing the files one by one, the variable `file` always contains the entire list of results from `git diff`.
I also tried:
```bash
for i in "${!Changed_Files[@]}"; do
  file="${Changed_Files[$i]}"
  ...
```
But this gives the same result. How can I process the items returned from `git diff` one by one, instead of getting the entire output stuck together?
Answer 1
Score: 1
The problem that would cause the symptom described is that you're storing output in a string, not an array.
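The symptom can be reproduced without a repository. This is a minimal sketch that assumes `printf` as a stand-in for the `git diff ... | grep` pipeline, and it uses the sample paths from the question. The command substitution yields a single string, so the loop body runs only once and `subdir` ends up holding two lines.

```bash
#!/usr/bin/env bash
# printf is an assumed stand-in for the git diff | grep pipeline,
# so this demo runs outside a repository.
Changed_Files=$(printf '%s\n' foo/bar1/file1.json foo/bar1/file2.json)

# $( ) produces ONE string, so the "array" has exactly one element.
echo "elements: ${#Changed_Files[@]}"

iterations=0
for file in "${Changed_Files[@]}"; do
  iterations=$((iterations + 1))
  # cut processes each input line, so subdir becomes "bar1<newline>bar1".
  subdir=$(echo "$file" | cut -d / -f 2)
done
echo "loop iterations: $iterations"

if [[ $subdir == "bar1" ]]; then
  echo "subdir matches bar1"
else
  echo "subdir does not match bar1"   # this is the branch that runs
fi
```

The loop runs once, and the comparison fails. In the original script, that failure is what sends execution to the error branch.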
The right way to fix that in bash 4.x and later is:
```bash
readarray -t Changed_Files < <(git diff --diff-filter=dr --name-only origin/main..."$My_branch" | grep '^\s*foo/')
```
But don't use that exactly as given. `\s` is a Perl-style escape that GNU grep accepts as an extension, and POSIX-compliant grep implementations aren't guaranteed to support it. Use `[[:space:]]` instead.
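Here is a sketch of the bash 4.x fix with the `[[:space:]]` substitution applied. `fake_git_diff` is a hypothetical helper that prints sample paths in place of the real `git diff --diff-filter=dr --name-only origin/main..."$My_branch"`, so the snippet runs outside a repository.

```bash
#!/usr/bin/env bash
# Requires bash 4.0+ for readarray (also available as mapfile).
# fake_git_diff is a hypothetical stand-in for the real git diff command.
fake_git_diff() {
  printf '%s\n' foo/bar1/file1.json foo/bar1/file2.json other/readme.md
}

# -t strips the trailing newline from each element; [[:space:]] replaces \s.
readarray -t Changed_Files < <(fake_git_diff | grep '^[[:space:]]*foo/')

echo "elements: ${#Changed_Files[@]}"
for file in "${Changed_Files[@]}"; do
  subdir=$(echo "$file" | cut -d / -f 2)   # one path per iteration now
  printf '%s -> %s\n' "$file" "$subdir"
done
```

With a real array, the loop from the question sees one path per iteration, and `other/readme.md` is filtered out by `grep`.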
If you need to support bash 3.x (such as the version that ships with macOS), which has no `readarray`, you might consider a rewrite entirely in native shell that relies on no external tools other than `git diff` itself:
```bash
# extglob is needed for +( ) to work; in older bash releases it is needed
# even inside parameter expansions
shopt -s extglob

parentDirTgt=foo   # the thing you were grepping for in the original code
errors_seen=0

while IFS=/ read -r parentDir subDir rest <&3; do   # fd 3 has the "git diff" output
  parentDir=${parentDir##+([[:space:]])}           # strip leading whitespace
  [[ $parentDir = "$parentDirTgt" ]] || continue   # skip files not under the parent
  [[ $rest ]] || continue                          # skip files fewer than two directories deep
  file=$parentDir/$subDir/$rest                    # reconstruct the full file name
  case $subDir in                                  # branch on the subdirectory layer
    bar1) func1 "$file" ;;                         # for directory foo/bar1, call func1
    *)    echo "Ignoring change in unrecognized subdirectory $subDir" >&2
          errors_seen=1 ;;
  esac
done 3< <(git diff --diff-filter=dr --name-only origin/main..."$My_branch")

exit "$errors_seen"
```
This probably calls for some explanation.
- The `while read` looping practice is covered in BashFAQ #1. Instead of reading each line into just one variable, we split on `/`s and read into three variables: `parentDir`, `subDir`, and `rest` (which contains everything after the second `/`).
- `<( )` is used instead of `|` to connect `git diff` to the loop for the reasons described in BashFAQ #24. By default, each command in a pipeline runs in a subshell, so an assignment such as `errors_seen=1` made inside the loop would be lost once the loop ends.
- `+([[:space:]])` is an extglob that matches one or more whitespace characters. It is used inside the `${var##pattern}` parameter expansion, which removes the longest matching pattern from the beginning of the variable's contents, so all leading whitespace is stripped.
- File descriptor 3 is used so that anything in `func1` that tries to read from stdin consumes the original stdin instead of the `git diff` output.
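Each of the mechanisms in the list above can be checked in isolation. The following is a small sketch, not part of the answer's script. `read_answer` is a hypothetical stand-in for a `func1` that reads stdin, and the sample paths are made up.

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables the +( ) pattern below

# 1. IFS=/ read splits on slashes; the last variable keeps the remainder.
IFS=/ read -r parentDir subDir rest <<< '  foo/bar1/deep/file1.json'
parentDir=${parentDir##+([[:space:]])}   # strip the leading whitespace
echo "parentDir=[$parentDir] subDir=[$subDir] rest=[$rest]"

# 2. A pipe runs the loop in a subshell (unless `shopt -s lastpipe` is in
#    effect), so the count is lost; process substitution keeps it.
pipe_count=0
printf '%s\n' a b | while read -r _; do pipe_count=$((pipe_count + 1)); done
procsub_count=0
while read -r _; do procsub_count=$((procsub_count + 1)); done < <(printf '%s\n' a b)
echo "pipe: $pipe_count, process substitution: $procsub_count"

# 3. Reading loop input from fd 3 leaves stdin free for the loop body.
#    read_answer is a hypothetical stand-in for a func1 that reads stdin.
read_answer() { read -r answer; echo "func1 read: $answer"; }
{
  while read -r line <&3; do
    echo "loop got: $line"
    read_answer
  done 3< <(printf '%s\n' foo/bar1/file1.json)
} <<< 'from original stdin'
```

Expected behaviour: `parentDir` becomes `foo` and `rest` becomes `deep/file1.json`, the piped counter stays at 0 while the process-substitution counter reaches 2, and `read_answer` sees the here-string rather than the loop's input.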
See this running in an online sandbox at https://ideone.com/ZA1Cgn.