Bash cannot process git output one-by-one, items stuck together
Question
I'm trying to run a script like the one below:

```bash
Changed_Files=$(git diff --diff-filter=dr --name-only origin/main..."$My_branch" | grep '^\s*foo/')
for file in "${Changed_Files[@]}"; do
    subdir=$(echo "$file" | cut -d / -f 2)
    if [[ "$subdir" == "bar1" ]]; then
        func1 "$file"
    else
        echo "Error"
        exit 1
    fi
done
```
Sample `Changed_Files`:

```
foo/bar1/file1.json
foo/bar1/file2.json
```
**Expected result:** `func1` gets run on both files.

**Actual result:** The script exits because `subdir` evaluates to `bar1 bar1`. Instead of processing the files one by one, the variable `file` always contains the entire list of results from `git diff`.
I also tried

```bash
for i in "${!Changed_Files[@]}"; do
    file="${Changed_Files[$i]}"
    ...
```

but this gives the same result. How can I process the items returned from `git diff` one by one instead of getting the entire output stuck together?
Answer 1

Score: 1
The problem that would cause the symptom described is that you're storing output in a string, not an array.
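A minimal way to reproduce the symptom without a git repository. As an assumption for illustration, `printf` stands in for the `git diff ... | grep ...` pipeline and emits the two sample paths from the question:

```bash
#!/usr/bin/env bash
# Assumption: printf stands in for `git diff --name-only ... | grep ...`.
Changed_Files=$(printf '%s\n' foo/bar1/file1.json foo/bar1/file2.json)

iterations=0
for file in "${Changed_Files[@]}"; do
    # A plain string variable behaves like a one-element array, so this body
    # runs once and $file holds both paths separated by a newline.
    iterations=$((iterations + 1))
    subdir=$(echo "$file" | cut -d / -f 2)   # cut emits one field per input line
done

echo "iterations: $iterations"
printf '%q\n' "$subdir"   # %q makes the embedded newline visible
```

The loop body runs only once and `subdir` ends up holding two lines, which is exactly the behavior described in the question.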
The right way to fix that in bash 4.x and later is:

```bash
readarray -t Changed_Files < <(git diff --diff-filter=dr --name-only origin/main..."$My_branch" | grep '^\s*foo/')
```
But don't use that exactly as given: `\s` is a Perl/PCRE-style extension, and POSIX-compliant `grep` implementations aren't guaranteed to support it. Use `[[:space:]]` instead.
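Putting the two points together, a sketch of the bash 4+ version with the POSIX character class. `printf` again stands in for `git diff` (an extra non-`foo` path is included to show the filter working), and `func1` here is a hypothetical stub that only echoes its argument:

```bash
#!/usr/bin/env bash
# Hypothetical stub standing in for the question's real func1.
func1() { echo "processing $1"; }

# readarray (bash 4+) stores one line per array element; -t drops the newlines.
# printf stands in for: git diff --diff-filter=dr --name-only origin/main..."$My_branch"
readarray -t Changed_Files < <(
    printf '%s\n' foo/bar1/file1.json foo/bar1/file2.json other/file3.json |
        grep '^[[:space:]]*foo/'   # POSIX class instead of \s
)

for file in "${Changed_Files[@]}"; do
    subdir=$(cut -d / -f 2 <<<"$file")
    if [[ $subdir == bar1 ]]; then
        func1 "$file"
    else
        echo "Error: unexpected subdirectory $subdir" >&2
        exit 1
    fi
done
```

Each iteration now sees exactly one path, so `subdir` is `bar1` rather than two lines joined together.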
If you need to support bash 3.x (such as the version shipped with macOS), you might consider a rewrite entirely in native shell that relies on no external tools other than `git diff` itself:

```bash
# extglob is needed for +( ) to work; in older bash releases it is needed even in parameter expansions
shopt -s extglob

parentDirTgt=foo   # the thing you were grepping for in the original code
errors_seen=0
while IFS=/ read -r parentDir subDir rest <&3; do   # fd 3 has the "git diff" output
    parentDir=${parentDir##+([[:space:]])}          # strip leading whitespace
    [[ $parentDir = "$parentDirTgt" ]] || continue  # skip files not under parent
    [[ $rest ]] || continue                         # skip files not under two directory layers
    file=$parentDir/$subDir/$rest                   # reconstruct full file name
    case $subDir in                                 # branch on subdirectory layer
        bar1) func1 "$file";;                       # for directory foo/bar1, call function func1
        *) echo "Ignoring change in unrecognized subdirectory $subDir" >&2
           errors_seen=1;;
    esac
done 3< <(git diff --diff-filter=dr --name-only origin/main..."$My_branch")
exit "$errors_seen"
```
This probably calls for some explanation.
- The `while read` looping practice is covered in BashFAQ #1. Instead of reading each line into just one variable, we split on `/`s and read into three variables: `parentDir`, `subDir`, and `rest` (containing everything after the second `/`).
- The use of `<( )` instead of `|` to connect `git diff` to the loop is for reasons described in BashFAQ #24.
- `+([[:space:]])` is an extglob that matches as many whitespace characters as possible; it is used in the `${var##pattern}` parameter expansion, which strips the longest matching pattern from the beginning of a variable's contents.
- File descriptor 3 is used so that anything in `func1` that tries to read from stdin will consume the original stdin instead of the `git diff` output.
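A small sketch of the splitting and stripping steps on their own, applied to one made-up input line. The leading spaces and the extra `nested/` level are illustrative only (`git diff --name-only` does not normally indent its output); they are there to exercise the strip and to show that `rest` keeps any further slashes:

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables the +( ) pattern used below

line='  foo/bar1/nested/file1.json'   # made-up sample input

# IFS=/ splits on slashes; the last variable, rest, receives the remainder
# of the line, including any further slashes.
IFS=/ read -r parentDir subDir rest <<<"$line"

# ## removes the longest prefix matching +([[:space:]]), i.e. all leading whitespace.
parentDir=${parentDir##+([[:space:]])}

printf '%s|%s|%s\n' "$parentDir" "$subDir" "$rest"   # prints foo|bar1|nested/file1.json
```

Because `/` is the only character in `IFS`, the leading spaces survive the split and must be removed by the parameter expansion.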
See this running in an online sandbox at https://ideone.com/ZA1Cgn.
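The reason for file descriptor 3 can be shown with a minimal sketch. `reader` is a hypothetical function that reads one line from stdin, much as a `func1` that prompts the user might; the paths `a.json` and `b.json` are made up, `printf` replaces `git diff`, and a here-string stands in for the terminal:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a func1 that reads a line from stdin.
reader() {
    IFS= read -r answer
    echo "reader got: $answer"
}

# 1) Loop reads from stdin: reader steals the second path from the loop.
seen_stdin=0
while IFS= read -r path; do
    seen_stdin=$((seen_stdin + 1))
    reader
done < <(printf '%s\n' a.json b.json)

# 2) Loop reads from fd 3: reader gets the loop's own stdin (the here-string),
#    so every path reaches the loop body.
seen_fd3=0
while IFS= read -r path <&3; do
    seen_fd3=$((seen_fd3 + 1))
    reader
done 3< <(printf '%s\n' a.json b.json) <<<$'answer1\nanswer2'

echo "stdin loop iterations: $seen_stdin, fd 3 loop iterations: $seen_fd3"
```

The first loop runs only once because `reader` consumes `b.json`; the second runs twice, which is the behavior the `<&3` / `3<` pair in the answer preserves.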