Display file matching piped grep condition
Question
I am trying to find XML files containing a particular string. These files are however zipped as .gz. Essentially, I want to search through all of these gz files in the directory without extracting them. Additionally, I would like to get the specific filename which matches the search pattern and not the output itself.
I have managed to get the following command to get me the matching output itself from a piped grep command:
gunzip -c *.xml.gz | grep 'idName="M"'
I would like to get the filenames, however. I read somewhere that the -l flag for grep will return the matching filename, but in this case, it gives me a result saying (standard input). I assume this is because I need to be piping the filename from gunzip too, but how do I do that?
Edit: I should also add that I have had partial success with
gunzip -vc *.xml.gz | grep 'idName="M"'
but this gives me output like
filename_X: 30% -- replaced with stdout
filename_Y: 50% -- replaced with stdout
filename_Z: complete matching output
I would like to suppress the matching output itself in this case, and not show any of the non-matching filenames.
Answer 1
Score: 2
The zgrep family of tools exists exactly for this use case.
zgrep -l 'idName="M"' *.xml.gz
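To see how this differs from the gunzip | grep -l pipeline, here is a throwaway demo. It assumes zgrep (shipped with gzip) and mktemp are on your PATH; the file names and contents are made up. Because zgrep searches each file separately, -l can report the file name instead of (standard input).

```shell
#!/bin/sh
# Throwaway demo: the file names and contents below are made up.
demo=$(mktemp -d)
printf '<item idName="M"/>\n' | gzip >"$demo/match.xml.gz"
printf '<item idName="N"/>\n' | gzip >"$demo/nomatch.xml.gz"

# -l makes zgrep print each matching file's name once,
# instead of the matching lines themselves.
cd "$demo" || exit 1
zgrep -l 'idName="M"' *.xml.gz
```

This should print only match.xml.gz.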
If you need the same for *.zip files, look for zipgrep.
If the pattern you are searching for is just a static string, not a regular expression, you can speed up processing by using the -F flag (aka legacy fgrep).
This can make a substantial difference if the files are big.
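Besides speed, -F also changes what matches, because characters like . lose their regex meaning. A small sketch, assuming your zgrep passes -F through to grep (gzip's zgrep does); the files are invented for the demo:

```shell
#!/bin/sh
# Made-up files: "a.b" as a regex matches "axb" too, but as a
# fixed string (-F) it only matches the literal text "a.b".
demo=$(mktemp -d)
printf 'axb\n' | gzip >"$demo/regex-only.xml.gz"
printf 'a.b\n' | gzip >"$demo/literal.xml.gz"

cd "$demo" || exit 1
echo "regex:";   zgrep -l 'a.b' *.xml.gz    # both files match
echo "literal:"; zgrep -lF 'a.b' *.xml.gz   # only literal.xml.gz
```

For a pattern like idName="M", which has no regex metacharacters, both forms find the same files; -F just lets grep skip regex matching.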
If you need this for a file type for which you can't find an existing tool that provides this functionality, a crude implementation looks something like
# Usage: REGEX FILE...  (prints file:line for every matching line)
regex=$1
shift
for file; do
    gzip -dc <"$file" |
        sed -n "/$regex/s|^|$file:|p"
done
... with various complications to handle different options, etc.; and with the caveat that this simple sed script has robustness issues in a number of corner cases (the regex can't contain a slash, and the file name can't contain a literal | or a newline).
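To try the crude sed loop without saving it as a script, you can wrap it in a shell function. The name sedgrep is hypothetical, and the same caveats apply (no / in the regex, no | or newline in file names). The demo file is made up.

```shell
#!/bin/sh
# Hypothetical wrapper around the crude sed loop: prints file:line
# for every line of each decompressed file that matches the regex.
sedgrep() {
    regex=$1
    shift
    for file; do
        gzip -dc <"$file" |
            sed -n "/$regex/s|^|$file:|p"
    done
}

demo=$(mktemp -d)
printf '<a idName="M"/>\n<b idName="N"/>\n' | gzip >"$demo/one.xml.gz"
sedgrep 'idName="M"' "$demo/one.xml.gz"
```

Only the <a idName="M"/> line is printed, prefixed with the full path of one.xml.gz and a colon. The s command uses | as its delimiter so that slashes in the path don't break it, which is exactly why a literal | in a file name would.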
If you have GNU grep, try something like
regex=$1
options=$(... complex logic to extract grep options ...)
shift
for file; do
    gzip -dc <"$file" |
        grep --label="$file" -H -e "$regex" $options
done
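As a concrete case of the GNU grep variant, with the options filled in as just -l: --label names the standard input, so -l prints that label instead of (standard input). This sketch assumes GNU grep; the files are made up.

```shell
#!/bin/sh
# Assumes GNU grep (--label). With -l, grep prints the label given for
# stdin, so each match is reported under its .gz file name.
demo=$(mktemp -d)
printf '<item idName="M"/>\n' | gzip >"$demo/a.xml.gz"
printf '<item idName="N"/>\n' | gzip >"$demo/b.xml.gz"

for file in "$demo"/*.xml.gz; do
    gzip -dc <"$file" |
        grep --label="$file" -l -e 'idName="M"'
done
```

This should print only the path of a.xml.gz.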
In your particular case, this can be reduced to just
regex=$1
shift
for file; do
    gzip -dc <"$file" |
        grep -q "$regex" &&
        echo "$file"
done
without any GNUisms.
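For example, the reduced loop can be wrapped in a function and run against the pattern from the question. The name gz_files_matching is hypothetical and the sample files are invented.

```shell
#!/bin/sh
# Hypothetical function around the portable loop: prints the name of
# each file whose decompressed contents match, and nothing for the rest.
gz_files_matching() {
    regex=$1
    shift
    for file; do
        gzip -dc <"$file" |
            grep -q "$regex" &&
            echo "$file"
    done
}

demo=$(mktemp -d)
printf '<item idName="M"/>\n' | gzip >"$demo/x.xml.gz"
printf '<item idName="N"/>\n' | gzip >"$demo/y.xml.gz"
printf '<item idName="M"/>\n' | gzip >"$demo/z.xml.gz"

(cd "$demo" && gz_files_matching 'idName="M"' *.xml.gz)
```

This should print x.xml.gz and z.xml.gz, one per line. grep -q stops at the first match, so no matching lines are shown and large files are not read to the end unnecessarily.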
Obviously, you'd replace gzip -dc with whatever you need to extract the information from the file type you want to process.
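One way to make that swap explicit is to pass the extraction command as an argument. This is only a sketch: files_matching and its argument order are made up, and the command is word-split, so something like 'gzip -dc' or 'xz -dc' works, but a command whose arguments contain spaces would not. Here cat stands in for plain, uncompressed files.

```shell
#!/bin/sh
# Hypothetical variant: $1 is the extraction command, $2 the regex,
# the rest are files. $extract is deliberately left unquoted so that
# 'gzip -dc' splits into a command and its option.
files_matching() {
    extract=$1
    regex=$2
    shift 2
    for file; do
        $extract <"$file" |
            grep -q "$regex" &&
            echo "$file"
    done
}

demo=$(mktemp -d)
printf 'idName="M"\n' | gzip >"$demo/a.xml.gz"
printf 'idName="M"\n' >"$demo/b.xml"

files_matching 'gzip -dc' 'idName="M"' "$demo/a.xml.gz"
files_matching cat 'idName="M"' "$demo/b.xml"   # plain file: no extraction needed
```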