Display file matching piped grep condition

Question

I am trying to find XML files that contain a particular string. These files are, however, gzip-compressed (.gz). Essentially, I want to search all of the .gz files in the directory without extracting them. Additionally, I want only the names of the files that match the search pattern, not the matching lines themselves.

I have managed to get the matching lines themselves with the following pipeline into grep:

gunzip -c *.xml.gz | grep 'idName="M"'

However, I want the file names. I read that grep's -l flag prints the names of matching files, but here it only prints (standard input). I assume this is because gunzip also needs to pass the file name down the pipe, but how do I do that?
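The `(standard input)` result can be reproduced in isolation. This minimal sketch (one made-up XML line, plain POSIX grep) contrasts text piped into grep, which has no file name grep could report, with a file that grep opens itself:

```shell
# Made-up sample line. It reaches grep through a pipe, so grep has no
# file name to report and -l prints the placeholder "(standard input)".
printf '<item idName="M"/>\n' | grep -l 'idName="M"'
# prints: (standard input)

# When grep opens the file itself, -l prints the real path instead.
tmp=$(mktemp -d)
printf '<item idName="M"/>\n' > "$tmp/sample.xml"
grep -l 'idName="M"' "$tmp/sample.xml"
# prints the path ending in /sample.xml
rm -rf "$tmp"
```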

Edit: I have also had partial success with

gunzip -vc *.xml.gz | grep 'idName="M"'

but this gives me output like

filename_X:    30% -- replaced with stdout
filename_Y:    50% -- replaced with stdout
filename_Z:    complete matching output

In this case, I would also like to suppress the matching lines themselves, and not list the non-matching file names.

Answer 1

Score: 2

The zgrep family of tools exists exactly for this use case.

zgrep -l 'idName="M"' *.xml.gz
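To see what that command prints, here is a self-contained sketch. It assumes the `zgrep` script that ships with GNU gzip is installed; the temporary directory, file names and XML contents are made up for the demo:

```shell
# Self-contained demo; directory, file names and XML contents are made up.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Three gzip-compressed XML files; only b.xml.gz contains the pattern.
printf '<root><item idName="A"/></root>\n' | gzip > a.xml.gz
printf '<root><item idName="M"/></root>\n' | gzip > b.xml.gz
printf '<root><item idName="Z"/></root>\n' | gzip > c.xml.gz

# -l lists only the names of matching files; zgrep decompresses into a
# pipe, so no uncompressed copy is written to disk.
zgrep -l 'idName="M"' *.xml.gz
# prints: b.xml.gz
```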

If you need the same for *.zip files, look for zipgrep.

If the pattern you are searching for is just a static string, not a regular expression, you can speed up processing by using the -F flag (aka legacy fgrep).
This can make a substantial difference if the files are big.
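A small sketch of `-F` with zgrep (the sample files are made up). For the question's pattern the result is the same, because `idName="M"` contains no regex metacharacters; a pattern with a `.` shows how fixed-string matching differs from regex matching:

```shell
# Demo data is made up. zgrep passes -F through to grep, which then treats
# the pattern as a fixed string instead of a regular expression.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '<item idName="M" ver="1.0"/>\n' | gzip > a.xml.gz
printf '<item idName="N" ver="1x0"/>\n' | gzip > b.xml.gz

# No metacharacters in this pattern, so -F finds the same file.
zgrep -F -l 'idName="M"' *.xml.gz     # prints: a.xml.gz

# As a regex, "." matches any character, so "1x0" matches as well;
# with -F the dot must appear literally.
zgrep -l 'ver="1.0"' *.xml.gz         # prints: a.xml.gz and b.xml.gz
zgrep -F -l 'ver="1.0"' *.xml.gz      # prints: a.xml.gz
```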


If you need this for a file type for which no existing tool provides this functionality, a crude implementation looks something like

regex=$1
shift
for file; do
    gzip -dc <"$file" |
    sed -n "/$regex/s|^|$file:|p"
done

... with various complications to handle different options, etc., and with the caveat that this simple sed script has robustness issues in a number of corner cases (the regex can't contain a slash, and the file name can't contain a literal | or a newline).
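One way around the slash and `|` caveats, sketched here as an alternative rather than as the answer's method, is to swap sed for awk and hand the regex and file name over through environment variables, so neither is pasted into program text. The function name `gzsearch` and the demo data are hypothetical. Note that awk uses POSIX extended regexes (ERE), not grep's default basic regexes (BRE):

```shell
# Hypothetical helper: print FILE:LINE for each matching line.
# RE and FILE arrive via the environment (ENVIRON), so a "/" in the regex
# or a "|" in the file name cannot break the awk program.
# Caveat: awk matches POSIX EREs, not grep's default BREs.
gzsearch() {
    regex=$1
    shift
    for file; do
        gzip -dc < "$file" |
        RE=$regex FILE=$file awk '$0 ~ ENVIRON["RE"] { print ENVIRON["FILE"] ":" $0 }'
    done
}

# Made-up demo: a slash in the pattern and a "|" in the file name.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '<a href="/docs/x"/>\n<b/>\n' | gzip > 'a|b.xml.gz'
gzsearch 'href="/docs' 'a|b.xml.gz'
# prints: a|b.xml.gz:<a href="/docs/x"/>
```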

If you have GNU grep, try something like

regex=$1
options=$(... complex logic to extract grep options ...)
shift
for file; do
    gzip -dc <"$file" |
    grep --label="$file" -H -e "$regex" $options
done
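A runnable sketch of the GNU grep variant, assuming GNU grep because `--label` is a GNU extension. The option-extraction logic elided in the answer is simply left out; the function name `gz_label_grep` and the demo files are hypothetical. The last loop shows that `-l` also prints the label, which gives just the file names the question asks for:

```shell
# Hypothetical helper; requires GNU grep for --label.
gz_label_grep() {
    regex=$1
    shift
    for file; do
        # --label names standard input after the compressed file;
        # -H forces that name to be printed in front of each match.
        gzip -dc < "$file" |
        grep --label="$file" -H -e "$regex"
    done
}

# Made-up demo data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '<item idName="A"/>\n' | gzip > a.xml.gz
printf '<item idName="M"/>\n' | gzip > b.xml.gz

gz_label_grep 'idName="M"' *.xml.gz
# prints: b.xml.gz:<item idName="M"/>

# With -l instead of -H, grep prints only the label, i.e. the file name.
for file in *.xml.gz; do
    gzip -dc < "$file" | grep --label="$file" -l -e 'idName="M"'
done
# prints: b.xml.gz
```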

In your particular case, this can be reduced to just

regex=$1
shift
for file; do
    gzip -dc <"$file" |
    grep -q -e "$regex" &&
    echo "$file"
done

without any GNUisms.
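For quick reuse, the reduced loop can be wrapped in a shell function. This is a sketch; the name `gz_files_matching` and the sample files are made up, and only POSIX `grep -q -e` is used:

```shell
# Hypothetical function wrapping the portable loop: print the name of
# every gzip file whose decompressed contents match the regex.
gz_files_matching() {
    regex=$1
    shift
    for file; do
        # grep -q prints nothing and exits 0 as soon as it finds a match.
        gzip -dc < "$file" | grep -q -e "$regex" && echo "$file"
    done
}

# Made-up demo data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '<item idName="A"/>\n' | gzip > a.xml.gz
printf '<item idName="M"/>\n' | gzip > b.xml.gz
printf '<item idName="M"/>\n<item idName="M"/>\n' | gzip > c.xml.gz

gz_files_matching 'idName="M"' *.xml.gz
# prints b.xml.gz and c.xml.gz, one per line; each name appears once
# no matter how many lines in the file match
```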

Obviously, you'd replace gzip -dc with whatever you need to extract the information from the file type you want to process.
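As a sketch of that substitution, the extraction command can be passed in as an argument. The function name `files_matching_with` is hypothetical, and the command is deliberately word-split so that a string such as `gzip -dc` works; other decompressors like `bzip2 -dc` or `xz -dc` would slot in the same way if installed:

```shell
# Hypothetical helper: $1 is the command that writes the file's text to
# stdout when given the file on stdin, $2 is the regex, the rest are files.
files_matching_with() {
    extract=$1
    regex=$2
    shift 2
    for file; do
        # $extract is intentionally unquoted so "gzip -dc" splits into words.
        $extract < "$file" | grep -q -e "$regex" && echo "$file"
    done
}

# Made-up demo: one compressed and one plain XML file.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '<item idName="M"/>\n' > plain.xml
printf '<item idName="M"/>\n' | gzip > packed.xml.gz

files_matching_with 'gzip -dc' 'idName="M"' *.xml.gz   # prints: packed.xml.gz
files_matching_with cat 'idName="M"' *.xml             # prints: plain.xml
```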

huangapple
  • Posted on May 11, 2023, 17:38:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76226205.html