2023-05-11 17:38:46

Display file matching piped grep condition

Question

I am trying to find XML files containing a particular string. These files are however zipped as .gz. Essentially, I want to search through all of these gz files in the directory without extracting them. Additionally, I would like to get the specific filename which matches the search pattern and not the output itself.

I have managed to get the following command to get me the matching output itself from a piped grep command:

gunzip -c *.xml.gz | grep 'idName="M"'

I would like to get the filenames however. I read somewhere that the -l flag for grep will return the matching filename, but in this case, it gives me a result saying (standard input). I assume this is because I need to be piping the filename from gunzip too, but how do I do that?

Edit: Also adding that I have somewhat partial success by doing

gunzip -vc *.xml.gz | grep 'idName="M"'

but this gives me output like

filename_X:    30% -- replaced with stdout
filename_Y:    50% -- replaced with stdout
filename_Z:    complete matching output

I would like to suppress the matching output too in this case, and not show all the non-matching filenames.

Answer 1

Score: 2

The zgrep family of tools exists for exactly this use case.

zgrep -l 'idName="M"' *.xml.gz

If you need the same for *.zip files, look for zipgrep.

If the pattern you are searching for is just a static string, not a regular expression, you can speed up processing by using the -F flag (aka legacy fgrep).
This can make a substantial difference if the files are big.
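
As a rough illustration of combining the two flags, the sketch below builds two throwaway archives and lists only the one that contains the string. The scratch directory and the file names a.xml.gz and b.xml.gz are made up for the demo, and it assumes a zgrep that passes -l and -F through to grep, as the one shipped with GNU gzip does.

```shell
# Build two small sample archives in a scratch directory (names are hypothetical).
tmpdir=$(mktemp -d)
printf '<item idName="M"/>\n' | gzip -c > "$tmpdir/a.xml.gz"
printf '<item idName="N"/>\n' | gzip -c > "$tmpdir/b.xml.gz"

# -l prints only the names of files that match;
# -F treats the pattern as a fixed string rather than a regular expression.
matches=$(cd "$tmpdir" && zgrep -l -F 'idName="M"' *.xml.gz)
printf '%s\n' "$matches"
```

Only the archive whose decompressed content contains the string should be listed; the matching lines themselves are suppressed.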

If you need this for a file type for which you can't find an existing tool that provides this functionality, a crude implementation looks something like

regex=$1
shift
for file; do
    gzip -dc <"$file" |
    sed -n "/$regex/s|^|$file:|p"
done

... with various complications to handle different options, etc; and with the caveat that this simple sed script has robustness issues in a number of corner cases (the regex can't contain a slash, and the file name can't contain a literal | or a newline).
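
One possible way around the slash and | restrictions, sketched here as an assumption rather than as part of the original answer, is to let grep do the matching and have the shell add the prefix, so neither the regex nor the file name is ever spliced into a sed script. The function name prefix_matches and the sample file are hypothetical, and a shell read loop will be slow when there are many matching lines.

```shell
# prefix_matches REGEX FILE...
# Prints "FILE:line" for every matching line of each gzip-compressed FILE.
prefix_matches() {
    regex=$1
    shift
    for file; do
        gzip -dc < "$file" |
        grep -e "$regex" |
        while IFS= read -r line; do
            # printf receives the name as data, so "|" or "/" need no escaping.
            printf '%s:%s\n' "$file" "$line"
        done
    done
}

# Demo with a slash in the regex and a "|" in the file name.
demo_dir=$(mktemp -d)
printf 'path a/b here\nother line\n' | gzip -c > "$demo_dir/odd|name.gz"
out=$(prefix_matches 'a/b' "$demo_dir/odd|name.gz")
printf '%s\n' "$out"
```

A newline in a file name still makes the output ambiguous, so that caveat remains.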

If you have GNU grep, try something like

regex=$1
options=$(... complex logic to extract grep options ...)
shift
for file; do
    gzip -dc <"$file" |
    grep --label="$file" -H -e "$regex" $options
done

In your particular case, this can be reduced to just

regex=$1
shift
for file; do
    gzip -dc <"$file" |
    grep -q "$regex" &&
    echo "$file"
done

without any GNUisms.

Obviously, you'd replace gzip -dc with whatever you need to extract the information from the file type you want to process.
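
To make that substitution concrete, here is a minimal sketch that takes the extraction command as a parameter. The function name list_matching and its argument order are assumptions made for illustration, and the unquoted expansion relies on sh/bash word splitting, so the command must not contain arguments with embedded spaces.

```shell
# list_matching DECOMPRESSOR PATTERN FILE...
# Prints the name of every FILE whose extracted content matches PATTERN.
# DECOMPRESSOR is a command plus options that writes to stdout, e.g. "gzip -dc".
list_matching() {
    decompress=$1
    pattern=$2
    shift 2
    for file; do
        # $decompress is deliberately unquoted so "gzip -dc" splits into words.
        $decompress < "$file" | grep -q -e "$pattern" && printf '%s\n' "$file"
    done
    return 0
}

# Demo with gzip; the sample files are made up.
sample_dir=$(mktemp -d)
printf '<item idName="M"/>\n' | gzip -c > "$sample_dir/a.xml.gz"
printf '<item idName="N"/>\n' | gzip -c > "$sample_dir/b.xml.gz"
result=$(list_matching 'gzip -dc' 'idName="M"' "$sample_dir"/*.xml.gz)
printf '%s\n' "$result"
```

Any other tool that reads the file on standard input and writes the extracted content to standard output could be passed in place of 'gzip -dc'.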
