June 6, 2023, 04:39:56


Best way to filter using awk or grep

Question


I have a file whose contents look like this:

1234567890 -> 
2345678901 -> /some/directory/some_file.txt

I want to run a grep or awk command that prints only the lines containing a file path, not the lines that have just the number and the -> arrow. My current attempt is:

awk '/^[[:digit:]]+\/data\/.*/gm' file_to_test.txt

but it returns every line. I feel the solution is really simple and I am just not seeing it. Any help would be appreciated.

Answer 1

Score: 3


awk 'NF==3 {print $0}' file_to_test.txt

For any line that has exactly three fields (NF==3), print the whole line (print $0).

As markp-fuso pointed out, awk's default action is to print the whole line, so the above can be shortened to:

awk 'NF==3' file_to_test.txt
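As a side note on why the original attempt printed everything: awk has no regex flags, so the trailing gm is most likely parsed as an uninitialized variable concatenated onto the match result, which yields a non-empty string ("0" or "1") that is always true. The pattern also looks for /data/ immediately after the digits, which never occurs in the sample. The sketch below is one possible variation on the answer above, not the answerer's own code; the sample function is a hypothetical stand-in for file_to_test.txt, and NF >= 3 is an assumption made so that paths containing spaces are still kept.

```shell
# Hypothetical helper that reproduces the sample input from the question.
sample() {
  printf '%s\n' '1234567890 -> ' '2345678901 -> /some/directory/some_file.txt'
}

# Keep lines with at least three whitespace-separated fields,
# so a path that contains spaces still passes the filter.
sample | awk 'NF >= 3'

# Regex variant without flags: digits, the arrow, then a path starting with /.
sample | awk '/^[0-9]+ -> \//'
```

Both commands should print only the line that carries a path.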

Answer 2

Score: 1


If the file paths always contain a / (you did not state this explicitly, but your example suggests it), a simple

grep -F / file_to_test.txt | cut -w -f 3-

should do. The grep selects the lines, and the cut extracts the file path from each line. -w specifies that the fields in the line are separated by whitespace. I use 3-, i.e. everything from field 3 to the end, to allow file paths to contain spaces.
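One portability caveat: as far as I know, -w is a BSD/macOS extension to cut and GNU cut does not accept it. A minimal sketch of a variant that avoids it, assuming the fields are separated by exactly one space as in the sample (the sample function is a hypothetical stand-in for the input file):

```shell
# Hypothetical helper that reproduces the sample input from the question.
sample() {
  printf '%s\n' '1234567890 -> ' '2345678901 -> /some/directory/some_file.txt'
}

# grep -F matches the literal "/"; cut -d ' ' splits on single spaces
# and -f 3- keeps everything from the third field onward.
sample | grep -F / | cut -d ' ' -f 3-
```

If the separators can be runs of spaces or tabs, cut -d ' ' will produce empty fields, and an awk-based approach is probably the safer choice.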

Answer 3

Score: 0


You could use sed to delete the lines that contain only the digits and the arrow.

sed -e '/[0-9]* -> *$/d' ./some_file.txt

Output:

2345678901 -> /some/directory/some_file.csv
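The pattern above is not anchored at the start of the line; it still works because [0-9]* can match an empty string. A small sketch that anchors it to make the intent explicit, run against the sample from the question (the sample function is a hypothetical stand-in for the input file):

```shell
# Hypothetical helper that reproduces the sample input from the question.
sample() {
  printf '%s\n' '1234567890 -> ' '2345678901 -> /some/directory/some_file.txt'
}

# Delete lines that are nothing but digits, the arrow, and optional trailing spaces.
sample | sed -e '/^[0-9]* -> *$/d'

# The same filter expressed with grep -v (invert match).
sample | grep -v '^[0-9]* -> *$'
```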

Answer 4

Score: 0


echo '1234567890 -> 
2345678901 -> /some/directory/some_file.txt' |
mawk 'NF *= _ < $NF' FS='^[^/]+' OFS=

Output:

/some/directory/some_file.txt
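The one-liner above is terse. As I read it, the field separator strips everything before the first /, and the line is printed only if something is left. A more readable sketch with the same visible result, using match and substr rather than the answerer's technique (the sample function is a hypothetical stand-in for the echo input):

```shell
# Hypothetical helper that reproduces the sample input from the question.
sample() {
  printf '%s\n' '1234567890 -> ' '2345678901 -> /some/directory/some_file.txt'
}

# match() sets RSTART to the position of the first "/" when one exists;
# substr() then prints the line from that position to the end.
sample | awk 'match($0, /\/.*/) { print substr($0, RSTART) }'
```

Unlike the other answers, this prints only the path, not the number and arrow.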

Answer 5

Score: 0


> solution is really simple

If that is what you want, I would use GNU AWK as follows. Let file.txt contain

1234567890 -> 
2345678901 -> /some/directory/some_file.txt

then

awk '$3' file.txt

gives the output

2345678901 -> /some/directory/some_file.txt

Explanation: awk treats each line as columns separated by one or more whitespace characters (the GNU AWK default) and prints the lines whose third column is truthy. Note that GNU AWK lets you reference fields beyond the last one on a line; such a field is simply empty. Disclaimer: this solution assumes the path never consists only of 0 digits, because awk treats a numeric-looking field equal to zero as false; if that can happen, do not use it.

(tested in GNU Awk 5.1.0)
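To illustrate the disclaimer, here is a small sketch under the assumption that the input might contain a third field of 0. The extra third line and the sample_with_zero function are hypothetical and not part of the question's data. Comparing against the empty string forces a string comparison, so a literal 0 is kept.

```shell
# Hypothetical input: the question's sample plus one line whose third field is "0".
sample_with_zero() {
  printf '%s\n' \
    '1234567890 -> ' \
    '2345678901 -> /some/directory/some_file.txt' \
    '3456789012 -> 0'
}

# Truthiness test: the "0" field looks numeric and equals zero, so that line is dropped.
sample_with_zero | awk '$3'

# String comparison: any non-empty third field passes, including "0".
sample_with_zero | awk '$3 != ""'
```

The first command should print one line and the second should print two.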
