Best way to filter using awk or grep

Question

I have a file with content in the following format:

    1234567890 ->
    2345678901 -> /some/directory/some_file.txt

What I am attempting to do is run either a grep or an awk command that gives me only the lines that contain file paths, not the lines with just the number -> combination. My present attempt is:

    awk '/^[[:digit:]]+\/data\/.*/gm' file_to_test.txt

However, it returns all the lines. I feel like the solution is really simple and I am just not seeing it. Any help would be really appreciated.
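A likely reason every line is printed, offered as a hedged sketch rather than a definitive diagnosis: awk regex literals take no trailing flags, so /.../gm is read as the regex match (0 or 1) concatenated with an unset variable named gm. That produces the non-empty string "0" or "1", and a non-empty string pattern counts as true. Separately, the regex expects /data/ right after the digits, whereas the sample lines have " -> " there. The corrected pattern below assumes the layout shown above (digits, a space, ->, a space, then a path starting with /).

```shell
#!/bin/sh
# Recreate the two sample lines from the question.
printf '%s\n' '1234567890 ->' '2345678901 -> /some/directory/some_file.txt' > file_to_test.txt

# Original attempt: "gm" is an unset variable concatenated with the match result,
# giving the non-empty string "0" or "1", so every line is printed.
awk '/^[[:digit:]]+\/data\/.*/gm' file_to_test.txt

# Sketch of a fix: digits, " -> ", then a path that begins with "/".
awk '/^[0-9]+ -> \//' file_to_test.txt
```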

Answer 1

Score: 3

    awk 'NF==3 {print $0}' file_to_test.txt

For any line that has three fields (NF==3), print the whole line (print $0).

As markp-fuso pointed out: the default behaviour of awk is to just print the whole line, so the above can be shortened to just:

    awk 'NF==3' file_to_test.txt
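A small sketch to try this on sample data. The third input line (a path containing a space) is hypothetical, added to show a limitation: such a path is split into more than three fields, so NF==3 drops it, while NF>=3 (a variant, not part of the original answer) keeps it.

```shell
#!/bin/sh
# Sample input; the third line (a path containing a space) is hypothetical.
printf '%s\n' '1234567890 ->' \
  '2345678901 -> /some/directory/some_file.txt' \
  '3456789012 -> /some/other dir/file.txt' > file_to_test.txt

# Exactly three whitespace-separated fields: the path with a space (4 fields) is dropped.
awk 'NF==3' file_to_test.txt

# Three or more fields: keeps every line that has something after "->".
awk 'NF>=3' file_to_test.txt
```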

Answer 2

Score: 1

If the file paths always contain a / (you did not state this explicitly, but your example suggests it), a simple

    grep -F / file_to_test.txt | cut -w -f 3-

should do. grep selects the lines, and cut extracts the file path from each line. -w specifies that the fields in the line are separated by whitespace. I use 3-, i.e. everything from field 3 to the end, so that file paths may contain spaces.
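The -w option is not a POSIX option for cut (it comes from BSD cut, as found on macOS), so it may be rejected on other systems. A hedged, more portable sketch assumes the fields are separated by exactly one space, as in the sample; cut -d ' ' then treats each single space as a delimiter, so field 3 onward is the path, embedded spaces included. The third input line is hypothetical.

```shell
#!/bin/sh
# Sample input from the question; the third line (a path with a space) is hypothetical.
printf '%s\n' '1234567890 ->' \
  '2345678901 -> /some/directory/some_file.txt' \
  '3456789012 -> /some/other dir/file.txt' > file_to_test.txt

# Keep lines containing "/", then print field 3 onward, splitting on single spaces
# so that a path with embedded spaces is reassembled intact.
grep -F / file_to_test.txt | cut -d ' ' -f 3-
```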

Answer 3

Score: 0

You could use sed to delete lines that only contain the digits and the arrow.

    sed -e '/[0-9]* -> *$/d' ./some_file.txt

Output:

    2345678901 -> /some/directory/some_file.txt
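Because [0-9]* can also match zero digits, the pattern above effectively deletes any line that ends in -> plus optional spaces. A slightly stricter sketch (an adjustment, not the original answer) anchors the whole line, so only lines made of digits, a space and the arrow are deleted:

```shell
#!/bin/sh
# Recreate the two sample lines from the question.
printf '%s\n' '1234567890 ->' '2345678901 -> /some/directory/some_file.txt' > file_to_test.txt

# Delete lines that consist only of one or more digits, " ->", and optional trailing spaces.
sed -e '/^[0-9][0-9]* -> *$/d' file_to_test.txt
```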

Answer 4

Score: 0

    echo '1234567890 ->
    2345678901 -> /some/directory/some_file.txt' |
    mawk 'NF *= _ < $NF' FS='^[^/]+' OFS=

Output:

    /some/directory/some_file.txt
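The one-liner above is deliberately terse. As far as I can tell, FS='^[^/]+' makes everything before the first / a field separator and OFS= makes the rebuilt line just the path; lines with nothing after -> evaluate to false and are skipped. A more readable sketch that should print the same output for the sample input (a rewrite, not the author's code) is:

```shell
#!/bin/sh
# Recreate the two sample lines from the question.
printf '%s\n' '1234567890 ->' '2345678901 -> /some/directory/some_file.txt' > file_to_test.txt

# For lines that contain "/", remove everything before the first "/" and print the rest (the path).
awk 'index($0, "/") { sub("^[^/]+", ""); print }' file_to_test.txt
```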

Answer 5

Score: 0

> solution is really simple

If that is what you want, I would use GNU AWK as follows. Let file.txt contain

    1234567890 ->
    2345678901 -> /some/directory/some_file.txt

then

    awk '$3' file.txt

gives the output

    2345678901 -> /some/directory/some_file.txt

Explanation: treat each line as columns separated by one or more whitespace characters (the GNU AWK default) and select the lines whose 3rd column is truthy. Note that awk lets you reference fields beyond the end of a line; they evaluate to the empty string, which is false. Disclaimer: this solution assumes the path never looks like the number zero (for example 0 or 000); if that can happen, do not use it.

(tested in GNU Awk 5.1.0)
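To illustrate the disclaimer, here is a sketch with a hypothetical third line whose path is just 0. awk '$3' treats that field as the number zero and skips the line, whereas comparing with the empty string (a variant, not part of the original answer) keeps every line that has a third field:

```shell
#!/bin/sh
# Sample input; the last line, whose "path" is the single character 0, is hypothetical.
printf '%s\n' '1234567890 ->' \
  '2345678901 -> /some/directory/some_file.txt' \
  '3456789012 -> 0' > file.txt

# Truthiness test: the field "0" looks numeric and equals zero, so that line is skipped.
awk '$3' file.txt

# Explicit non-empty test: keeps every line that has a third field.
awk '$3 != ""' file.txt
```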

huangapple
  • Published on June 6, 2023 04:39:56
  • If you repost this article, please keep the link: https://go.coder-hub.com/76409856.html