February 18, 2023, 00:39:59

Can awk find a field containing a string from a list?

Question

I have a file containing different fields.
I have another file containing a list of different words.
I need to use awk to extract from the first file all records where a specific field contains one or more of the words from the second file.

For example, the first file:

Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt

For example, the second file:

rm
reboot
shutdown

The awk command should then return:

Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot

I tried desperately with an array/map.

I also tried this:

awk -F ": " '{if ($3 ~ "^rm" || $3 ~ "^reboot" || $3 ~ "^shutdown") print}'

But the list of words I'm looking for keeps growing.
I'd rather read the words from a file.

Appreciate any help.

Thank you!
Serge

Answer 1

Score: 3

You might do it like this:

awk -F ': ' '
    FNR == NR { commands[$0]; next }
    split($3, cmdline, " ") && (cmdline[1] in commands)
' file2 file1

Output:

Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
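
The answer above only tests the first word of the command field. Since the question asks for records where the field "contains" a listed word, here is a minimal sketch of a variant that checks every whitespace-separated token of the third field. The scratch directory and the names `tmpdir`, `tokens`, and `result` are assumptions made for illustration; the sample data is copied from the question.

```shell
# Recreate the question's sample inputs in a scratch directory (hypothetical paths).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt
EOF
printf '%s\n' rm reboot shutdown > "$tmpdir/file2"

result=$(awk -F ': ' '
    # While reading the first argument (the word list), store each word as an array key.
    FNR == NR { commands[$0]; next }
    # For the log file, split the command field on whitespace and print the
    # record as soon as any token is one of the listed words.
    {
        n = split($3, tokens, " ")
        for (i = 1; i <= n; i++)
            if (tokens[i] in commands) { print; next }
    }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the question's sample data this selects the same two records as the original answer, because the listed words only ever appear as the first token. It would differ on a line whose command field has a listed word in a later position.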

Answer 2

Score: 0

Don't waste time with arrays. Just generate a hard-coded regex on the fly (here the shell variables file_a and file_b hold the contents of the two files):

printf '%s' "${file_a}" |
gawk -p- -b 'BEGIN { FS = "[]]: " } '"$(
 awk -v __="${file_b}" 'BEGIN {
    FS = RS  ;    OFS = "|"
    RS = "^$"; _= ORS =  ""
    $_ = __
    print "$NF ~ \"^(" $(_*(NF-=_==$NF)) ")( |$)\"" }' )"

Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot

# this part being dynamically generated

awk 'BEGIN { FS = "[]]: " } $NF ~ "^(rm|reboot|shutdown)( |$)" '

Then, instead of looping through an array, awk makes a single high-speed pass through file A without having to store any rows in between.
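
The same idea can be written more plainly. The following is a rough sketch, not the answerer's code: it joins the word list into an alternation with `paste` and hands the resulting pattern to awk through `-v`. The names `tmpdir`, `alternation`, `re`, and `result` are hypothetical, and the sketch assumes the words contain no regex metacharacters or backslashes, since nothing is escaped.

```shell
# Recreate the question's sample inputs in a scratch directory (hypothetical paths).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt
EOF
printf '%s\n' rm reboot shutdown > "$tmpdir/file2"

# Join the word list into one alternation: rm|reboot|shutdown
# Assumption: the words are plain text with no regex metacharacters.
alternation=$(paste -s -d '|' "$tmpdir/file2")

# Anchor the pattern at the start of the command field and require a space
# or the end of the field after the word, as in the generated command above.
result=$(awk -F ': ' -v re="^(${alternation})( |\$)" '$3 ~ re' "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Passing the pattern in a variable means awk treats it as a dynamic regex, so the word list never has to be spliced into the program text itself.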
