Can awk find a field containing a string from a list?

Question

I have a file containing different fields.
I have another file containing a list of different words.
I need an awk command that extracts from the first file all records in which a specific field contains one or more of the words from the second file.

For example, the first file:

Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt

For example, the second file:

rm
reboot
shutdown

The awk command should then return:

Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot

I tried desperately with arrays/maps.

I also tried this:

awk -F ": " '{if ($3 ~ "^rm" || $3 ~ "^reboot" || $3 ~ "^shutdown") print}'

But the list of words I'm looking for keeps growing.
I'd rather read the words from a file.

I'd appreciate any help.

Thank you!
Serge

Answer 1

Score: 3

You might do it like this:

awk -F ': ' '
    FNR == NR { commands[$0]; next }
    split($3,cmdline," ") && (cmdline[1] in commands)
' file2 file1

Output:

Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
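
For anyone who wants to try the two-file idiom above end to end, here is a minimal self-contained sketch with comments on each step. The file names words.txt and log.txt and the temporary directory are assumptions made for illustration; the data is the sample from the question. It also assumes the word list is not empty, since FNR == NR would otherwise be true for the log file as well.

```shell
#!/bin/sh
# Hypothetical scratch files holding the sample data from the question.
tmpdir=$(mktemp -d)

cat > "$tmpdir/log.txt" <<'EOF'
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt
EOF

cat > "$tmpdir/words.txt" <<'EOF'
rm
reboot
shutdown
EOF

result=$(awk -F ': ' '
    # First file (words.txt): remember each word as an array key.
    FNR == NR { commands[$0]; next }
    # Second file (log.txt): split the command field on whitespace and
    # print the record when its first word is one of the stored keys.
    {
        split($3, cmdline, " ")
        if (cmdline[1] in commands) print
    }
' "$tmpdir/words.txt" "$tmpdir/log.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the lookup compares the whole first word against the array keys, a command such as rmdir would not match the entry rm. Adding a word to the list only means adding a line to words.txt.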

Answer 2

Score: 0

Don't waste time with arrays. Just generate the hard-coded regex dynamically:

printf '%s' "${file_a}" | 

gawk -p-             -b 'BEGIN { FS = "[]]: " } '"$(

 awk -v  __="${file_b}" 'BEGIN { 

    FS = RS  ;    OFS = "|"
    RS = "^$"; _= ORS =  ""

    $_ = __

    print "$NF ~ \"^(" $(_*(NF-=_==$NF)) ")( |$)\"" }' )"

>

Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot

>
# this part being dynamically generated

awk 'BEGIN { FS = "[]]: " } $NF ~ "^(rm|reboot|shutdown)( |$)" ' 

Then, instead of looping through an array, awk makes a single fast pass through file A without having to store any rows in between.
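
The same generated-regex idea can also be written in a more readable form by building the alternation inside a single awk program. This is only a sketch under stated assumptions, not the code from this answer: it uses -F ': ' and $3 as in the question rather than the "[]]: " separator and $NF, the file names words.txt and log.txt are hypothetical, the word list is assumed to be non-empty, and the words are assumed to contain no regex metacharacters (they would need escaping otherwise).

```shell
#!/bin/sh
# Hypothetical scratch files holding the sample data from the question.
tmpdir=$(mktemp -d)

cat > "$tmpdir/log.txt" <<'EOF'
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt
EOF

cat > "$tmpdir/words.txt" <<'EOF'
rm
reboot
shutdown
EOF

result=$(awk -F ': ' '
    # First file: join the non-empty lines into one alternation,
    # giving rm|reboot|shutdown for the sample list.
    FNR == NR { if ($0 != "") alt = alt (alt == "" ? "" : "|") $0; next }
    # Build the anchored pattern once, when the first log record arrives.
    re == "" { re = "^(" alt ")( |$)" }
    # Print records whose command field starts with one of the words,
    # followed by a space or the end of the field.
    $3 ~ re
' "$tmpdir/words.txt" "$tmpdir/log.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The trailing ( |$) keeps rm from matching longer commands such as rmdir, which the question's original "^rm" patterns would have matched.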

huangapple
  • Published on February 18, 2023, 00:39:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75486991.html