Can awk find a field containing a string from a list?
Question
I have a file containing different fields.
I have another file containing a list of different words.
I need to use awk to extract from my first file all records where a specific field contains one or more words from my second file.
For example, the first file:
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:24 lcif adm.slm: root [23416]: cat tst.sh
Feb 15 12:05:44 lcif adm.slm: root [23416]: date
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
Feb 15 12:06:58 lcif adm.pse: root [23419]: ls -lrt
For example, the second file:
rm
reboot
shutdown
The awk command should then return:
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
I tried desperately with arrays/maps.
I also tried this:
awk -F ": " '{if ($3 ~ "^rm" || $3 ~ "^reboot" || $3 ~ "^shutdown") print}'
But the list of words I'm looking for keeps growing.
I'd rather read the words from a file.
I'd appreciate any help.
Thank you!
Serge
Answer 1
Score: 3
You might do it like this:
awk -F ': ' '
FNR == NR { commands[$0]; next }
split($3,cmdline," ") && (cmdline[1] in commands)
' file2 file1
Output:
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
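For anyone who wants to try this without preparing files first, here is a self-contained sketch of the same array-lookup approach with comments added. The temporary file names (`log`, `words`) and the shortened sample log are stand-ins chosen for illustration, not names from the question.

```shell
#!/bin/sh
# Demo setup: temporary stand-ins for file1 (the log) and file2 (the word list).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

cat > "$tmpdir/log" <<'EOF'
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:05:58 lcif adm.pse: root [23419]: who
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
EOF

printf '%s\n' rm reboot shutdown > "$tmpdir/words"

result=$(awk -F ': ' '
    # First file (the word list): store each word as an array key.
    FNR == NR { commands[$0]; next }
    # Second file (the log): split the command field on whitespace and
    # print the line when its first word is one of the stored keys.
    split($3, cmdline, " ") && (cmdline[1] in commands)
' "$tmpdir/words" "$tmpdir/log")

printf '%s\n' "$result"
```

The word list has to come first on the command line so the array is filled before the log is read. One thing to watch: if the word list is empty, `FNR == NR` stays true while the log is read, so nothing is printed.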
Answer 2
Score: 0
Don't waste time with arrays. Just generate a hard-coded regex dynamically, on the fly:
printf '%s' "${file_a}" |
gawk -p- -b 'BEGIN { FS = "[]]: " } '"$(
awk -v __="${file_b}" 'BEGIN {
FS = RS ; OFS = "|"
RS = "^$"; _= ORS = ""
$_ = __
print "$NF ~ \"^(" $(_*(NF-=_==$NF)) ")( |$)\"" }' )"
Output:
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
This is the part that gets generated dynamically:
awk 'BEGIN { FS = "[]]: " } $NF ~ "^(rm|reboot|shutdown)( |$)" '
Then, instead of looping through an array, it makes a single high-speed pass through file A without having to store any rows in between.
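A plainer way to build the same kind of regex, offered as a sketch rather than as this answer's exact method: join the word list with `paste` and pass the resulting pattern to awk with `-v`. It assumes the words contain no regex metacharacters and no backslashes (otherwise they would need escaping first); the temporary file names and shortened log are illustrative stand-ins.

```shell
#!/bin/sh
# Demo setup: temporary stand-ins for the log file and the word list.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

cat > "$tmpdir/log" <<'EOF'
Feb 15 12:05:10 lcif adm.slm: root [23416]: cd /tmp
Feb 15 12:05:52 lcif adm.pse: root [23419]: rm -f file
Feb 15 12:06:02 lcif adm.pse: root [23419]: uptime
Feb 15 12:06:56 lcif adm.pse: root [23419]: reboot
EOF

printf '%s\n' rm reboot shutdown > "$tmpdir/words"

# Join the words with "|" to form an alternation: rm|reboot|shutdown
alternation=$(paste -s -d '|' "$tmpdir/words")

# Anchor the match at the start of the command field and require a
# space or the end of the field right after the word.
result=$(awk -F ': ' -v re="^($alternation)( |\$)" '$3 ~ re' "$tmpdir/log")

printf '%s\n' "$result"
```

The trailing `( |$)` keeps a listed word such as `rm` from matching a longer command that merely starts with it, such as `rmdir`.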