AWK not able to extract the from field in the maillog
Question
I am trying to extract some fields from the maillog. It works fine except for the message below, where it fails:
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]
When I run awk on it, I get the output below:
cat email | awk '{print $7 " " $NF}'
from=<"dexter [1.2.3.4]
All other email addresses are in the format from=<abc@xyz.com>. Only this message has a custom display name first, followed by the email address. Can someone suggest a single regex that works for the text above as well as for the rest of the messages?
Desired output:
from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
Since the rest of the messages do not have spaces in from=<"email">, my awk command works for them; it only fails when a space is involved. The regex that produces the output above should also work for the remaining messages, whose from field has the format from=<dexter_noreply@au.edu>.
For example, say there are two messages with different formats:
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.5]
After applying the regex, the desired output is:
from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
from=<dexter_noreply@au.edu> [1.2.3.5]
Is this possible, or do I need two separate regexes to capture these?
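To reproduce the problem, here is a minimal sketch that writes the two sample lines into a file and runs the original command without cat. The file name "file" is an assumption: Answers 1 and 5 read from a file of that name. Default whitespace splitting makes $7 stop at the space inside "dexter prod". On the plain-address line, $7 keeps the comma that follows the address.

```shell
#!/bin/sh
# Write the two sample maillog lines from the question into ./file
# (the file name is an assumption, not part of the question).
cat > file <<'EOF'
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.5]
EOF

# The original command, minus the cat. Whitespace field splitting gives:
#   from=<"dexter [1.2.3.4]
#   from=<dexter_noreply@au.edu>, [1.2.3.5]
awk '{print $7 " " $NF}' file
```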
Answer 1
Score: 2
Using GNU awk:
There is no need to pipe cat | awk; that is a UUOC, aka Useless Use Of Cat.
awk 'BEGIN{FPAT="from=.*>>"}{print $1}' file
from=<"dexter prod" <dexter_noreply@au.edu>>
With grep:
grep -oE 'from=.*>>' file
from=<"dexter prod" <dexter_noreply@au.edu>>
With sed:
sed -E 's/.*(from=.*>>).*/\1/' file
from=<"dexter prod" <dexter_noreply@au.edu>>
With Perl:
perl -nE 'say $& if /from=.*?>>/' file
from=<"dexter prod" <dexter_noreply@au.edu>>
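All of the patterns above require a literal >>, which only appears when a display name is present. On the plain from=<address> line, grep and perl print nothing, the FPAT awk prints an empty line, and sed passes the whole line through unchanged. The sketch below adjusts the grep version under one assumption, the same one Answer 3 makes: the from= value never contains a comma. The match runs over non-comma characters and ends at the last > before the comma. The function name from_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the complete from=<...> value of each line.
# [^,]* cannot cross the comma that ends the from= field, and the final >
# makes the match end on the last > before that comma, so both
#   from=<"name" <addr>>   and   from=<addr>
# are printed whole. Assumes the from= value contains no comma.
from_field() {
    grep -oE 'from=<[^,]*>' "$@"
}

# Usage on shortened sample lines (everything after size= trimmed).
from_field <<'EOF'
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452
EOF
```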
Answer 2
Score: 2
1st solution: Using awk's match function, please try the following. The regex : from=<"[^>]*>> matches the from= field together with the ": " before it. If it matches, substr prints the matched text minus those two leading characters, which is the exact output required by the OP.
awk 'match($0,/: from=<"[^>]*>>/){print substr($0,RSTART+2,RLENGTH-2)}' Input_file
2nd solution: Using sed with the -E option, please try the following.
sed -E 's/^.*: (from=<"[^>]*>>).*$/\1/' Input_file
3rd solution: Using GNU grep with a Perl-compatible regex (-P) and \K, which drops everything matched before it from the printed match, use the following code.
grep -oP '^.*: \Kfrom=<"[^>]*>>' Input_file
4th solution: Using GNU awk's support for a regex RS and its RT variable (the text that matched RS), please try the following.
awk -v RS=': from=<"[^>]*>>' '
RT && split(RT,arr,": "){
print arr[2]
}
' Input_file
5th solution: Using simple field separators, based only on your shown samples.
awk -F': |, ' '{print $3}' Input_file
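The 5th solution prints only the from= field, but the question also asks for the bracketed relay address at the end of the line. The sketch below keeps the same field separators and appends the last space-separated word of the final field. Like the original, it assumes ": " and ", " never occur inside the from= value. The function name from_and_relay is hypothetical.

```shell
#!/bin/sh
# With -F': |, ' the fields are:
#   $1 = "May 10 07:15:04 chitraak sendmail[8558]"
#   $2 = queue id, $3 = from=<...>, ..., $NF = "relay=... [ip]"
# split() breaks $NF on whitespace; its last piece is the bracketed address.
from_and_relay() {
    awk -F': |, ' '{ n = split($NF, parts, " "); print $3, parts[n] }' "$@"
}

from_and_relay <<'EOF'
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.5]
EOF
```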
Answer 3
Score: 2
It's probably safe to use the comma to mark the end of the from= field:
awk 'match($0,/from=[^,]*/) { print substr($0,RSTART,RLENGTH), $NF }'
from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
from=<dexter_noreply@au.edu> [1.2.3.5]
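The "probably" matters if a quoted display name can itself contain a comma. The input below is a hypothetical line, not taken from the question's log. On it, [^,]* stops inside the name, and the command above would print from=<"prod [1.2.3.4]. The sketch below first consumes an optional double-quoted name, using only POSIX ERE, so it should run in any awk. It assumes the quoted name contains no escaped double quotes. The function name from_and_relay_quoted is hypothetical.

```shell
#!/bin/sh
# ("[^"]*")? swallows an optional quoted display name, commas included;
# [^,]* then runs to the comma that ends the from= field.
# $NF is the bracketed relay address, as in the answer above.
from_and_relay_quoted() {
    awk 'match($0, /from=<("[^"]*")?[^,]*/) { print substr($0, RSTART, RLENGTH), $NF }' "$@"
}

# Hypothetical input: a display name with a comma in it.
printf '%s\n' 'x: from=<"prod, dexter" <dexter_noreply@au.edu>>, size=452, relay=host [1.2.3.4]' | from_and_relay_quoted
```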
Answer 4
Score: 1
> mawk 'BEGIN { ORS = sprintf("%.*s\n",_+= ++_,RS = ">>[^\n]+\n")
> FS = ".+ " (OFS = "from=") } NF = _'
from=<"dexter prod" <dexter_noreply@au.edu>>
A less regex-heavy, more hard-coded way would be:
> gawk '$(NF = !_ + ($!_ = "from")^_) = $2 ">>"' FS='=|>>.+$' OFS==
from=<"dexter prod" <dexter_noreply@au.edu>>
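The gawk one-liner above is golfed. As a reading aid, here is a sketch of what it appears to compute, written without the tricks. It is hedged as equivalent only for lines shaped like the question's first sample. FS splits at the first = and at >> plus the rest of the line, so $2 holds the display name and address without the closing >>. Like the original, it needs the >> that only the display-name format has. The function name golf_readable is hypothetical.

```shell
#!/bin/sh
# Readable rewrite of:
#   gawk '$(NF = !_ + ($!_ = "from")^_) = $2 ">>"' FS='=|>>.+$' OFS==
# With that FS:
#   $1 = everything before the first "=", ending in "from"
#   $2 = <"dexter prod" <dexter_noreply@au.edu
# The golfed version sets $1 = "from", NF = 2 and $2 = $2 ">>", then prints
# the rebuilt record joined by OFS "=".
golf_readable() {
    awk -F'=|>>.+$' '{ print "from=" $2 ">>" }' "$@"
}

printf '%s\n' 'May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0' | golf_readable
```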
Answer 5
Score: 1
This GNU sed command should work for both cases:
sed -E 's/.*(from=.*>.*), size=.* (\[.*\])$/\1 \2/' file
Answer 6
Score: 1
Using any sed:
$ sed 's/.*\(from=[^,]*\).* /\1 /' email
from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
from=<dexter_noreply@au.edu> [1.2.3.5]