AWK is not able to extract the from field from the maillog

Question

I am trying to extract some fields from the maillog. It works fine except for the message below, where it fails.

May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]

When I run awk on it, I get the output below:

cat email | awk '{print $7 " " $NF}'
from=<"dexter [1.2.3.4]

All the other email addresses are in the format from=<abc@xyz.com>. Only this message has a custom display name first and then the email address. Can someone suggest one unifying regex that works for the above text and for the rest of the messages as well?

Desired output:

from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]

The rest of the messages do not have spaces in the from=<...> field, which is why my awk command works for them. It fails to extract the field when a space is involved. The regex that gives the above output should also work for the rest of the messages, which have their from field in the format from=<dexter_noreply@au.edu>.

So let's say there are two messages with different formats, like below:

May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]
May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.5]

The desired output after applying the regex should be:

from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
from=<dexter_noreply@au.edu> [1.2.3.5]

Is this possible with one regex, or do I need two separate regexes?
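
To make the failure concrete, here is a minimal sketch of how awk's default whitespace splitting cuts up the problematic line. The helper name show_fields is an assumption made up for this illustration; the log line is the one from the question.

```shell
# The problematic log line, copied from the question.
line='May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]'

# show_fields (hypothetical helper): print fields 7 to 9 separated by "|"
# so that the whitespace split points become visible.
show_fields() {
    printf '%s\n' "$line" | awk '{ print $7 "|" $8 "|" $9 }'
}

show_fields
```

The from value is spread over $7, $8 and $9 because it contains two spaces, so $7 alone holds only from=<"dexter. Any fix therefore has to match the field by content rather than by position.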

Answer 1

Score: 2

Using GNU awk:

There is no need to pipe cat | awk; it's a UUOC (Useless Use Of Cat).

awk 'BEGIN{FPAT="from=.*>>"}{print $1}' file
from=<"dexter prod" <dexter_noreply@au.edu>>

See "Splitting By Content" in the GNU awk manual.

With grep:

grep -oE 'from=.*?>>' file
from=<"dexter prod" <dexter_noreply@au.edu>>

With sed:

sed -E 's/.*(from=.*?>>).*/\1/' file
from=<"dexter prod" <dexter_noreply@au.edu>>

With Perl:

perl -nE 'say $& if /from=.*?>>/' file
from=<"dexter prod" <dexter_noreply@au.edu>>
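
One thing worth checking with the patterns above is that they are anchored on >>, which only occurs in the display-name form. The sketch below, which assumes a grep that supports -o (GNU, BSD and BusyBox grep do), compares such a pattern with one that stops at the first comma. The function names are made up for this illustration and the sample lines are the two from the question.

```shell
# The two sample lines from the question.
sample() {
    printf '%s\n' \
        'May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]' \
        'May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.5]'
}

# Anchored on ">>": only the display-name form can match.
from_with_name() { grep -oE 'from=.*>>'; }

# Stops at the first comma: matches both forms
# (assumption: the from field itself contains no comma).
from_any() { grep -oE 'from=[^,]*'; }

sample | from_with_name
echo '---'
sample | from_any
```

The first command prints one line and the second prints two, so for a log that mixes both formats the comma-delimited pattern is the one that covers every message.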

Answer 2

Score: 2

1st solution: Using awk's match function, please try the following. The regex : from=<"[^>]*>> captures the exact output required by the OP. If the regex matches, the matched value is printed via substr.

awk 'match($0,/: from=<"[^>]*>>/){print substr($0,RSTART+2,RLENGTH-2)}' Input_file

2nd solution: Using sed with the -E option, please try the following.

sed -E 's/^.*: (from=<"[^>]*>>).*$/\1/' Input_file

3rd solution: Using GNU grep with -P and the \K escape, which drops everything matched before it from the printed result, use the following.

grep -oP '^.*: \Kfrom=<"[^>]*>>' Input_file

4th solution: Using GNU awk with its RS and RT capabilities, try the following.

awk -v RS=': from=<"[^>]*>>' '
RT && split(RT,arr,": "){
  print arr[2]
}
' Input_file

5th solution: Using simple field separators, for your shown samples only.

awk -F': |, ' '{print $3}' Input_file
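
The match-based approach can also be sketched with a single regex that makes the quoted display name optional, so that it covers both formats from the question and appends the relay IP via $NF. This is an illustrative variation, not the answer author's code; the function name extract is an assumption, and the regex assumes that the address itself contains no angle brackets.

```shell
# The two sample lines from the question.
sample() {
    printf '%s\n' \
        'May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"dexter prod" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]' \
        'May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<dexter_noreply@au.edu>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.5]'
}

# extract (hypothetical helper): match from=< followed by an optional
# quoted display name plus "<", then the address, then one or more ">".
extract() {
    awk 'match($0, /from=<("[^"]*" <)?[^<>]*>+/) {
        # RSTART and RLENGTH are set by match(); $NF is the bracketed relay IP.
        print substr($0, RSTART, RLENGTH), $NF
    }'
}

sample | extract
```

Because it uses only match, substr and standard extended regex syntax, this should run on any POSIX awk, not only GNU awk.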

Answer 3

Score: 2

It's probably safe to use the comma to mark the end of the from= field:

awk 'match($0,/from=[^,]*/) { print substr($0,RSTART,RLENGTH), $NF }'
from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
from=<dexter_noreply@au.edu> [1.2.3.5]
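
If a display name could ever contain a comma, the comma-delimited pattern would cut the field short. A possible alternative, sketched below, is to cut at the literal ", size=" that follows the from field in the sample lines. Both the assumption that size= always comes directly after from= and the log line with a comma in the display name are hypothetical; the latter is a modified copy of the question's sample, not real log output. The function name extract_to_size is also made up for this illustration.

```shell
# Hypothetical log line: the question's sample with a comma added
# inside the quoted display name.
tricky='May 10 07:15:04 chitraak sendmail[8558]: 34ABF4Kb008569: from=<"prod, dexter" <dexter_noreply@au.edu>>, size=452, class=0, nrcpts=1, msgid=<202305101115.34ABF4Kb946558@chitraak.abc.com>, proto=ESMTP, daemon=MTA, relay=ip-192-68-1-4.ec2.internal [1.2.3.4]'

# extract_to_size (hypothetical helper): take everything from "from="
# up to, but not including, ", size=" and append the last field.
extract_to_size() {
    awk '{
        s = index($0, "from=")      # start of the from field (0 if absent)
        e = index($0, ", size=")    # start of the following field (0 if absent)
        if (s && e > s) print substr($0, s, e - s), $NF
    }'
}

printf '%s\n' "$tricky" | extract_to_size
```

Using index with fixed strings instead of a regex keeps the command portable across awk implementations, and lines that lack either marker are skipped.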

Answer 4

Score: 1

mawk 'BEGIN { ORS = sprintf("%.*s\n",_+= ++_,RS = ">>[^\n]+\n")
FS = ".+ " (OFS = "from=") } NF = _'

from=<"dexter prod" <dexter_noreply@au.edu>>

A less regex-heavy, more hard-coded way would be:

gawk '$(NF = !_ + ($!_ = "from")^_) = $2 ">>"' FS='=|>>.+$' OFS==

from=<"dexter prod" <dexter_noreply@au.edu>>

Answer 5

Score: 1

This GNU sed command should work for both cases:

sed -E 's/.*(from=.*>.*), size=.* (\[.*\])$/\1 \2/' file

Answer 6

Score: 1

Using any sed:

$ sed 's/.*\(from=[^,]*\).* /\1 /' email
from=<"dexter prod" <dexter_noreply@au.edu>> [1.2.3.4]
from=<dexter_noreply@au.edu> [1.2.3.5]

huangapple
  • Posted on May 11, 2023, 11:24:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76223955.html