PCRE - How to match variable strings only if entire line matches specific string

Question

This is for log events. I want to match certain fields that might exist in the data, but only for events that contain a certain string; in this example that string is type="traffic".

Apparently I'm supposed to update this question instead of asking another one. Sorry I couldn't accept the answer to this; someone dinged me on points.

Thanks!

Here are sample events. I want to capture specific fields in any line that includes type="traffic", but not in any line that has type="utm":

Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444922720 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=40568 srcip=10.150.150.10 dstip=20.62.63.153 srcport=55544 dstport=443 srcintf="port2" srcintfrole="lan" dstintf="port1" dstintfrole="wan" proto=6 service="SSL" direction="incoming" policyid=1 sessionid=4976047 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Web.Client" app="HTTPS.BROWSER" hostname="124537f1-b52d-4e77-a6bd-e73c9904ea48.agentsvc.azure-automation.net" incidentserialno=205723388 url="/" msg="Web.Client: HTTPS.BROWSER," apprisk="medium" scertcname="*.azure-automation.net" scertissuer="Microsoft RSA TLS CA 01"
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444901220 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=15895 srcip=10.150.150.10 dstip=20.62.63.153 srcport=55544 dstport=443 srcintf="port2" srcintfrole="lan" dstintf="port1" dstintfrole="wan" proto=6 service="SSL" direction="outgoing" policyid=1 sessionid=4976047 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Network.Service" app="SSL" hostname="124537f1-b52d-4e77-a6bd-e73c9904ea48.agentsvc.azure-automation.net" incidentserialno=205723383 url="/" msg="Network.Service: SSL," apprisk="elevated" scertcname="*.azure-automation.net" scertissuer="Microsoft RSA TLS CA 01"
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444603820 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=40568 srcip=45.42.34.136 dstip=10.150.148.104 srcport=60638 dstport=443 srcintf="port1" srcintfrole="wan" dstintf="port5" dstintfrole="dmz" proto=6 service="SSL" direction="incoming" policyid=10 sessionid=4976049 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Web.Client" app="HTTPS.BROWSER" hostname="www.testdata.com" incidentserialno=205723390 url="/" msg="Web.Client: HTTPS.BROWSER," apprisk="medium" scertcname="www.testdata.com"
May 19 16:32:23 10.150.160.13 date=2023-05-19 time=16:32:25 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684539145135795404 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.161.11 srcport=64507 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=208.11.121.76 dstport=53 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=547154413 proto=17 action="accept" policyid=247 policytype="policy" poluuid="07588088-f351-51ec-153c-4a07e49c5818" policyname="Microsoft DNS to Umbrella" service="DNS" trandisp="snat" transip=38.70.139.3 transport=64507 duration=249 sentbyte=169 rcvdbyte=231 sentpkt=2 rcvdpkt=2 appcat="unscanned" srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:23:57 10.150.160.13 date=2023-05-19 time=16:23:58 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684538639125610717 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.161.11 srcport=63392 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=208.11.121.76 dstport=53 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=547134202 proto=17 action="accept" policyid=247 policytype="policy" poluuid="07588088-f351-51ec-153c-4a07e49c5818" policyname="Microsoft DNS to Umbrella" service="DNS" trandisp="snat" transip=38.70.139.3 transport=63392 duration=145 sentbyte=230 rcvdbyte=382 sentpkt=3 rcvdpkt=3 appcat="unscanned" sentdelta=230 rcvddelta=382 srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:25:35 10.132.119.14 date=2023-05-19 time=16:25:36 devname="FW1-testMAIN-ABCT01" devid="ABCT3KD3Z17800372" eventtime=1684538737153322514 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.151.143.4 srcport=50423 srcintf="port4" srcintfrole="lan" dstip=52.111.145.1 dstport=443 dstintf="port3" dstintfrole="undefined" srccountry="Reserved" dstinetsvc="Microsoft-Office365" dstcountry="United States" dstregion="California" dstcity="San Jose" dstreputation=5 sessionid=3673596551 proto=6 action="accept" policyid=10045 policytype="policy" poluuid="96f15028-15d6-51e9-6b81-d98bf1466b99" user="JULLOPEZ" authserver="FSSO_PSR" service="Microsoft-Office365" trandisp="snat" transip=199.68.152.135 transport=50423 appid=41468 app="Microsoft.Office.365.Portal" appcat="Collaboration" apprisk="elevated" applist="Edge-Prod-Block-Mode-P2P_PROXY" duration=30553 sentbyte=94401 rcvdbyte=112203 sentpkt=1052 rcvdpkt=1539 sentdelta=254 rcvddelta=230
May 19 16:26:00 10.150.160.13 date=2023-05-19 time=16:26:01 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684538762118706615 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.106.11 srcport=54254 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=17.188.143.10 dstport=443 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=513091673 proto=6 action="accept" policyid=83 policytype="policy" poluuid="2k2k2-b4c8-51e9-512e-62cf5b7e3bcd" policyname="Internal Server Nets Outbound" service="HTTPS" trandisp="snat" transip=38.70.139.3 transport=54254 appid=42662 app="Apple.Services" appcat="General.Interest" apprisk="elevated" applist="PROD-APPCTRL_LV-EXT" appact="detected" duration=686309 sentbyte=22070530 rcvdbyte=14199406 sentpkt=279649 rcvdpkt=148924 sentdelta=3600 rcvddelta=2352 srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:26:59 10.151.129.106 date=2023-05-19 time=16:27:00 devname="FW1-testPSR-DC" devid="ABCT3KD3Z17800305" eventtime=1684538820421783095 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="DataCenter" srcip=10.151.110.100 srcname="PSRPSOLAPP01.test.NET" identifier=2875 srcintf="Enterprise_ACI" srcintfrole="wan" dstip=10.132.116.4 dstname="10.132.116.4" dstintf="Enterprise" dstintfrole="lan" srccountry="Reserved" dstcountry="Reserved" sessionid=3796675657 proto=1 action="accept" policyid=10654 policytype="policy" poluuid="6ddcbe16-9058-51ec-2052-64e35cf6fddc" policyname="Solarwinds Catch-ALL" user="SVC-SOLARWINDS-IPAM" authserver="FSSO_PSR" service="PING" trandisp="noop" duration=60 sentbyte=59 rcvdbyte=59 sentpkt=1 rcvdpkt=1 appcat="unscanned"
May 19 16:33:13 10.150.148.52 date=2023-05-19 time=16:33:14 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1684539194871377700 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" srcip=10.151.100.36 identifier=18877 srcintf="TUNNEL_SCH" srcintfrole="undefined" dstip=10.150.148.52 dstintf="port2" dstintfrole="lan" srccountry="Reserved" dstcountry="Reserved" sessionid=105443335 proto=1 action="accept" policyid=8 policytype="policy" poluuid="06d7ce0e-e8ae-51ed-b77f-59e907ddba86" policyname="test TO AZURE LAN" service="icmp/8/0" trandisp="noop" appid=24466 app="Ping" appcat="Network.Service" apprisk="elevated" applist="PROD-APPCTRL-AZURE" duration=60 sentbyte=84 rcvdbyte=84 sentpkt=1 rcvdpkt=1 vpn="TUNNEL_SCH" vpntype="ipsec-static" utmaction="allow" countapp=1 masterdstmac="12:12:12:12:9a:bc" dstmac="12:12:12:12:9a:bc" dstserver=1

This regex is based on one of the answers. I rewrote it for my actual data, but it's not working right:

.*type="(anomaly|log|event|utm)".*(*SKIP)(*FAIL)|(^.{15})\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?(devname=\S+)\s(devid=\S+).+?(?:vd=\H+)|(srcip=\H+)|(srcport=\H+)|(srcintf=\H+)|(dstip=\H+)|(dstport=\H+)|(dstintf=\H+)|(proto=\H+)|(action=\H+)|(policyid=\H+)|(user=\H+)|(service=\H+)|(transport=\H+)|(app=\H+)|(applist=\H+)|(vpn=\H+)|(vpntype="?\S+"?)|(?:\s+)|(?:\S+=".+?")|(?:\S+=\S+)

Here is a link to the regex example with the best solution so far:

https://regex101.com/r/BDkbMb/1

Answer 1

Score: 2

PCRE/PCRE2 (with \K and \G):

(?(DEFINE)                   # Subpattern declaration:
  (?<field>                  # Match a field
    \b                       # with a key that is either
    (?:field1|field2)        # 'field1' or 'field2' (insert other field names here)
    =\S+                     # then a '=' and 1+ non-whitespace chars.
    \b                       #
  )                          #
)                            #
# Main pattern:
\g<field>(?=.*\btype=a\b)    # A field followed by anything then 'type=a'
|                            # or
(?:\btype=a\b|\G(?!^)).*?    # 'type=a' or the end of the last match, followed by anything,
\K                           # all of which we forfeit (not included in the match),
\g<field>                    # then a field.

Try it on regex101.com.
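To see why the pattern needs two alternatives, here is a minimal sketch of the first alternative on its own (the field followed by a lookahead for 'type=a'). It uses Python's standard `re` module, which supports lookahead but not `\K` or `\G`. The field names and the sample line are the answer's placeholders, not real log data.

```python
import re

# First alternative only: a wanted field that has 'type=a' somewhere after it.
BEFORE_MARKER = re.compile(r"\b(?:field1|field2)=\S+\b(?=.*\btype=a\b)")

line = "field1=x type=a field2=y"

# Only fields located before the marker are found; 'field2=y' comes after
# 'type=a', so the lookahead fails for it.
print(BEFORE_MARKER.findall(line))
```

Fields that come after the marker are what the second alternative (with `\G` and `\K`) picks up.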

ECMAScript/.NET (with lookbehind):

\b(?:field[1234])=\S+\b  # Match a field
(?=.*\btype=a\b)         # followed by 'type=a'
|                        # or
(?<=\btype=a\b.*)        # preceded by 'type=a',
\b(?:field[1234])=\S+\b  # match a field.

Try it on regex101.com.
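If the engine has neither `\K`/`\G` nor variable-length lookbehind (Python's standard `re` module is one such engine), a two-step check can stand in for both patterns above. This is a different technique from the answer, sketched here with the same placeholder names: first test the line for the marker, then collect the fields. The helper name `fields_if_marked` is made up for this illustration.

```python
import re

# Placeholder names from the answer: keys 'field1'/'field2', marker 'type=a'.
FIELD = re.compile(r"\b(?:field1|field2)=\S+\b")
MARKER = re.compile(r"\btype=a\b")


def fields_if_marked(line):
    """Return the wanted fields, but only if the line contains the marker."""
    if not MARKER.search(line):
        return []
    return FIELD.findall(line)


sample = "field1=x type=a field2=y field3=z\nfield1=q type=b field2=r"
for line in sample.splitlines():
    print(fields_if_marked(line))
```

The trade-off is that this needs a host language to process the input line by line, so it does not help in a tool that accepts only a single PCRE pattern.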

Answer 2

Score: 1

Start by matching the leading date, time and IP address of the log line.

Then skip every key=value pair that you don't want, and match only the pairs whose key is one of the alternatives:

(?:^[A-Z][a-z]+\h+\d{1,2}\h+\d{2}:\d{2}:\d{2}\h+\d{1,3}(?:\.\d{1,3}){3}\b(?=.*?\btype="traffic")(?!.*\btype="utm")|\G(?!^))(?:(?!\h+(?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)=)\h+\S+)*+\h*\K[^\s=]+=\S+

Explanation

  • (?: Non-capture group for the alternatives
    • ^ Start of string
    • [A-Z][a-z]+\h+\d{1,2}\h+\d{2}:\d{2}:\d{2}\h+\d{1,3}(?:\.\d{1,3}){3}\b Match the leading date, time and IP-like format
    • (?=.*?\btype="traffic")(?!.*\btype="utm") Assert that type="traffic" occurs to the right and type="utm" does not
    • | Or
    • \G(?!^) Assert the current position at the end of the previous match, not at the start
  • ) Close the non-capture group
  • (?: Non-capture group
    • (?! Negative lookahead, assert that from the current position to the right is not
    • \h+ Match 1+ horizontal whitespace chars
    • (?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)=) Match one of the alternatives followed by =, then close the lookahead
    • \h+\S+ Match 1+ horizontal whitespace chars and 1+ non-whitespace chars
  • )*+ Close the non-capture group and optionally repeat it using a possessive quantifier
  • \h*\K Match optional horizontal whitespace chars and forget what is matched until now
  • [^\s=]+=\S+ Match the key-value pair

Regex demo
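For comparison, here is a rough sketch of the same extraction done outside PCRE, using Python's standard `re` module (which has no `\K`, `\G` or `\h`). The key list is copied from the alternation above. The value pattern (either a double-quoted string or a run of non-whitespace), the helper name `extract_fields`, and the shortened sample line are assumptions made for illustration.

```python
import re

# Keys copied from the answer's alternation. (?<!\S) requires the key to start
# a token, so 'srcintfrole' or 'utmaction' are not mistaken for wanted keys.
WANTED = re.compile(
    r'(?<!\S)'
    r'((?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action'
    r'|policyid|user|service|transport|app(?:list)?|vpn(?:type)?))'
    r'=("[^"]*"|\S+)'  # assumed value format: quoted string or bare token
)
IS_TRAFFIC = re.compile(r'(?<!\S)type="traffic"(?!\S)')


def extract_fields(line):
    """Return {key: value} for the wanted keys, only for type="traffic" lines."""
    if not IS_TRAFFIC.search(line):
        return {}
    return {key: value.strip('"') for key, value in WANTED.findall(line)}


# Shortened, made-up line in the same shape as the sample events.
line = ('May 19 16:32:23 10.150.160.13 devname="fw1" type="traffic" '
        'subtype="forward" srcip=10.150.161.11 srcintfrole="lan" '
        'policyname="Microsoft DNS to Umbrella" service="DNS" utmaction="allow"')
print(extract_fields(line))
```

One caveat of this sketch: it scans the whole line, so a wanted key that appears after a space inside some other field's quoted value would also be picked up.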

huangapple
  • This article was published on June 9, 2023
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76436041.html