PCRE - How to match variable strings only if entire line matches specific string
Question
This is for log events. I want to match certain fields that might exist in the data, but only for events that contain a certain string; in this example, that string is type="traffic".
Apparently I'm supposed to update this question instead of asking another one. Sorry I couldn't accept the answer to this; someone dinged me on points.
Thanks!
Here are sample events. I want to capture specific fields in any line that includes type="traffic", but not in any line that has type="utm".
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444922720 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=40568 srcip=10.150.150.10 dstip=20.62.63.153 srcport=55544 dstport=443 srcintf="port2" srcintfrole="lan" dstintf="port1" dstintfrole="wan" proto=6 service="SSL" direction="incoming" policyid=1 sessionid=4976047 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Web.Client" app="HTTPS.BROWSER" hostname="124537f1-b52d-4e77-a6bd-e73c9904ea48.agentsvc.azure-automation.net" incidentserialno=205723388 url="/" msg="Web.Client: HTTPS.BROWSER," apprisk="medium" scertcname="*.azure-automation.net" scertissuer="Microsoft RSA TLS CA 01"
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444901220 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=15895 srcip=10.150.150.10 dstip=20.62.63.153 srcport=55544 dstport=443 srcintf="port2" srcintfrole="lan" dstintf="port1" dstintfrole="wan" proto=6 service="SSL" direction="outgoing" policyid=1 sessionid=4976047 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Network.Service" app="SSL" hostname="124537f1-b52d-4e77-a6bd-e73c9904ea48.agentsvc.azure-automation.net" incidentserialno=205723383 url="/" msg="Network.Service: SSL," apprisk="elevated" scertcname="*.azure-automation.net" scertissuer="Microsoft RSA TLS CA 01"
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444603820 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=40568 srcip=45.42.34.136 dstip=10.150.148.104 srcport=60638 dstport=443 srcintf="port1" srcintfrole="wan" dstintf="port5" dstintfrole="dmz" proto=6 service="SSL" direction="incoming" policyid=10 sessionid=4976049 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Web.Client" app="HTTPS.BROWSER" hostname="www.testdata.com" incidentserialno=205723390 url="/" msg="Web.Client: HTTPS.BROWSER," apprisk="medium" scertcname="www.testdata.com"
May 19 16:32:23 10.150.160.13 date=2023-05-19 time=16:32:25 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684539145135795404 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.161.11 srcport=64507 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=208.11.121.76 dstport=53 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=547154413 proto=17 action="accept" policyid=247 policytype="policy" poluuid="07588088-f351-51ec-153c-4a07e49c5818" policyname="Microsoft DNS to Umbrella" service="DNS" trandisp="snat" transip=38.70.139.3 transport=64507 duration=249 sentbyte=169 rcvdbyte=231 sentpkt=2 rcvdpkt=2 appcat="unscanned" srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:23:57 10.150.160.13 date=2023-05-19 time=16:23:58 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684538639125610717 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.161.11 srcport=63392 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=208.11.121.76 dstport=53 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=547134202 proto=17 action="accept" policyid=247 policytype="policy" poluuid="07588088-f351-51ec-153c-4a07e49c5818" policyname="Microsoft DNS to Umbrella" service="DNS" trandisp="snat" transip=38.70.139.3 transport=63392 duration=145 sentbyte=230 rcvdbyte=382 sentpkt=3 rcvdpkt=3 appcat="unscanned" sentdelta=230 rcvddelta=382 srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:25:35 10.132.119.14 date=2023-05-19 time=16:25:36 devname="FW1-testMAIN-ABCT01" devid="ABCT3KD3Z17800372" eventtime=1684538737153322514 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.151.143.4 srcport=50423 srcintf="port4" srcintfrole="lan" dstip=52.111.145.1 dstport=443 dstintf="port3" dstintfrole="undefined" srccountry="Reserved" dstinetsvc="Microsoft-Office365" dstcountry="United States" dstregion="California" dstcity="San Jose" dstreputation=5 sessionid=3673596551 proto=6 action="accept" policyid=10045 policytype="policy" poluuid="96f15028-15d6-51e9-6b81-d98bf1466b99" user="JULLOPEZ" authserver="FSSO_PSR" service="Microsoft-Office365" trandisp="snat" transip=199.68.152.135 transport=50423 appid=41468 app="Microsoft.Office.365.Portal" appcat="Collaboration" apprisk="elevated" applist="Edge-Prod-Block-Mode-P2P_PROXY" duration=30553 sentbyte=94401 rcvdbyte=112203 sentpkt=1052 rcvdpkt=1539 sentdelta=254 rcvddelta=230
May 19 16:26:00 10.150.160.13 date=2023-05-19 time=16:26:01 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684538762118706615 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.106.11 srcport=54254 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=17.188.143.10 dstport=443 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=513091673 proto=6 action="accept" policyid=83 policytype="policy" poluuid="2k2k2-b4c8-51e9-512e-62cf5b7e3bcd" policyname="Internal Server Nets Outbound" service="HTTPS" trandisp="snat" transip=38.70.139.3 transport=54254 appid=42662 app="Apple.Services" appcat="General.Interest" apprisk="elevated" applist="PROD-APPCTRL_LV-EXT" appact="detected" duration=686309 sentbyte=22070530 rcvdbyte=14199406 sentpkt=279649 rcvdpkt=148924 sentdelta=3600 rcvddelta=2352 srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:26:59 10.151.129.106 date=2023-05-19 time=16:27:00 devname="FW1-testPSR-DC" devid="ABCT3KD3Z17800305" eventtime=1684538820421783095 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="DataCenter" srcip=10.151.110.100 srcname="PSRPSOLAPP01.test.NET" identifier=2875 srcintf="Enterprise_ACI" srcintfrole="wan" dstip=10.132.116.4 dstname="10.132.116.4" dstintf="Enterprise" dstintfrole="lan" srccountry="Reserved" dstcountry="Reserved" sessionid=3796675657 proto=1 action="accept" policyid=10654 policytype="policy" poluuid="6ddcbe16-9058-51ec-2052-64e35cf6fddc" policyname="Solarwinds Catch-ALL" user="SVC-SOLARWINDS-IPAM" authserver="FSSO_PSR" service="PING" trandisp="noop" duration=60 sentbyte=59 rcvdbyte=59 sentpkt=1 rcvdpkt=1 appcat="unscanned"
May 19 16:33:13 10.150.148.52 date=2023-05-19 time=16:33:14 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1684539194871377700 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" srcip=10.151.100.36 identifier=18877 srcintf="TUNNEL_SCH" srcintfrole="undefined" dstip=10.150.148.52 dstintf="port2" dstintfrole="lan" srccountry="Reserved" dstcountry="Reserved" sessionid=105443335 proto=1 action="accept" policyid=8 policytype="policy" poluuid="06d7ce0e-e8ae-51ed-b77f-59e907ddba86" policyname="test TO AZURE LAN" service="icmp/8/0" trandisp="noop" appid=24466 app="Ping" appcat="Network.Service" apprisk="elevated" applist="PROD-APPCTRL-AZURE" duration=60 sentbyte=84 rcvdbyte=84 sentpkt=1 rcvdpkt=1 vpn="TUNNEL_SCH" vpntype="ipsec-static" utmaction="allow" countapp=1 masterdstmac="12:12:12:12:9a:bc" dstmac="12:12:12:12:9a:bc" dstserver=1
This regex is based on one of the answers. I rewrote it for my actual data, but it's not working right:
.*type="(anomaly|log|event|utm)".*(*SKIP)(*FAIL)|(^.{15})\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?(devname=\S+)\s(devid=\S+).+?(?:vd=\H+)|(srcip=\H+)|(srcport=\H+)|(srcintf=\H+)|(dstip=\H+)|(dstport=\H+)|(dstintf=\H+)|(proto=\H+)|(action=\H+)|(policyid=\H+)|(user=\H+)|(service=\H+)|(transport=\H+)|(app=\H+)|(applist=\H+)|(vpn=\H+)|(vpntype="?\S+"?)|(?:\s+)|(?:\S+=".+?")|(?:\S+=\S+)
Here is a link to a regex example with the best solution so far: https://regex101.com/r/BDkbMb/1
Answer 1
Score: 2
PCRE/PCRE2 (with \K and \G):
(?(DEFINE)                   # Subpattern declaration:
  (?<field>                  # Match a field
    \b                       # with a key that is either
    (?:field1|field2)        # 'field1' or 'field2' (insert other field names here)
    =\S+                     # then a '=' and 1+ non-whitespace chars.
    \b                       #
  )                          #
)                            #
                             # Main pattern:
\g<field>(?=.*\btype=a\b)    # A field followed by anything then 'type=a'
|                            # or
(?:\btype=a\b|\G(?!^)).*?    # 'type=a' or the end of the last match, followed by anything,
\K                           # all of which we forfeit (not included in the match),
\g<field>                    # then a field.
Try it on regex101.com.
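For readers without a PCRE engine, the same result can be reached in two steps. The following is a minimal Python sketch of that alternative, not the single-pattern technique above: Python's built-in re module has no \K or \G, so the sketch first tests each line for type=a and only then collects the fields. The function name fields_if_type_a and the sample text are made up for illustration.

```python
import re

# Field names are placeholders, exactly as in the pattern above.
FIELD = re.compile(r"\b(?:field1|field2)=\S+\b")
MARKER = re.compile(r"\btype=a\b")

def fields_if_type_a(text):
    """Collect the wanted fields, but only from lines that contain type=a."""
    results = []
    for line in text.splitlines():
        if MARKER.search(line):          # step 1: does the line qualify?
            results.extend(FIELD.findall(line))  # step 2: pull out the fields
    return results

sample = "field1=x type=a field2=y field3=z\nfield1=p type=b field2=q\n"
print(fields_if_type_a(sample))
```

Splitting the line test from the field extraction costs a second pass over qualifying lines, but it avoids engine-specific features and tends to be easier to debug.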
ECMAScript/.NET (with lookbehind):
\b(?:field[1234])=\S+\b # Match a field
(?=.*\btype=a\b) # followed by 'type=a'
| # or
(?<=\btype=a\b.*) # preceded by 'type=a'.
\b(?:field[1234])=\S+\b # a field
Try it on regex101.com.
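Whether this variant is usable depends on the engine's lookbehind support. As a small, hedged illustration in Python (chosen only for demonstration; the answer targets ECMAScript and .NET), the lookahead branch works as written with the built-in re module, while the lookbehind branch is rejected because re requires fixed-width lookbehind. The sample line is made up.

```python
import re

# Lookahead branch only: a field that is followed, later in the line, by type=a.
BEFORE = re.compile(r"\b(?:field[1234])=\S+\b(?=.*\btype=a\b)")

line = "field1=x field2=y type=a field3=z"
# field3 comes after type=a, so this branch alone does not find it.
print(BEFORE.findall(line))

# The lookbehind branch contains '.*', i.e. it is variable-length, which
# Python's built-in re module refuses to compile.
try:
    re.compile(r"(?<=\btype=a\b.*)\b(?:field[1234])=\S+\b")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False
print(lookbehind_supported)
```

In an engine with this restriction, fields that appear after type=a have to be picked up some other way, for example by testing the line first and extracting fields in a second step.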
Answer 2
Score: 1
You can start by matching the beginning date, time and ip of the log.
Then skip matching all the key=value pairs that you don't want, and match the ones that match one of the alternatives:
(?:^[A-Z][a-z]+\h+\d{1,2}\h+\d{2}:\d{2}:\d{2}\h+\d{1,3}(?:\.\d{1,3}){3}\b(?=.*?\btype="traffic")(?!.*\btype="utm")|\G(?!^))(?:(?!\h+(?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)=)\h+\S+)*+\h*\K[^\s=]+=\S+
Explanation
(?:
  Non-capture group for the alternatives
^
  Start of the string (or of each line when the multiline flag is set)
[A-Z][a-z]+\h+\d{1,2}\h+\d{2}:\d{2}:\d{2}\h+\d{1,3}(?:\.\d{1,3}){3}\b
  Match the leading date, time, and IP-like format
(?=.*?\btype="traffic")(?!.*\btype="utm")
  Assert that type="traffic" occurs to the right and that type="utm" does not
|
  Or
\G(?!^)
  Assert the current position at the end of the previous match, not at the start
)
  Close the non-capture group
(?:
  Non-capture group
(?!
  Negative lookahead: assert that what follows the current position is not
\h+
  1+ horizontal whitespace chars
(?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)=
  followed by one of the alternatives and an equals sign
)
  Close the negative lookahead
\h+\S+
  Match 1+ horizontal whitespace chars and 1+ non-whitespace chars
)*+
  Close the non-capture group and optionally repeat it using a possessive quantifier
\h*\K
  Match optional horizontal whitespace chars and forget what has been matched so far
[^\s=]+=\S+
  Match the key-value pair
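If the logs are processed in a script rather than in a single PCRE pass, the same selection can be sketched in Python. This is an alternative under stated assumptions, not the pattern above: Python's re has no \G, \K, or \h, so the sketch matches the header, splits the rest of the line into key=value pairs, and keeps only traffic events, which also leaves out type="utm". The names WANTED, HEADER, PAIR, and parse_traffic_line are hypothetical, and the two sample lines are shortened versions of the events in the question.

```python
import re

# Keys to keep, taken from the alternation in the pattern above.
WANTED = {
    "devname", "devid", "vd", "srcip", "srcport", "srcintf",
    "dstip", "dstport", "dstintf", "proto", "action", "policyid",
    "user", "service", "transport", "app", "applist", "vpn", "vpntype",
}

# Syslog-style prefix: month, day, time, then an IPv4-like address.
HEADER = re.compile(
    r"^([A-Z][a-z]+\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\d{1,3}(?:\.\d{1,3}){3})\b"
)
# One key=value pair; unlike \S+, a quoted value may contain spaces.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_traffic_line(line):
    """Return (timestamp, ip, {key: value}) for a traffic event, else None."""
    header = HEADER.match(line)
    if header is None:
        return None
    pairs = PAIR.findall(line, header.end())
    if ("type", '"traffic"') not in pairs:
        return None
    # Values are kept verbatim, including any surrounding double quotes.
    fields = {key: value for key, value in pairs if key in WANTED}
    return header.group(1), header.group(2), fields

traffic = ('May 19 16:32:23 10.150.160.13 date=2023-05-19 time=16:32:25 '
           'devname="fw1-test-lv-external" type="traffic" subtype="forward" '
           'srcip=10.150.161.11 srcport=64507 '
           'policyname="Microsoft DNS to Umbrella" service="DNS"')
utm = ('Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 '
       'devname="ABCZWPFWFTG-B" type="utm" subtype="app-ctrl" srcip=10.150.150.10')

print(parse_traffic_line(traffic))
print(parse_traffic_line(utm))
```

Tokenizing every pair before filtering by key means text inside a quoted value is consumed as part of that value, so it is not mistaken for a separate field.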