Python regex issue with optional substring in between
Question
I've been banging my head against this for two days. I'm trying to match packet content with a regex:
packet_re = (r'.*RADIUS.*\s*Accounting(\s|-)Request.*(Framed(\s|-)IP(\s|-)Address.*Attribute.*Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?.*(Username|User-Name)(\s|-)Attribute.*Value:\s*(?P<username>\S+).*')
packet1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
0x0000: 0a0a 0a0b
User-Name Attribute (1), length: 14, Value: 005056969256
0x0000: 3030 3530 3536 3936 3932 3536
"""
from re import DOTALL, search  # imports the call below relies on
result = search(packet_re, packet1, DOTALL)
The regex matches, but it fails to capture the Framed-IP-Address Attribute value (client_ip=10.10.10.11). The Framed-IP-Address Attribute may or may not be present in the packet, so that part of the pattern is enclosed in another capture group ending with ?, meaning 0 or 1 occurrences.
I should be able to ignore it when it is absent, so the packet content can also be:
packet2 = """
IP (tos 0x0, ttl 64, id 60162, offset 0, flags [DF], proto UDP (17), length 163)
20.20.20.1.54035 > 20.20.20.2.1813: [udp sum ok] RADIUS, length: 135
Accounting-Request (4), id: 0x01, Authenticator: 219b694bcff639221fa29940e8d2a4b2
User-Name Attribute (1), length: 14, Value: 005056962f54
0x0000: 3030 3530 3536 3936 3266 3534
"""
The regex should ignore Framed-IP-Address in this case. It does ignore it, but it doesn't capture it when it is present.
Answer 1
Score: 2
I suggest using
RADIUS.*?Accounting[\s-]Request(?:.*?(Framed[\s-]IP[\s-]Address.*?Attribute(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?))?.*User-?[nN]ame[\s-]Attribute.*?Value:\s*(?P<username>\S+)
See the regex demo.
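To make the difference concrete, here is a small sketch that runs the question's pattern and the suggested one against both sample packets. The helper name extract and the constant names are made up for illustration; the packet text is copied from the question. A likely reason the original misses client_ip: its greedy .* after Request first runs to the end of the input and then backtracks only as far as needed for the User-Name part to match, and at that position the optional Framed group is simply skipped.

```python
import re

# Pattern from the question (split across lines, otherwise unchanged).
QUESTION_RE = (
    r'.*RADIUS.*\s*Accounting(\s|-)Request.*'
    r'(Framed(\s|-)IP(\s|-)Address.*Attribute.*Value: '
    r'(?P<client_ip>\d+\.\d+\.\d+\.\d+))?'
    r'.*(Username|User-Name)(\s|-)Attribute.*Value:\s*(?P<username>\S+).*'
)

# Pattern suggested in this answer (split across lines, otherwise unchanged).
ANSWER_RE = (
    r'RADIUS.*?Accounting[\s-]Request'
    r'(?:.*?(Framed[\s-]IP[\s-]Address.*?Attribute'
    r'(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?))?'
    r'.*User-?[nN]ame[\s-]Attribute.*?Value:\s*(?P<username>\S+)'
)

PACKET1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
0x0000: 0a0a 0a0b
User-Name Attribute (1), length: 14, Value: 005056969256
0x0000: 3030 3530 3536 3936 3932 3536
"""

PACKET2 = """
IP (tos 0x0, ttl 64, id 60162, offset 0, flags [DF], proto UDP (17), length 163)
20.20.20.1.54035 > 20.20.20.2.1813: [udp sum ok] RADIUS, length: 135
Accounting-Request (4), id: 0x01, Authenticator: 219b694bcff639221fa29940e8d2a4b2
User-Name Attribute (1), length: 14, Value: 005056962f54
0x0000: 3030 3530 3536 3936 3266 3534
"""


def extract(pattern, packet):
    """Return (client_ip, username), or None when the pattern does not match."""
    m = re.search(pattern, packet, re.DOTALL)
    return (m.group('client_ip'), m.group('username')) if m else None


print(extract(QUESTION_RE, PACKET1))  # (None, '005056969256')
print(extract(ANSWER_RE, PACKET1))    # ('10.10.10.11', '005056969256')
print(extract(ANSWER_RE, PACKET2))    # (None, '005056962f54')
```

With the lazy .*? inside the optional group, the engine tries to find Framed-IP-Address first and only gives up on the group when no such attribute exists after Accounting-Request.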
Note that I removed .* on both ends of the pattern because you are using re.search, which, unlike re.match, does not require the match to begin at the start of the string. The match object (re.Match) also has a .string attribute that you can use to obtain the whole input string.
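A minimal sketch of that point, using a made-up two-line input (the variable name text and its content are illustrative, not taken from the question):

```python
import re

# Hypothetical input: the keyword does not sit at the start of the string.
text = "header line\nRADIUS, length: 185"

# re.match only attempts a match at position 0, so it finds nothing here.
print(re.match(r'RADIUS', text))   # None

# re.search scans forward through the string, so no leading '.*' is needed.
m = re.search(r'RADIUS', text)
print(m.start())                   # 12
print(m.group(0))                  # RADIUS

# The match object keeps a reference to the full input string.
print(m.string == text)            # True
```

So wrapping the pattern in .* only to recover the surrounding text is unnecessary; m.string already provides it.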
Details
RADIUS - a word
.*? - any zero or more chars, as few as possible
Accounting - a word
[\s-] - a whitespace or hyphen
Request - a word
(?:.*? - start of an optional non-capturing group: any zero or more chars, as few as possible, then...
(Framed[\s-]IP[\s-]Address.*?Attribute - Group 1: Framed + a whitespace or a hyphen + IP + whitespace/hyphen + Address + any zero or more chars, as few as possible + Attribute
(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))? - an optional non-capturing group matching any zero or more chars, as few as possible + Value: + a space + Group "client_ip": four runs of one or more digits separated by literal dots
) - end of Group 1
)? - end of the outer non-capturing group
.* - any zero or more chars, as many as possible
User-?[nN]ame - Username, UserName, User-name or User-Name
[\s-] - whitespace or hyphen
Attribute - a word
.*? - any zero or more chars, as few as possible
Value: - a literal string
\s* - zero or more whitespaces
(?P<username>\S+) - Group "username": one or more non-whitespace chars
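The breakdown above can also be kept next to the pattern itself by writing it with re.VERBOSE. This is only a sketch of the same regex under two assumptions: the literal space after Value: is written as [ ] because verbose mode ignores bare whitespace, and the names PACKET_RE and sample are made up here (sample is an abridged copy of packet1 from the question).

```python
import re

PACKET_RE = re.compile(r'''
    RADIUS .*? Accounting [\s-] Request
    (?:                                  # optional outer group
        .*?                              # as few chars as possible, then...
        ( Framed [\s-] IP [\s-] Address .*? Attribute      # Group 1
            (?: .*? Value:[ ] (?P<client_ip> \d+\.\d+\.\d+\.\d+ ) )?
        )
    )?
    .*                                   # as many chars as possible
    User -? [nN]ame [\s-] Attribute
    .*? Value: \s*
    (?P<username> \S+ )
''', re.VERBOSE | re.DOTALL)

# Abridged copy of packet1 from the question.
sample = (
    "RADIUS, length: 185\n"
    "Accounting-Request (4), id: 0x0a\n"
    "Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11\n"
    "User-Name Attribute (1), length: 14, Value: 005056969256\n"
)

m = PACKET_RE.search(sample)
print(m.group('client_ip'))  # 10.10.10.11
print(m.group('username'))   # 005056969256
```

Compiling once with re.DOTALL baked in also avoids having to remember the flag at every call site.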