Python regex issue with optional substring in between

Question

I've been bashing my head against this for 2 days. I'm trying to match the contents of a packet with the re API:

from re import search, DOTALL

packet_re = (r'.*RADIUS.*\s*Accounting(\s|-)Request.*(Framed(\s|-)IP(\s|-)Address.*Attribute.*Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?.*(Username|User-Name)(\s|-)Attribute.*Value:\s*(?P<username>\S+).*')

packet1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
    10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
	Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
	  Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
	    0x0000:  0a0a 0a0b
	  User-Name Attribute (1), length: 14, Value: 005056969256
	    0x0000:  3030 3530 3536 3936 3932 3536
"""
result = search(packet_re, packet1, DOTALL)

The regex matches, but it fails to capture the Framed-IP-Address attribute (client_ip should be 10.10.10.11). The Framed-IP-Address attribute may or may not be present in the packet, so I enclosed that part of the pattern in another capturing group followed by ?, meaning zero or one occurrence.

The regex should skip the attribute when it is absent, so the packet contents can also be:

packet2 = """
IP (tos 0x0, ttl 64, id 60162, offset 0, flags [DF], proto UDP (17), length 163)
    20.20.20.1.54035 > 20.20.20.2.1813: [udp sum ok] RADIUS, length: 135
	Accounting-Request (4), id: 0x01, Authenticator: 219b694bcff639221fa29940e8d2a4b2
	  User-Name Attribute (1), length: 14, Value: 005056962f54
	    0x0000:  3030 3530 3536 3936 3266 3534
"""

In that case the regex should ignore Framed-IP-Address. It does, but then it also fails to capture the address when the attribute is present.
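A minimal reproduction, using the question's pattern unchanged and packet1 with its tabs replaced by spaces. The likely cause is backtracking order: the greedy .* right after Request first runs to the end of the input, and the engine then gives characters back one at a time until the rest of the pattern can succeed. The first such position (counting from the end) is the start of User-Name, where the optional Framed group simply matches nothing, so client_ip stays None.

```python
import re

# The question's pattern, unchanged.
packet_re = (r'.*RADIUS.*\s*Accounting(\s|-)Request.*(Framed(\s|-)IP(\s|-)Address.*Attribute.*Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?.*(Username|User-Name)(\s|-)Attribute.*Value:\s*(?P<username>\S+).*')

# packet1 from the question (tabs replaced by spaces; indentation does not affect the match).
packet1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
    10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
    Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
      Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
        0x0000:  0a0a 0a0b
      User-Name Attribute (1), length: 14, Value: 005056969256
        0x0000:  3030 3530 3536 3936 3932 3536
"""

m = re.search(packet_re, packet1, re.DOTALL)
# The overall match succeeds, but the optional Framed group matched an empty string.
print(m.groupdict())
# {'client_ip': None, 'username': '005056969256'}
```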

Answer 1

Score: 2

I suggest using this pattern:

RADIUS.*?Accounting[\s-]Request(?:.*?(Framed[\s-]IP[\s-]Address.*?Attribute(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?))?.*User-?[nN]ame[\s-]Attribute.*?Value:\s*(?P<username>\S+)

See the regex demo.
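A quick check of the suggested pattern against both sample packets from the question. This assumes re.DOTALL is kept, as in the question: without it, .*? cannot cross the line break between RADIUS and Accounting-Request and the search fails.

```python
import re

# The suggested pattern, split over adjacent raw-string literals that Python
# concatenates into one. re.DOTALL lets . (and so .*?) match newlines.
pattern = re.compile(
    r'RADIUS.*?Accounting[\s-]Request'
    r'(?:.*?(Framed[\s-]IP[\s-]Address.*?Attribute'
    r'(?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))?))?'
    r'.*User-?[nN]ame[\s-]Attribute.*?Value:\s*(?P<username>\S+)',
    re.DOTALL,
)

# The two sample packets from the question (tabs replaced by spaces).
packet1 = """
IP (tos 0x0, ttl 64, id 35592, offset 0, flags [DF], proto UDP (17), length 213)
    10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185
    Accounting-Request (4), id: 0x0a, Authenticator: 41b3b548c4b7f65fe810544995620308
      Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11
        0x0000:  0a0a 0a0b
      User-Name Attribute (1), length: 14, Value: 005056969256
        0x0000:  3030 3530 3536 3936 3932 3536
"""

packet2 = """
IP (tos 0x0, ttl 64, id 60162, offset 0, flags [DF], proto UDP (17), length 163)
    20.20.20.1.54035 > 20.20.20.2.1813: [udp sum ok] RADIUS, length: 135
    Accounting-Request (4), id: 0x01, Authenticator: 219b694bcff639221fa29940e8d2a4b2
      User-Name Attribute (1), length: 14, Value: 005056962f54
        0x0000:  3030 3530 3536 3936 3266 3534
"""

for name, packet in (("packet1", packet1), ("packet2", packet2)):
    m = pattern.search(packet)
    print(name, m.group("client_ip"), m.group("username"))
# packet1 10.10.10.11 005056969256
# packet2 None 005056962f54
```

Because the outer optional group starts with a lazy .*?, the engine first tries to reach Framed-IP-Address and only falls back to the empty alternative when no such attribute exists. This is the opposite of the greedy .* in the original pattern.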

Note that I removed .* from both ends of the pattern: you are using re.search, which, unlike re.match, does not require the match to start at the beginning of the string, and the Match object has a .string attribute that you can access to get the whole input string.
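A small sketch of the point in the note above. The sample text is one line from packet1. re.match only tries position 0, while re.search scans forward, and Match.string still returns the whole input even though the match covers only part of it:

```python
import re

text = "10.10.10.1.41860 > 10.10.10.3.1813: [udp sum ok] RADIUS, length: 185"

# re.match anchors at the start of the string, so it finds nothing here.
print(re.match(r"RADIUS", text))   # None

# re.search tries every position until the pattern matches.
m = re.search(r"RADIUS", text)
print(m.group(0))                  # RADIUS

# The Match object keeps a reference to the full input string.
print(m.string == text)            # True
```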

Details

  • RADIUS - a word
  • .*? - any zero or more chars, as few as possible
  • Accounting - a word
  • [\s-] - a whitespace or hyphen
  • Request - a word
  • (?:.*? - start of an optional non-capturing group: any zero or more chars, as few as possible, then...
    • (Framed[\s-]IP[\s-]Address.*?Attribute - Group 1: Framed + a whitespace or a hyphen + IP + a whitespace or a hyphen + Address + any zero or more chars, as few as possible + Attribute
      • (?:.*?Value: (?P<client_ip>\d+\.\d+\.\d+\.\d+))? - an optional non-capturing group matching any zero or more chars, as few as possible + Value: (followed by a space) + Group "client_ip": four runs of one or more digits separated by literal dots
    • ) - end of Group 1
  • )? - end of the outer non-capturing group
  • .* - any zero or more chars, as many as possible
  • User-?[nN]ame - Username, UserName, User-name or User-Name
  • [\s-] - a whitespace or a hyphen
  • Attribute - a word
  • .*? - any zero or more chars, as few as possible
  • Value: - a literal string
  • \s* - zero or more whitespaces
  • (?P<username>\S+) - Group "username": one or more non-whitespace chars
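The breakdown above can also be kept next to the pattern itself with re.VERBOSE. This is a sketch of the same pattern, not the answer's original form. In verbose mode unescaped whitespace outside character classes is ignored and # starts a comment, so the literal space after "Value:" has to be written as [ ]. The two test strings are abridged excerpts of the question's packets.

```python
import re

pattern = re.compile(r"""
    RADIUS .*? Accounting [\s-] Request
    (?:                                          # optional non-capturing group
        .*?                                      # as few chars as possible
        (Framed [\s-] IP [\s-] Address .*? Attribute   # Group 1
            (?: .*? Value:[ ] (?P<client_ip> \d+\.\d+\.\d+\.\d+) )?
        )                                        # end of Group 1
    )?                                           # end of the outer group
    .*                                           # as many chars as possible
    User-?[nN]ame [\s-] Attribute
    .*? Value: \s* (?P<username> \S+)
""", re.DOTALL | re.VERBOSE)

# Abridged excerpts of packet1 and packet2 from the question.
with_ip = ("RADIUS, length: 185\n  Accounting-Request (4)\n"
           "    Framed-IP-Address Attribute (8), length: 6, Value: 10.10.10.11\n"
           "    User-Name Attribute (1), length: 14, Value: 005056969256\n")
without_ip = ("RADIUS, length: 135\n  Accounting-Request (4)\n"
              "    User-Name Attribute (1), length: 14, Value: 005056962f54\n")

print(pattern.search(with_ip).group("client_ip", "username"))
# ('10.10.10.11', '005056969256')
print(pattern.search(without_ip).group("client_ip", "username"))
# (None, '005056962f54')
```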

huangapple
  • This article was posted on June 26, 2023, 15:50:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76554603.html