Regex to detect this pattern: something;something=something,something=something… for an unknown number of times

Question

Explicit rules:

  1. The string has two parts, separated using a semicolon.

  2. The first part may contain alphanumeric characters, dashes, underscores and dots.

  3. The second part contains key-value pairs. Each key is assigned its value with an equals sign, the pairs are comma-separated, and the number of pairs is not known in advance.

Examples:

  • blahblahblah;first=1,second=two
  • bl.hbl-hbl_hbl4hbl4h;first=1,second=two,third=thr33

The best I've come up with so far is ([A-Za-z1-9_\-\.]+);(((.+?)(?:,|$))+), which is obviously far from correct. I am not good at writing regexes with lookaheads, lookbehinds and other relatively advanced features, but I hope a regex solution exists for this problem.

In case the regex engine matters, I am using the Perl-compatible regex engine (PCRE) in PHP 8.1.

Answer 1

Score: 3

You can try the following regex:

^([\w.-]+);([A-Za-z]\w*=\w+(?:,[A-Za-z]\w*=\w+)*)$

Regex Explanation:

  • ^: start of string
  • ([\w.-]+): first part, made of alphanumeric characters, underscores, dashes and dots
  • ;: semicolon
  • ([A-Za-z]\w*=\w+(?:,[A-Za-z]\w*=\w+)*): key-value pairs
    • [A-Za-z]: a letter, so every key starts with a letter
    • \w*: zero or more word characters (letters, digits, underscore), the rest of the key
    • =: equals sign
    • \w+: one or more word characters, the value
    • (?:,[A-Za-z]\w*=\w+)*: non-capturing group matching zero or more further key-value pairs
      • ,: comma
      • [A-Za-z]: a letter
      • \w*: zero or more word characters
      • =: equals sign
      • \w+: one or more word characters
  • $: end of string
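
As a rough check of the pattern explained above, the following Python sketch runs it against the two examples from the question and a few malformed inputs. Python's re module is used here only for illustration (the asker is on PHP, where the same pattern should work with preg_match once delimiters are added). The helper name split_parts and the malformed samples are assumptions of this sketch, not part of the original answer. The re.ASCII flag is there so that \w means [A-Za-z0-9_], which is what PCRE does by default without the u modifier.

```python
import re

# Pattern from the answer. re.ASCII restricts \w to [A-Za-z0-9_].
PATTERN = re.compile(
    r"^([\w.-]+);([A-Za-z]\w*=\w+(?:,[A-Za-z]\w*=\w+)*)$",
    re.ASCII,
)


def split_parts(text):
    """Return (first_part, pairs_string), or None if text does not match.

    Hypothetical helper, named for this sketch only.
    """
    m = PATTERN.match(text)
    return (m.group(1), m.group(2)) if m else None


samples = [
    "blahblahblah;first=1,second=two",                      # from the question
    "bl.hbl-hbl_hbl4hbl4h;first=1,second=two,third=thr33",  # from the question
    "no-pairs;",      # assumed bad input: nothing after the semicolon
    "name;first=1,",  # assumed bad input: trailing comma
    "name;1st=1",     # assumed bad input: key starts with a digit
]

for s in samples:
    print(s, "->", split_parts(s))
```

Rejecting keys that start with a digit is a choice made by this answer's pattern; the question's rules do not state it, so replace [A-Za-z]\w* with \w+ if such keys should be accepted.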

Check the demo here.
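
The regex in this answer captures the whole pair list as a single string in group 2, so the individual pairs still have to be separated afterwards. A minimal sketch of that second step, again in Python for illustration and assuming a hypothetical helper named parse:

```python
import re

# Same pattern as in the answer; re.ASCII restricts \w to [A-Za-z0-9_].
PATTERN = re.compile(
    r"^([\w.-]+);([A-Za-z]\w*=\w+(?:,[A-Za-z]\w*=\w+)*)$",
    re.ASCII,
)


def parse(text):
    """Return (first_part, {key: value, ...}), or None if text is invalid.

    Hypothetical helper, named for this sketch only.
    """
    m = PATTERN.match(text)
    if m is None:
        return None
    name, pairs = m.groups()
    # The regex has already validated the layout, so every comma separates
    # two pairs and every pair contains exactly one '='.
    return name, dict(pair.split("=", 1) for pair in pairs.split(","))


print(parse("bl.hbl-hbl_hbl4hbl4h;first=1,second=two,third=thr33"))
```

Validating first and splitting second keeps the regex simple. If a key is repeated, the dict keeps only the last value, which may or may not be what is wanted.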

huangapple
  • Posted on May 25, 2023 at 21:54:28
  • Source link: https://go.coder-hub.com/76333075.html