Regex expression to find number sequence?

Question

I am trying to use regex to return the ticket number from the following body of text:

TICKET #IM40135514 OPENED

In this case, it should return

IM40135514

I am not sure what the proper regex expression would be. I tried

number=re.findall("TICKET (\w{2}d{7}+))", filetext)

but keep getting an error.

Answer 1

Score: 1

Using re.findall should be fine here:

import re

inp = "TICKET #IM40135514 OPENED"
nums = re.findall(r'\bTICKET #(\S+)', inp)
print(nums)  # ['IM40135514']

Note that I am using a raw string for the regex pattern, which is indicated with a prefix of r.
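If only one ticket number is expected per line, re.search may be a convenient alternative to re.findall, since it returns a single match instead of a list. A minimal sketch using the same pattern; the helper name extract_ticket is hypothetical and not part of the answer above:

```python
import re

# Same pattern as above: capture the run of non-whitespace after "TICKET #".
TICKET_RE = re.compile(r'\bTICKET #(\S+)')

def extract_ticket(text):
    """Return the first ticket number found in text, or None if there is none."""
    match = TICKET_RE.search(text)
    return match.group(1) if match else None

print(extract_ticket("TICKET #IM40135514 OPENED"))  # IM40135514
print(extract_ticket("no ticket here"))             # None
```

Returning None for a non-matching line lets the caller decide how to handle missing tickets.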

Answer 2

Score: 1

Regex is not required here. You can just split() the text on whitespace, take the second item, and remove the "#".

s = "TICKET #IM40135514 OPENED"
ticket = s.split()[1].replace("#", "")
print(ticket)

This prints the ticket number:

IM40135514
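The split() approach relies on the ticket number always being the second word of the line. If that is not guaranteed, one possible variation is to pick the first token that begins with "#". This is a sketch under that assumption, and the helper name ticket_from_line is hypothetical:

```python
def ticket_from_line(line):
    """Return the first whitespace-separated token starting with '#', minus the '#'."""
    for token in line.split():
        if token.startswith("#"):
            return token[1:]
    # No token carried a '#' prefix.
    return None

print(ticket_from_line("TICKET #IM40135514 OPENED"))  # IM40135514
print(ticket_from_line("TICKET OPENED"))              # None
```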

Answer 3

Score: 0

There are a few things wrong with your expression.

  1. Unless you use a raw string, you need to double the backslashes so that the regex engine receives the full escape sequence. r"\w" is equivalent to "\\w".
  2. You're missing a # in your expression.
  3. You need to escape the d: the pattern should contain \d{7}, as in \w{2}\d{7}, not d{7}.
  4. The + means "one or more times". Placed directly after {7} it is a second quantifier on the same token, which Python versions before 3.11 reject with a "multiple repeat" error.
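These points can be checked directly in the interpreter. The sketch below compares the two string spellings and then tries to compile the pattern from the question. Note that the pattern also has one more closing parenthesis than opening ones, which is enough on its own to raise re.error, so the exact message reported may differ between Python versions:

```python
import re

# Point 1: a raw string and a double-escaped string hold the same characters.
print(r"\w" == "\\w")  # True

# The pattern from the question, written as a raw string so that Python
# passes it to the regex engine unchanged.
broken = r"TICKET (\w{2}d{7}+))"

try:
    re.compile(broken)
    error_message = None
except re.error as exc:
    # The wording of the message depends on the Python version.
    error_message = str(exc)

print(error_message)
```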

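I recommend using regex101.com to build your expressions, as it breaks them down and explains them to you.

The expression r"TICKET #(\w{2}\d{7})" closely matches what you had and might work for you. Note that \w also matches digits (and the underscore), so if you specifically want letters, you can use [a-zA-Z]{2}. Also note that the sample ticket has eight digits after the letters, so \d{7} would capture only IM4013551; use \d{8} or \d+ to capture the whole number.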
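A quick check against the sample line shows how the digit count matters. The sample ticket has eight digits after "IM", so a fixed {7} stops one digit short. The second pattern is an assumed variant (two ASCII letters followed by one or more digits), not something the question specifies:

```python
import re

s = "TICKET #IM40135514 OPENED"

# Pattern from the answer: two word characters, then exactly seven digits.
print(re.findall(r"TICKET #(\w{2}\d{7})", s))  # ['IM4013551']

# Assumed variant: two ASCII letters followed by one or more digits.
ticket_re = re.compile(r"TICKET #([A-Za-z]{2}\d+)")
print(ticket_re.findall(s))  # ['IM40135514']
```

If ticket numbers always have a fixed length, \d{8} would be a stricter choice than \d+.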

huangapple
  • Posted on 2023-02-27 10:58:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75576422.html