Matching number patterns using regular expressions (regex)

Question

I'd appreciate your help matching some number patterns using regex.
I have thousands of 10-digit numbers in the following patterns and need to extract them according to their pattern.

Note: I don't need the spaces in between.

Pattern 1:
Number: 3527 432 432
Let's describe the above number as the ABCD XYZ XYZ pattern.

Pattern 2:
Number: 3527 89 89 89
ABCD XY XY XY

Pattern 3:
Number: 35 35 35 8745
XY XY XY ABCD

Pattern 4:
Number: 5432 888 999
ABCD XXX YYY

Pattern 5:
Number: 5432 8888 99
ABCD XXXX YY

Pattern 6:
Number: 5432 33 44 22
ABCD XX YY ZZ

Any help is much appreciated.

I am a complete beginner with regex and only know the basics.

For patterns like 5432 8888 99, I am using a regex like

\d\d\d\d8888\d\d

and then manually finding the matching numbers in the list by changing 8888 to other digits, such as 1111.
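
The manual step of editing 8888 into 1111, 2222 and so on can be scripted. Below is a minimal Java sketch of that idea. Java is used because it matches the code elsewhere on this page. The sketch assumes the numbers are already available as 10-character strings without spaces. The class name, method name and sample data are hypothetical. It loops the same \d{4}DDDD\d{2} regex over every digit D from 0 to 9.

```java
import java.util.ArrayList;
import java.util.List;

class ManualPatternSearch {
    // Automates the manual step: instead of editing "8888" to "1111", "2222", ...
    // by hand, build \d{4}DDDD\d{2} for every digit D from 0 to 9.
    static List<String> findFourRepeatedDigits(List<String> numbers) {
        List<String> found = new ArrayList<>();
        for (char d = '0'; d <= '9'; d++) {
            String run = String.valueOf(d).repeat(4);    // e.g. "8888"
            String regex = "\\d{4}" + run + "\\d{2}";     // e.g. \d{4}8888\d{2}
            for (String n : numbers) {
                // String.matches succeeds only if the whole string matches
                if (n.matches(regex)) {
                    found.add(n);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, spaces removed
        List<String> numbers = List.of("3527432432", "5432888899", "5432334422");
        System.out.println(findFourRepeatedDigits(numbers)); // [5432888899]
    }
}
```

Like the original regex, this checks only the XXXX run. It does not require the last two digits to be equal, so it accepts more than strictly ABCD XXXX YY.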

Answer 1

Score: 0

To check for repeated symbols, you can use backreferences.

In short, a regex like (.)\1 matches two identical symbols in a row, and (.)(.)\1\2 matches occurrences like abab.
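
A quick way to see these two examples in action is a small Java check. The class name is hypothetical. String.matches returns true only when the entire string matches the regex.

```java
class BackreferenceBasics {
    public static void main(String[] args) {
        // (.)\1 : capture any one character, then require that same character again
        System.out.println("aa".matches("(.)\\1"));          // true
        System.out.println("ab".matches("(.)\\1"));          // false
        // (.)(.)\1\2 : capture two characters, then repeat both in the same order
        System.out.println("abab".matches("(.)(.)\\1\\2"));  // true
        System.out.println("abba".matches("(.)(.)\\1\\2"));  // false
    }
}
```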

Here are expressions for your cases:

  1. \d{4}(\d{3})\1
  2. \d{4}(\d{2})\1{2}
  3. (\d{2})\1{2}\d{4}
  4. \d{4}(\d)\1{2}(\d)\2{2}
  5. \d{4}(\d)\1{3}(\d)\2
  6. \d{4}(\d)\1(\d)\2(\d)\3
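
To try the six expressions against the sample numbers from the question, here is a minimal Java sketch. It assumes each number is a separate 10-digit string without spaces. It uses Matcher.matches() so that each expression must cover the whole number, which is the same as wrapping it in ^...$. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class NumberPatternClassifier {
    // The six expressions above, in order (Java string literals double the backslashes)
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("\\d{4}(\\d{3})\\1"),              // 1: ABCD XYZ XYZ
            Pattern.compile("\\d{4}(\\d{2})\\1{2}"),           // 2: ABCD XY XY XY
            Pattern.compile("(\\d{2})\\1{2}\\d{4}"),           // 3: XY XY XY ABCD
            Pattern.compile("\\d{4}(\\d)\\1{2}(\\d)\\2{2}"),   // 4: ABCD XXX YYY
            Pattern.compile("\\d{4}(\\d)\\1{3}(\\d)\\2"),      // 5: ABCD XXXX YY
            Pattern.compile("\\d{4}(\\d)\\1(\\d)\\2(\\d)\\3")  // 6: ABCD XX YY ZZ
    );

    // Returns the 1-based numbers of every pattern that matches the whole input
    static List<Integer> matchingPatterns(String number) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < PATTERNS.size(); i++) {
            if (PATTERNS.get(i).matcher(number).matches()) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {"3527432432", "3527898989", "3535358745",
                            "5432888999", "5432888899", "5432334422"};
        for (String s : samples) {
            System.out.println(s + " -> " + matchingPatterns(s));
        }
    }
}
```

Each sample is reported under its own pattern, except 5432888899, which matches both pattern 5 and pattern 6. A backreference only requires a digit to repeat, so nothing stops XX and YY in pattern 6 from being the same digit. If every number should land in exactly one category, one option is to check the patterns in a fixed order and keep the first hit.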

Explanation of the first one:

  • \d{4} matches any four digits,
  • (\d{3}) matches any three digits, and captures them into group #1,
  • \1 matches exact content of the group #1.
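
The capture step can be observed directly with java.util.regex. In this sketch, group(1) returns what (\d{3}) captured, and the match fails when the digits that follow are different. This is a minimal sketch, and repeatedBlock is a hypothetical helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstExpressionGroups {
    private static final Pattern FIRST = Pattern.compile("\\d{4}(\\d{3})\\1");

    // Returns the digits captured by group #1, or null if the number is not ABCD XYZ XYZ
    static String repeatedBlock(String number) {
        Matcher m = FIRST.matcher(number);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedBlock("3527432432")); // 432
        // \1 needs "432" again, but "433" follows, so there is no match
        System.out.println(repeatedBlock("3527432433")); // null
    }
}
```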

I hope that, based on this explanation and the general description of how backreferences work, the other expressions are fairly clear.

Demo for the first one here.
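
One caveat: a backreference enforces equality, not inequality. As written, ABCD XX YY ZZ therefore also accepts numbers where XX and YY are the same digit, such as 5432888899. The question does not say explicitly whether the letters must stand for different digits. If they should, one option is to add negative lookaheads. For example, (?!\1) placed after a group forbids the next digit from repeating that group. The Java sketch below shows this under that assumption, with hypothetical class and field names.

```java
import java.util.regex.Pattern;

class DistinctDigitPatterns {
    // Pattern 6 (ABCD XX YY ZZ) with negative lookaheads (assumption: letters are distinct):
    // YY must differ from XX, and ZZ must differ from both XX and YY.
    static final Pattern STRICT_6 =
            Pattern.compile("\\d{4}(\\d)\\1(?!\\1)(\\d)\\2(?!\\1|\\2)(\\d)\\3");
    // Pattern 5 (ABCD XXXX YY) with YY required to differ from XXXX
    static final Pattern STRICT_5 =
            Pattern.compile("\\d{4}(\\d)\\1{3}(?!\\1)(\\d)\\2");

    public static void main(String[] args) {
        System.out.println(STRICT_6.matcher("5432334422").matches()); // true
        System.out.println(STRICT_6.matcher("5432888899").matches()); // false: XX and YY are both 88
        System.out.println(STRICT_5.matcher("5432888899").matches()); // true
        System.out.println(STRICT_5.matcher("5432888888").matches()); // false: YY repeats X
    }
}
```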

Answer 2

Score: -2

The simplest approach would be to write a pattern for each of the formats,
and then join them with the | (alternation) character.

Patterns 1 and 4:

\d{4} \d{3} \d{3}

Patterns 2 and 6:

\d{4} \d{2} \d{2} \d{2}

Pattern 3:

\d{2} \d{2} \d{2} \d{4}

Pattern 5:

\d{4} \d{4} \d{2}

The final pattern would be:

\d{4} \d{3} \d{3}|\d{4} \d{2} \d{2} \d{2}|\d{2} \d{2} \d{2} \d{4}|\d{4} \d{4} \d{2}

Or, simplified:

\d{4}(?: \d{3}){2}|\d{4}(?: \d{2}){3}|(?:\d{2} ){3}\d{4}|(?:\d{4} ){2}\d{2}

You could then use the Pattern and Matcher classes to obtain each value,
and then use the String#replace method to remove the spaces.

This presumes the values appear within a text and are each delimited by some other character.

I wouldn't rely on this pattern if the values run together without delimiters.

import java.util.regex.*;
public class Main {
    public static void main(String[] args) {
        String string = "3527 432 432, 3527 89 89 89, 35 35 35 8745, 5432 888 999, 5432 8888 99, 5432 33 44 22";
        Pattern pattern = Pattern.compile("\\d{4}(?: \\d{3}){2}|\\d{4}(?: \\d{2}){3}|(?:\\d{2} ){3}\\d{4}|(?:\\d{4} ){2}\\d{2}");
        Matcher matcher = pattern.matcher(string);
        while (matcher.find()) // print each match, then the same text with its spaces removed
            System.out.printf("%-20s = %s%n", matcher.group(), matcher.group().replace(" ", ""));
    }
}

Output

3527 432 432         = 3527432432
3527 89 89 89        = 3527898989
35 35 35 8745        = 3535358745
5432 888 999         = 5432888999
5432 8888 99         = 5432888899
5432 33 44 22        = 5432334422

Here is a link to the Wikipedia article on regular expressions.
Wikipedia – Regular expression.

huangapple
  • Posted on June 30, 2023 00:08:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76582818.html