Matching number patterns using regular expressions (regex)
Question
Need your kind help in matching some number patterns using regex.
I have thousands of 10-digit numbers that follow the patterns below, and I need to extract them according to their pattern.
Note: I don't need the spaces in between the groups.
Pattern 1:
Number: 3527 432 432
Let's consider the above number as the ABCD XYZ XYZ pattern.
Pattern 2:
Number: 3527 89 89 89
ABCD XY XY XY
Pattern 3:
Number: 35 35 35 8745
XY XY XY ABCD
Pattern 4:
Number: 5432 888 999
ABCD XXX YYY
Pattern 5:
Number: 5432 8888 99
ABCD XXXX YY
Pattern 6:
Number: 5432 33 44 22
ABCD XX YY ZZ
Any help is much appreciated.
I am a complete beginner in regex and only know the basics.
For patterns like 5432 8888 99, I am using a regex like
\d\d\d\d8888\d\d
and then manually finding the matching numbers in the list by changing 8888 to other digits, such as 1111.
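The manual step described above (swapping 8888 for 1111 and so on) can be automated by looping over all ten digits and building the literal pattern for each one. This is a minimal Java sketch, assuming the numbers are stored as 10-character strings without spaces; the class name, method name, and sample data are hypothetical. It only covers the ABCD XXXX YY case, which is the limitation the backreference answer below removes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ManualLoop {
    // Hypothetical helper: tries 0000, 1111, ..., 9999 in place of the literal 8888.
    static List<String> findAbcdXxxxYy(List<String> numbers) {
        List<String> found = new ArrayList<>();
        for (char d = '0'; d <= '9'; d++) {
            String run = String.valueOf(d).repeat(4); // e.g. "8888"
            Pattern p = Pattern.compile("\\d\\d\\d\\d" + run + "\\d\\d");
            for (String n : numbers) {
                // matches() requires the whole 10-digit string to fit the pattern
                if (p.matcher(n).matches()) {
                    found.add(n);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, stored without spaces.
        List<String> numbers = List.of("5432888899", "5432111177", "3527432432");
        System.out.println(findAbcdXxxxYy(numbers)); // [5432111177, 5432888899]
    }
}
```

Results come out grouped by digit (1111 before 8888) because the digit loop is the outer loop.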
Answer 1
Score: 0
To check for repeated symbols, you can use backreferences.
In short, a regex like (.)\1 will match two identical symbols in a row, and (.)(.)\1\2 will match occurrences like abab.
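A small Java sketch of the two backreference examples, using java.util.regex; the class and field names are hypothetical. Matcher.matches() is used so the whole input must fit the pattern, whereas find() would look for it anywhere in a longer string.

```java
import java.util.regex.Pattern;

public class BackrefBasics {
    // (.)\1 : any character, then that same character again.
    static final Pattern DOUBLE = Pattern.compile("(.)\\1");
    // (.)(.)\1\2 : two characters, then the same two in the same order.
    static final Pattern ABAB = Pattern.compile("(.)(.)\\1\\2");

    public static void main(String[] args) {
        System.out.println(DOUBLE.matcher("88").matches());   // true
        System.out.println(DOUBLE.matcher("87").matches());   // false
        System.out.println(ABAB.matcher("abab").matches());   // true
        System.out.println(ABAB.matcher("abba").matches());   // false
        System.out.println(ABAB.matcher("8989").matches());   // true
    }
}
```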
Here are expressions for your cases:
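Pattern 1 (ABCD XYZ XYZ): \d{4}(\d{3})\1
Pattern 2 (ABCD XY XY XY): \d{4}(\d{2})\1{2}
Pattern 3 (XY XY XY ABCD): (\d{2})\1{2}\d{4}
Pattern 4 (ABCD XXX YYY): \d{4}(\d)\1{2}(\d)\2{2}
Pattern 5 (ABCD XXXX YY): \d{4}(\d)\1{3}(\d)\2
Pattern 6 (ABCD XX YY ZZ): \d{4}(\d)\1(\d)\2(\d)\3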
Explanation for the first one:
\d{4} matches any four digits, (\d{3}) matches any three digits and captures them into group #1, and \1 matches the exact content of group #1.
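To see group #1 in action, here is a minimal Java sketch that applies the first expression to a whole number and returns the captured XYZ part. The class and method names are hypothetical, and the number is assumed to be stored without spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstPatternDemo {
    // \d{4}   : any four digits (ABCD)
    // (\d{3}) : three digits captured as group #1 (XYZ)
    // \1      : the exact text of group #1 again
    static final Pattern ABCD_XYZ_XYZ = Pattern.compile("\\d{4}(\\d{3})\\1");

    // Returns the repeated XYZ part, or null if the number doesn't fit.
    static String repeatedPart(String number) {
        Matcher m = ABCD_XYZ_XYZ.matcher(number);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedPart("3527432432")); // 432
        System.out.println(repeatedPart("3527432433")); // null
    }
}
```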
Based on this explanation and the general description of how backreferences work, the other expressions should be fairly clear.
Demo for the first one here.
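To sort a list of numbers by pattern, the six expressions can be applied one after another. This is a minimal Java sketch, assuming each number is a separate 10-character string without spaces; the class name, the classify method, and the extra sample 1234567890 are hypothetical. Because the expressions don't require X and Y to differ, one number can fit several patterns (5432888899 is both 8888 99 and 88 88 99), so the sketch reports every matching pattern number rather than only the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatternClassifier {
    // The six expressions from this answer, in pattern order (1..6).
    static final Pattern[] PATTERNS = {
        Pattern.compile("\\d{4}(\\d{3})\\1"),             // 1: ABCD XYZ XYZ
        Pattern.compile("\\d{4}(\\d{2})\\1{2}"),          // 2: ABCD XY XY XY
        Pattern.compile("(\\d{2})\\1{2}\\d{4}"),          // 3: XY XY XY ABCD
        Pattern.compile("\\d{4}(\\d)\\1{2}(\\d)\\2{2}"),  // 4: ABCD XXX YYY
        Pattern.compile("\\d{4}(\\d)\\1{3}(\\d)\\2"),     // 5: ABCD XXXX YY
        Pattern.compile("\\d{4}(\\d)\\1(\\d)\\2(\\d)\\3") // 6: ABCD XX YY ZZ
    };

    // Returns every pattern number (1-6) that the whole string matches.
    static List<Integer> classify(String number) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < PATTERNS.length; i++) {
            if (PATTERNS[i].matcher(number).matches()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] numbers = {"3527432432", "3527898989", "3535358745",
                            "5432888999", "5432888899", "5432334422", "1234567890"};
        for (String n : numbers) {
            System.out.println(n + " -> " + classify(n));
        }
    }
}
```

Expected output: [1], [2], [3], [4], [5, 6], [6] for the question's six numbers, and [] for 1234567890.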
Answer 2
Score: -2
The simplest approach would be to write a pattern for each of the formats.
Then, join them with the | (alternation) character.
Patterns 1 and 4:
\d{4} \d{3} \d{3}
Patterns 2 and 6:
\d{4} \d{2} \d{2} \d{2}
Pattern 3:
\d{2} \d{2} \d{2} \d{4}
Pattern 5:
\d{4} \d{4} \d{2}
The final pattern would be:
\d{4} \d{3} \d{3}|\d{4} \d{2} \d{2} \d{2}|\d{2} \d{2} \d{2} \d{4}|\d{4} \d{4} \d{2}
Or, simplified:
\d{4}(?: \d{3}){2}|\d{4}(?: \d{2}){3}|(?:\d{2} ){3}\d{4}|(?:\d{4} ){2}\d{2}
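The "one pattern per format, joined with |" step can also be done in code. A minimal Java sketch using String.join; the class and field names are hypothetical. Like the patterns above, it assumes the values contain the spaces, so an unspaced number does not match.

```java
import java.util.List;
import java.util.regex.Pattern;

public class JoinPatterns {
    // One pattern per grouping format, as listed in this answer.
    static final List<String> FORMATS = List.of(
        "\\d{4} \\d{3} \\d{3}",        // Patterns 1 and 4
        "\\d{4} \\d{2} \\d{2} \\d{2}", // Patterns 2 and 6
        "\\d{2} \\d{2} \\d{2} \\d{4}", // Pattern 3
        "\\d{4} \\d{4} \\d{2}"         // Pattern 5
    );

    // Joining with "|" gives the combined alternation (the unsimplified final pattern).
    static final String COMBINED = String.join("|", FORMATS);

    public static void main(String[] args) {
        System.out.println(COMBINED);
        Pattern p = Pattern.compile(COMBINED);
        System.out.println(p.matcher("5432 8888 99").matches()); // true
        System.out.println(p.matcher("5432888899").matches());   // false: no spaces
    }
}
```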
You could then use the Pattern and Matcher classes to obtain each value, and then use the String#replace method to remove the spaces.
This presumes the values are within a text and each is delimited by some other character.
I wouldn't rely on this pattern if the values are sequential and not delimited.
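import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The values are separated by ", ", so each find() returns one spaced-out number.
String string = "3527 432 432, 3527 89 89 89, 35 35 35 8745, 5432 888 999, 5432 8888 99, 5432 33 44 22";
Pattern pattern = Pattern.compile("\\d{4}(?: \\d{3}){2}|\\d{4}(?: \\d{2}){3}|(?:\\d{2} ){3}\\d{4}|(?:\\d{4} ){2}\\d{2}");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) { // print each match, then the same value without spaces
    System.out.printf("%-20s = %s%n", matcher.group(), matcher.group().replace(" ", ""));
}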
Output:
3527 432 432         = 3527432432
3527 89 89 89        = 3527898989
35 35 35 8745        = 3535358745
5432 888 999         = 5432888999
5432 8888 99         = 5432888899
5432 33 44 22        = 5432334422
Here is a link to the Wikipedia article on regular expressions.
Wikipedia – Regular expression.