Extract everything after the second occurrence of any capital letter using regex

Question

I need help building a regex, Java flavor, that extracts everything after the second occurrence of any capital letter.

For example, for the sample string "98A02D1" the result should be D1. It doesn't matter what comes before the second occurrence of a capital letter.

What I came up with is this: (?:.*?[A-Z]){2}(.*)

It doesn't work: it only captures from the third occurrence. Lowering the number in curly brackets doesn't help. Appreciate any help!

Answer 1

Score: 2

You can match from the second uppercase letter onwards, or you can match everything that comes before the second uppercase letter and replace it with an empty string.

The following regex will match the part that should be removed:

^[^A-Z]*[A-Z][^A-Z]*

Regex Explanation:

  • ^: start of string
  • [^A-Z]*: any sequence of non-uppercase characters
  • [A-Z]: an uppercase character
  • [^A-Z]*: any sequence of non-uppercase characters

In your Java code:

String string = "98A02D1";
String pattern = "^[^A-Z]*[A-Z][^A-Z]*";

System.out.println(string.replaceFirst(pattern, ""));

Output:

D1
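
The same idea can be wrapped in a small helper, which makes it easier to see how the pattern behaves on edge cases. This is only a sketch and not part of the original answer: the class name SecondUppercaseDemo and the method fromSecondUppercase are made up for illustration, and it assumes ASCII capitals (A-Z) only, like the pattern above.

```java
import java.util.regex.Pattern;

final class SecondUppercaseDemo {
    // Matches everything before the second uppercase letter:
    // leading non-capitals, the first capital, then more non-capitals.
    private static final Pattern PREFIX = Pattern.compile("^[^A-Z]*[A-Z][^A-Z]*");

    // Hypothetical helper: removes the prefix, leaving the second capital onwards.
    static String fromSecondUppercase(String input) {
        return PREFIX.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(fromSecondUppercase("98A02D1")); // D1
        System.out.println(fromSecondUppercase("ABC"));     // BC
        // With only one capital the whole string is consumed, so the result is empty.
        System.out.println(fromSecondUppercase("98A02").isEmpty()); // true
    }
}
```

Note that a string with no capital at all is returned unchanged, because the pattern needs at least one capital in order to match.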


Answer 2

Score: 2

A pattern like [A-Z]{2} matches 2 consecutive uppercase chars, which is different from "the second occurrence", because in that case there can be other characters in between.

If a second uppercase char has to be present, you can use a capture group and refer to it in the replacement with $1.

If you don't want to cross newlines, you can exclude them in the negated character class:

^[^A-Z\r\n]*[A-Z][^A-Z\r\n]*([A-Z])


String pattern = "(?m)^[^A-Z\\r\\n]*[A-Z][^A-Z\\r\\n]*([A-Z].*)";
String string = "98A02D1\n98A02";
System.out.println(string.replaceAll(pattern, "$1"));

Output

D1
98A02


Another option is to match any uppercase letter, not just A-Z, using \p{Lu}:

^[^\p{Lu}]*[\p{Lu}][^\p{Lu}\r\n]*(\p{Lu})

Or assert an uppercase char to the right with a lookahead, and use an empty string in the replacement instead of group 1:

^[^A-Z\r\n]*[A-Z][^A-Z\r\n]*(?=[A-Z])
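
A rough Java sketch of these two variants might look like the following. The class and method names are invented for illustration. The Unicode version here combines \p{Lu} with the lookahead and also excludes \r\n in the first character class, which is an assumption on my part rather than the exact pattern shown above.

```java
final class SecondUppercaseVariants {
    // Lookahead variant: the match stops right before the second capital,
    // so it is replaced with an empty string. Without a second capital nothing matches.
    static String viaLookahead(String input) {
        return input.replaceFirst("^[^A-Z\\r\\n]*[A-Z][^A-Z\\r\\n]*(?=[A-Z])", "");
    }

    // Unicode variant (assumed combination): \p{Lu} also covers non-ASCII capitals.
    static String viaUnicode(String input) {
        return input.replaceFirst("^[^\\p{Lu}\\r\\n]*\\p{Lu}[^\\p{Lu}\\r\\n]*(?=\\p{Lu})", "");
    }

    public static void main(String[] args) {
        System.out.println(viaLookahead("98A02D1")); // D1
        System.out.println(viaLookahead("98A02"));   // 98A02 (unchanged, no second capital)
        // \u00C9 and \u00D1 are accented capitals that [A-Z] would not match.
        System.out.println(viaUnicode("1\u00C92\u00D13"));
    }
}
```

Unlike a pattern that consumes everything up to the end of the prefix unconditionally, both variants leave the input untouched when there is no second uppercase letter.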

Answer 3

Score: 1

I was able to match from the second occurrence using the following pattern.

(?<=[A-Z]).*?([A-Z].*)
  • The first group is a positive lookbehind, which asserts that an uppercase letter comes right before the match.
  • The second group is a capture group: an uppercase letter followed by the rest of the line.
  • Between the two groups is a lazy .*? pattern, which matches as few characters as possible (0 or more).

You can then use the following method to return the value.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String secondOccurrence(String string) {
    Pattern pattern = Pattern.compile("(?<=[A-Z]).*?([A-Z].*)");
    Matcher matcher = pattern.matcher(string);
    // group(1) starts at the second uppercase letter
    if (matcher.find())
        return matcher.group(1);
    return null;
}

Alternatively, you could use a for-loop that breaks when the second occurrence is found.

String secondOccurrence(String string) {
    int index = 0;
    boolean first = false;
    for (char character : string.toCharArray()) {
        if (character >= 'A' && character <= 'Z') {
            if (!first) first = true;
            else break;
        }
        index++;
    }
    // index is at the second uppercase letter, or at the end if there is none
    return string.substring(index);
}
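
For comparison, here is a compact sketch of the same loop idea that counts capitals with Character.isUpperCase instead of an A-Z check. The class name SecondUppercaseScan is hypothetical, and returning an empty string when there is no second capital is a choice made for this example. Character.isUpperCase follows Unicode rules, so it accepts more characters than [A-Z].

```java
final class SecondUppercaseScan {
    // Returns the input from its second uppercase letter onwards,
    // or an empty string when it has fewer than two uppercase letters.
    static String fromSecondUppercase(String input) {
        int seen = 0;
        for (int i = 0; i < input.length(); i++) {
            // Count capitals and stop at the second one.
            if (Character.isUpperCase(input.charAt(i)) && ++seen == 2) {
                return input.substring(i);
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(fromSecondUppercase("98A02D1")); // D1
        System.out.println(fromSecondUppercase("98A02D"));  // D
        System.out.println(fromSecondUppercase("98A02").isEmpty()); // true
    }
}
```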

huangapple
  • Posted on May 21, 2023, 02:52:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76296874.html