October 8, 2020, 21:33:35

Split on capitalized words not between underscores

Question

Given the following string: ThisIsA_SimpleTest_Case

I want to split on every capitalized word that is not between underscores, and on the underscores that enclose a string, without splitting inside that string.

The expected split result: This Is A SimpleTest Case

I came up with the following non-working regex for the Java regex flavor:

(?=_[a-zA-Z]*_|[A-Z])

But this of course doesn't work, since the alternation is an "or" and not an "and". It also splits on the capitalized words between the underscores, which is something I want to avoid.
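To make the failure concrete, here is a minimal sketch that runs the attempted pattern through String.split. The class and method names (AttemptDemo, splitWithAttempt) are made up for illustration, and the printed result assumes the split semantics of Java 8 or later, where a zero-width match at the start of the input produces no leading empty string.

```java
import java.util.Arrays;
import java.util.List;

class AttemptDemo {
    // The pattern from the question: a zero-width split before "_letters_" or before any capital letter.
    static final String ATTEMPT = "(?=_[a-zA-Z]*_|[A-Z])";

    // Hypothetical helper, only here so the result is easy to inspect.
    static List<String> splitWithAttempt(String input) {
        return Arrays.asList(input.split(ATTEMPT));
    }

    public static void main(String[] args) {
        // Prints [This, Is, A, _, Simple, Test_, Case]
        System.out.println(splitWithAttempt("ThisIsA_SimpleTest_Case"));
    }
}
```

Because the pattern consists only of a lookahead, the underscores are never consumed, and the [A-Z] branch still fires on the T of Test inside the underscores.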

Answer 1

Score: 1

Wiktor is right, it should be easier to try to match instead of splitting on what you don't want.
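One possible shape of that matching approach, as a rough sketch: the pattern below is not from the thread but an assumption made for illustration, as are the names MatchTokensDemo and tokens. It matches either a whole underscore-delimited segment (capturing it without the underscores) or a single capitalized word, and it silently skips anything else, such as digits or text that starts with a lowercase letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTokensDemo {
    // Assumed pattern: "_segment_" (group 1 holds the segment) or one capitalized word.
    static final Pattern TOKEN = Pattern.compile("_([^_]*)_|[A-Z][a-z]*");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Group 1 is non-null only when the underscore branch matched.
            result.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [This, Is, A, SimpleTest, Case]
        System.out.println(tokens("ThisIsA_SimpleTest_Case"));
    }
}
```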

But because it's a fun challenge, I came up with one that splits it the way you wanted:

_|(?<!_)(?=[A-Z])(?=[^_]*(?:_[^_]*_[^_]*)*[^_]*$)

It also works with multiple pairs of underscores. (It can certainly be improved; I might try to simplify it.)

The idea is:

_| Split on any underscore, removing it from the final list.
(?<!_) Not right after an underscore. Without this you might get empty strings after the split (those positions are already handled by the _| branch). It can be skipped if you don't care.
(?=[A-Z]) Split before capital letters.
(?=[^_]*(?:_[^_]*_[^_]*)*[^_]*$) But the position must be followed by an even number of underscores. If the number is odd, you are between two underscores and it should not split. I assume the string as a whole never contains an odd number of underscores.

Test at https://regex101.com/r/Iov1Yl/1/
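The pattern can be tried directly with String.split. The sketch below uses the regex exactly as given above; the class name, the helper method and the second input string (with two underscore pairs) are made up for illustration, and the printed results assume Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;

class EvenUnderscoreSplitDemo {
    // Regex from the answer: split on "_", or before a capital that is followed by an even number of underscores.
    static final String SPLIT = "_|(?<!_)(?=[A-Z])(?=[^_]*(?:_[^_]*_[^_]*)*[^_]*$)";

    // Hypothetical helper wrapping String.split.
    static List<String> split(String input) {
        return Arrays.asList(input.split(SPLIT));
    }

    public static void main(String[] args) {
        // Prints [This, Is, A, SimpleTest, Case]
        System.out.println(split("ThisIsA_SimpleTest_Case"));
        // Two underscore pairs; prints [Foo, Bar, BazQux, Hello, World, AbcDef, End]
        System.out.println(split("FooBar_BazQux_HelloWorld_AbcDef_End"));
    }
}
```

In the second input, HelloWorld sits between the two pairs rather than inside one, so it is followed by an even number of underscores and is split normally.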

Answer 2

Score: 1

You might split on:

(?=(?<!_)[A-Z](?![A-Za-z]*_))|(?<!_[A-Za-z]{0,1000}|^)(?=[A-Z])|_

(?=(?<!_)[A-Z](?![A-Za-z]*_)) A position that is not directly preceded by _, where the char A-Z at the right is not followed by optional letters and then a _
| Or
(?<!_[A-Za-z]{0,1000}|^)(?=[A-Z]) A position that is not the start of the string and is not preceded by a _ followed by 0 to 1000 letters, and where what is directly at the right is a char A-Z
| Or
_ Match an underscore

Example code

String regex = "(?=(?<!_)[A-Z](?![A-Za-z]*_))|(?<!_[A-Za-z]{0,1000}|^)(?=[A-Z])|_";
String str = "ThisIsA_SimpleTest_Case";
String[] parts = str.split(regex);

for (String part : parts)
    System.out.println(part);

Output

This
Is
A
SimpleTest
Case

Answer 3

Score: 1

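Another approach: rewrite the string before splitting.

The underscore-delimited segment is marked with a placeholder first, so that the later camel-case step does not split inside it:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String input = "ThisIsA_SimpleTest_Case";
        // Drop the enclosing underscores: "," goes in front, "#" marks the segment's inner lower/upper boundary
        String inputReplace1 = input.replaceAll("_(\\w+[a-z])([A-Z]\\w+)_", ",$1#$2");
        // Insert "," at every remaining lowercase/uppercase boundary
        String inputReplace2 = inputReplace1.replaceAll("(?<=[a-z])(?=[A-Z])", ",");
        // Remove the placeholder, then split on the commas
        String inputReplace3 = inputReplace2.replaceAll("#", "");
        System.out.println(Arrays.asList(inputReplace3.split(",")));
    }
}

Output:

[This, Is, A, SimpleTest, Case]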
