Posted 2023-06-01 22:58:42


Removing all occurrences of the specified substring, even overlapping ones

Question

For example, the source string is "appleappleapplebanana" and the pattern I want to delete is "appleapple".

I want it to delete all "appleapple" even if they overlap, so that only "banana" is left.

appleappleapplebanana
^^^^^^^^^^              <- first occurrence
     ^^^^^^^^^^         <- second occurrence

If I use replaceAll, the result is "applebanana" since after deleting the first one, the remaining part is just "applebanana".

Expected results:

| Input | Pattern | Result |
| --- | --- | --- |
| "appleapplebanana" | "appleapple" | "banana" |
| "appleappleapplebanana" | "appleapple" | "banana" |
| "applebanana" | "appleapple" | "applebanana" |
| "aaabbbaaabbbaaa" | "aaabbbaaa" | "" (empty string) |

I need to process arbitrary input patterns, so just using replace("apple") wouldn't work.

Though I have an idea for this:

1. Get all occurrences (using something like KMP)
2. Mark the corresponding characters as "to be deleted"
3. Delete the marked characters

However, I would like to know if there is a better (ready-made) way to achieve this.

I ended up writing my own function using the idea above, since no common library or package seems to support this feature.
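The mark-and-delete idea above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the names `OverlapRemover` and `removeAllOverlapping` are hypothetical and do not come from the original post, and `String.indexOf` stands in for KMP to keep the example short.

```java
import java.util.Arrays;

// Hypothetical helper class; the name is an assumption for illustration only.
final class OverlapRemover {

    // Removes every character covered by at least one occurrence of pattern,
    // including occurrences that overlap each other.
    static String removeAllOverlapping(String input, String pattern) {
        if (pattern.isEmpty()) {
            return input; // nothing sensible to remove
        }
        // Step 1 and 2: find every occurrence and mark its characters.
        boolean[] delete = new boolean[input.length()];
        int from = 0;
        int idx;
        while ((idx = input.indexOf(pattern, from)) >= 0) {
            Arrays.fill(delete, idx, idx + pattern.length(), true);
            from = idx + 1; // advance by one so overlapping occurrences are found
        }
        // Step 3: copy only the characters that were not marked.
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            if (!delete[i]) {
                sb.append(input.charAt(i));
            }
        }
        return sb.toString();
    }
}
```

With this sketch, `OverlapRemover.removeAllOverlapping("appleappleapplebanana", "appleapple")` gives "banana", and "applebanana" is returned unchanged. Swapping `indexOf` for a KMP search would improve the worst-case running time without changing the result.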

Answer 1

Score: 2

The question was a bit confusing at first. After the updates, I think the best example provided to illustrate the problem is matching the "pattern" aaabbbaaa in aaabbbaaabbbaaa.

aaabbbaaabbbaaa
aaabbbaaa
      aaabbbaaa
      ^-^        < overlapping part
^-------------^  < match this part: 'aaa' is overlapping

If the length of the "pattern" string may be used in the regex, a lookbehind can be used:

.{1,9}(?<=aaabbbaaa)

This regex matches between one and nine characters (nine being the length of the pattern) as long as aaabbbaaa is directly behind the end of the match. So it matches aaabbbaaa, but also the following bbbaaa, because the last a is again preceded by aaabbbaaa, and due to the length restriction it cannot skip over any other substring. It also matches non-overlapping occurrences, as in aaabbbaaaaaabbbaaa, but leaves e.g. ccc in aaabbbaaacccaaabbbaaa.

A Java demo (at tio.run) that incorporates the length:

import java.util.regex.Pattern;

// Pattern.quote escapes any regex metacharacters in pat.
String regex = ".{1," + pat.length() + "}(?<=" + Pattern.quote(pat) + ")";
Pattern p = Pattern.compile(regex);
String result = p.matcher(str).replaceAll("");

Update, including parts of @markalex's idea: for better performance, especially with longer inputs, first match the "pattern" once and wrap the lookbehind part in a repeated group.

aaabbbaaa(?:.{1,9}(?<=aaabbbaaa))*

This also yields a single match for adjacent parts, which might be desired anyway. Further, you can use \w (word character) instead of the dot if the input contains non-word characters.
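The updated regex can be built for an arbitrary pattern along these lines. This is a minimal sketch, not part of the original answer: the names `LookbehindRemover` and `removeOverlapping` are hypothetical, and the use of `Pattern.quote` (to treat the pattern literally) and `Pattern.DOTALL` (so the dot also matches line breaks) are assumptions added here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
final class LookbehindRemover {

    // Builds  pat(?:.{1,len}(?<=pat))*  and deletes every match.
    static String removeOverlapping(String str, String pat) {
        if (pat.isEmpty()) {
            return str; // an empty pattern would make the quantifier invalid
        }
        String quoted = Pattern.quote(pat); // match pat literally
        String regex = quoted
                + "(?:.{1," + pat.length() + "}(?<=" + quoted + "))*";
        // DOTALL lets the dot cross line breaks inside overlapping runs.
        Pattern p = Pattern.compile(regex, Pattern.DOTALL);
        return p.matcher(str).replaceAll("");
    }
}
```

For instance, `LookbehindRemover.removeOverlapping("aaabbbaaacccaaabbbaaa", "aaabbbaaa")` leaves "ccc", while "aaabbbaaabbbaaa" is reduced to the empty string.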

Answer 2

Score: 0

Technically, this is over-lapping.

appleapple
appleappleappleapple
appleapple

And, this is repeating.

```none
appleapple
     appleapple
          appleapple
```

You could, however, describe the latter as having over-lapped.
Over-lapping is not intrinsically a property of a pattern that is considered to be repeating, though.
At that point it is inherent and redundant; it is just a description.

In addition to String#replace there is also String#replaceAll.
It uses a regular expression pattern as the first argument.

You could use the following pattern to replace repeating values that have over-lapped.

(apple)\1+

replaceAll("(apple)\\1+", "")
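To see what the backreference pattern in that call does and does not cover, here is a small sketch. The names `RepeatRemover` and `removeRepeats` are hypothetical, and quoting the repeating unit with `Pattern.quote` is an assumption added so that arbitrary units are treated literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration only.
final class RepeatRemover {

    // Removes runs of two or more consecutive copies of unit,
    // using the (unit)\1+ backreference form.
    static String removeRepeats(String input, String unit) {
        if (unit.isEmpty()) {
            return input; // avoid matching an empty group
        }
        String regex = "(" + Pattern.quote(unit) + ")\\1+";
        return input.replaceAll(regex, "");
    }
}
```

With the unit "apple", "appleappleapplebanana" becomes "banana" and "applebanana" is left alone. This only works when the repeating unit is known up front: with the unit "aaabbb", the input "aaabbbaaabbbaaa" is reduced to "aaa" rather than the empty string, so truly over-lapping occurrences of "aaabbbaaa" are not fully removed.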

I'm not sure if there is a way to remove over-lapping values using a single pattern.
I imagine it would be much more complex.

You mentioned "... mark corresponding characters as 'to-be deleted'".
This would most likely be the logical way to remove truly over-lapping values.
