February 27, 2023, 02:19:55

How to remove consecutively repeated strings only if these strings are between "((VERB)" and ")"?

Question

import re
input_text = "((VERB) saltar a nosotros a nosotros) a nosotros a nosotros a nosotros ((VERB)correr a nosotros) sdsdsd ((VERB) saltar a nosotros a nosotros)"
input_text = re.sub(r"\(\(VERB\)" + r"((?:\w\s*)+)" + r"\)",
                    lambda x: re.sub(r"(a nosotros)\s*+", r"", x.group()),
                    input_text)
print(input_text) # --> output

In this code I was trying to remove consecutively repeated "a nosotros" strings only if they appear between "((VERB)" and ")", that is, inside the text matched by the pattern r"\(\(VERB\)" + r"((?:\w\s*)+)" + r"\)".

This is the output the script should produce:

"((VERB) saltar a nosotros) a nosotros a nosotros a nosotros ((VERB)correr a nosotros) sdsdsd ((VERB) saltar a nosotros)"

The code in the question does edit the input string, but it does not produce this output. What should I modify?

Answer 1

Score: 1

You can use

input_text = re.sub(r"\(\(VERB\)[\w\s]*\)",  lambda x: re.sub(r"\ba nosotros(?:\s+a nosotros)*\b", "a nosotros", x.group()), input_text)

The main pattern is \(\(VERB\)[\w\s]*\). It matches ((VERB), then zero or more word or whitespace characters, and then a ) character.
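To see what the main pattern selects on its own, here is a minimal sketch that runs re.findall on the question's input. The variable name verb_groups is illustrative and not part of the answer.

```python
import re

input_text = (
    "((VERB) saltar a nosotros a nosotros) a nosotros a nosotros a nosotros "
    "((VERB)correr a nosotros) sdsdsd ((VERB) saltar a nosotros a nosotros)"
)

# Outer pattern from the answer: "((VERB)", then any run of word or
# whitespace characters, then a closing ")".
verb_groups = re.findall(r"\(\(VERB\)[\w\s]*\)", input_text)

for group in verb_groups:
    print(group)
```

The run of "a nosotros" between the first two groups is not part of any match, so a substitution restricted to these matches leaves it untouched.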

The re.sub(r"\ba nosotros(?:\s+a nosotros)*\b", "a nosotros", x.group()) part collapses each run of consecutive whole-word a nosotros occurrences inside the match into a single a nosotros.
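Put together as a runnable script, the two-stage substitution could look like the sketch below. The helper name collapse_repeats is hypothetical and only replaces the lambda for readability; the patterns are the ones from the answer.

```python
import re

def collapse_repeats(match):
    # Hypothetical helper: inside one "((VERB) ... )" match, replace a run
    # of consecutive "a nosotros" with a single occurrence.
    return re.sub(r"\ba nosotros(?:\s+a nosotros)*\b", "a nosotros", match.group())

input_text = (
    "((VERB) saltar a nosotros a nosotros) a nosotros a nosotros a nosotros "
    "((VERB)correr a nosotros) sdsdsd ((VERB) saltar a nosotros a nosotros)"
)

# The outer pattern limits the inner substitution to the "((VERB) ... )" groups.
result = re.sub(r"\(\(VERB\)[\w\s]*\)", collapse_repeats, input_text)
print(result)
```

On the question's input this should reproduce the expected output, with the repeated "a nosotros" outside the parentheses left as it is.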

Answer 2

Score: 1

Python's third-party regex module (developed by Matthew Barnett) supports the \K directive, which resets the starting point of the reported match to the current position in the string and discards any previously consumed characters from the final match. With that directive, one can simply replace each match in the string with an empty string.

The code for doing that is as follows.

import regex
rgx = r"\(\(VERB\)(?:(?!\ba nosotros\b|\)).)*\K\ba nosotros\b(?=[^)]*\ba nosotros\b)"
txt_in = "((VERB) saltar a nosotros a nosotros) a nosotros a nosotros a nosotros ((VERB)correr a nosotros) sdsdsd ((VERB) saltar a nosotros a nosotros)"
txt_out = regex.sub(rgx, '', txt_in)
print(txt_out)
-> ((VERB) saltar  a nosotros) a nosotros a nosotros a nosotros ((VERB)correr a nosotros) sdsdsd ((VERB) saltar  a nosotros)
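The standard library re module does not support \K. Where installing regex is not an option, a rough equivalent is to capture the prefix that \K would discard and write it back in the replacement. The sketch below is an adaptation, not the answerer's code; the name pattern is illustrative.

```python
import re

pattern = re.compile(
    r"(\(\(VERB\)(?:(?!\ba nosotros\b|\)).)*)"  # group 1: the prefix that \K would drop
    r"\ba nosotros\b"                            # the occurrence to remove
    r"(?=[^)]*\ba nosotros\b)"                   # only if another one follows before ")"
)

txt_in = (
    "((VERB) saltar a nosotros a nosotros) a nosotros a nosotros a nosotros "
    "((VERB)correr a nosotros) sdsdsd ((VERB) saltar a nosotros a nosotros)"
)

# Re-emit the captured prefix, so only the matched "a nosotros" disappears.
txt_out = pattern.sub(r"\1", txt_in)
print(txt_out)
```

As with the \K version, the space that preceded the removed occurrence stays in place, so the result contains a double space after "saltar".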

The regular expression can be broken down as follows.

\(\(VERB\)          # match literal
(?:                 # begin non-capture group
  (?!               # begin negative lookahead
    \ba nosotros\b  # match literal surrounded by word boundaries
    |               # or 
    \)              # match literal 
  )                 # end of negative lookahead
  .                 # match any character other than a line terminator
)*                  # end non-capture group and execute zero or more times
\K                  # see the first paragraph of this answer
\ba nosotros\b      # match literal surrounded by word boundaries
(?=                 # begin positive lookahead
  [^)]*             # match any characters other than ')' zero or more times
  \ba nosotros\b    # match literal surrounded by word boundaries
)                   # end positive lookahead

The technique of matching one character at a time with a negative lookahead (here (?:(?!\ba nosotros\b|\)).)) is called the tempered greedy token solution.
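The tempered greedy token can be tried in isolation with the standard re module. The sketch below captures everything after ((VERB) up to the first whole-word a nosotros or the closing ), whichever comes first. The sample text and the names prefix and captured are made up for illustration.

```python
import re

# Tempered greedy token: consume one character at a time, but only while
# neither "a nosotros" nor ")" starts at the current position.
prefix = re.compile(r"\(\(VERB\)((?:(?!\ba nosotros\b|\)).)*)")

text = "((VERB) saltar a nosotros a nosotros) ((VERB)correr)"
captured = [m.group(1) for m in prefix.finditer(text)]
print(captured)
```

In the first group the token stops right before "a nosotros"; in the second it stops at ")" because no "a nosotros" occurs first.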
