February 14, 2023, 05:13:36


How to split a string with a regex so that part of the separator pattern is not removed from the following string?

Question


import re

sentences_list = ["El coche ((VERB) es) rojo, la bicicleta ((VERB)está) allí; el monopatín ((VERB)ha sido pintado) de color rojo, y el camión también ((VERB)funciona) con cargas pesadas", "El árbol ((VERB es)) grande, las hojas ((VERB)son) doradas y ((VERB)son) secas, los juegos del parque ((VERB)estan) algo oxidados y ((VERB)es) peligroso subirse a ellos"]

aux_list = []
for i_input_text in sentences_list:

    #separator_symbols = r'(?:(?:,|;|\.|\s+)\s*y\s+|,\s*|;\s*)'
    separator_symbols = r'(?:(?:,|;|\.|)\s*y\s+|,\s*|;\s*)(?:[A-Z]|l[oa]s|la|[eé]l)'

    pattern = r"\(\(VERB\)\s*\w+(?:\s+\w+)*\)"

    # Separar la frase usando separator_symbols
    frases = re.split(separator_symbols, i_input_text)

    aux_frases_list = []
    # Buscar el patrón en cada frase separada
    for i_frase in frases:
        verbos = re.findall(pattern, i_frase)
        if verbos:
            #print(f"Frase: {i_frase}")
            #print(f"Verbos encontrados: {verbos}")
            aux_frases_list.append(i_frase)
    aux_list = aux_list + aux_frases_list
    
sentences_list = aux_list
print(sentences_list)

How can I make these splits so that the text matched by (?:[A-Z]|l[oa]s|la|[eé]l) is not removed from the following string after the split?

Using this code I am getting this wrong output:

['El coche ((VERB) es) rojo', ' bicicleta ((VERB)está) allí', ' monopatín ((VERB)ha sido pintado) de color rojo', ' camión también ((VERB)funciona) con cargas pesadas', ' hojas ((VERB)son) doradas y ((VERB)son) secas', ' juegos del parque ((VERB)estan) algo oxidados y ((VERB)es) peligroso subirse a ellos']

Curiously, the sentence "El árbol ((VERB es)) grande" disappeared from the final list altogether, although it should be there.

Instead, the result should be this list of strings:

["El coche ((VERB) es) rojo", "la bicicleta ((VERB)está) allí", "el monopatín ((VERB)ha sido pintado) de color rojo", "el camión también ((VERB)funciona) con cargas pesadas", "El árbol ((VERB es)) grande", "las hojas ((VERB)son) doradas y ((VERB)son) secas", "los juegos del parque ((VERB)estan) algo oxidados y ((VERB)es) peligroso subirse a ellos"]

Answer 1

Score: 1

I'm taking a guess the splitter regex should be this:

(?:[,.;]?\s*y\s+|[,;]\s*)(?=[A-Z]|l(?:[ao]s|a)|[eé]l)

https://regex101.com/r/jpWfvq/1

 (?: [,.;]? \s* y \s+ | [,;] \s* )   # consumed
 (?=                                 # not consumed
    [A-Z]
  | l
    (?: [ao] s | a )
  | [eé] l
 )

This splits at boundaries marked by punctuation, by "y" ("and"), or by both, while a lookahead requires the qualifying text to follow without consuming it. As a bonus, the leading whitespace is trimmed.
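
To see the lookahead at work, here is a minimal sketch that plugs the splitter above into re.split on the question's two sentences. The names splitter, verb_pattern, clauses and with_verbs are only illustrative and do not come from the original code; the filter pattern is the question's own, left unchanged.

```python
import re

sentences_list = [
    "El coche ((VERB) es) rojo, la bicicleta ((VERB)está) allí; el monopatín ((VERB)ha sido pintado) de color rojo, y el camión también ((VERB)funciona) con cargas pesadas",
    "El árbol ((VERB es)) grande, las hojas ((VERB)son) doradas y ((VERB)son) secas, los juegos del parque ((VERB)estan) algo oxidados y ((VERB)es) peligroso subirse a ellos",
]

# Consumed part: optional punctuation followed by "y", or "," / ";" alone.
# Lookahead part: a capital letter or an article must follow; it is not
# consumed, so it stays at the start of the next piece.
splitter = re.compile(r"(?:[,.;]?\s*y\s+|[,;]\s*)(?=[A-Z]|l(?:[ao]s|a)|[eé]l)")

# The question's filter pattern, unchanged (illustrative variable name).
verb_pattern = re.compile(r"\(\(VERB\)\s*\w+(?:\s+\w+)*\)")

# Split every sentence and flatten the pieces into one list.
clauses = [clause for text in sentences_list for clause in splitter.split(text)]

# Keep only the pieces in which the filter pattern finds a verb marker.
with_verbs = [clause for clause in clauses if verb_pattern.search(clause)]

for clause in clauses:
    print(repr(clause))
```

With this splitter, clauses holds the seven strings the question expects. Note that "El árbol ((VERB es)) grande" survives the split but is still dropped from with_verbs: the filter pattern requires the literal text ((VERB) and that marker is written ((VERB es)), so the disappearance comes from the filter rather than from the split. Also keep in mind that the lookahead has no word boundary, so "la" would also match the start of a word such as "lago".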

