How to remove from a list of strings, using regex, all the text between 'a' and 'b'?

Question

I wrote a function that searches a file for a given text and returns all the lines in which that text appears (in new_list). I would like to delete everything between '/* ' and ' */' (including these delimiters), since these are comments and are not needed in the returned list. Right now it returns something like:

new_list = ['123456_xxxx /* cccccccccccccc */ /\n', '67890_yyyy /* cccccccccccccc */ /\n', ...]

and the goal is:

new_list = ['123456_xxxx', '67890_yyyy', ...]

I tried using the re library and a regex like (?<=§).*?(?=;):

new_list = re.sub('(?<=/*).*?(?=*/)', '', str(new_list))

but this one gives an error:

error: nothing to repeat at position 13

PS I also tried replace, but it only deletes the delimiter characters themselves and not everything between them:

new_list = [s.replace(' /*', ' ') for s in new_list]

Answer 1

Score: 1

import re

# Avoid naming variables "list" or "str", which would shadow the built-ins
lines = ['123456_xxxx /* cccccccccccccc */ /\n', '67890_yyyy /* cccccccccccccc */ /\n']
new_list = []

# Raw string, so the backslashes reach the regex engine unchanged
pattern = r'\/\*.*\*\/'
for line in lines:
    new_list.append(re.sub(pattern, '', line))

print(new_list)

Output:

['123456_xxxx  /\n', '67890_yyyy  /\n']

If you also want to drop the surrounding spaces, the trailing / and the \n, change the pattern:

pattern = r' \/\*.*\*\/ \/\n'

Output:

['123456_xxxx', '67890_yyyy']
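The second pattern in this answer only matches when the comment is followed by exactly " /\n". A slightly more general variant, offered only as a sketch, removes the comment with a non-greedy pattern and then strips whatever trailing spaces, slashes and newlines remain. The helper name strip_comment and the constant COMMENT are made up for this illustration and do not come from the thread.

```python
import re

# Compile once; \s* also swallows the whitespace in front of the comment.
COMMENT = re.compile(r'\s*/\*.*?\*/')

def strip_comment(line):
    """Remove /* ... */ comments, then trailing spaces, slashes and newlines."""
    return COMMENT.sub('', line).rstrip(' /\n')

lines = ['123456_xxxx /* cccccccccccccc */ /\n', '67890_yyyy /* cccccccccccccc */ /\n']
new_list = [strip_comment(line) for line in lines]
print(new_list)  # ['123456_xxxx', '67890_yyyy']
```

The non-greedy .*? matters once a line holds more than one comment, because a greedy .* would also delete the text between the first /* and the last */.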

Answer 2

Score: 1

Just use \/\*.*?\*\/ and remember to supply the re.S flag so that . also matches newlines and the search can span multiple lines.

The * is a regex metacharacter and needs to be prefaced with a backslash (\) to be taken literally. Escaping the / is harmless but optional in Python's re module, where / has no special meaning.
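To make the escaping point concrete, here is a small sketch that first reproduces the error from the question and then builds a working pattern. It uses re.escape to add the backslashes instead of writing them by hand; the sample text and the variable names are assumptions chosen for illustration.

```python
import re

# The asker's pattern: the unescaped * right after "(?=" has nothing to repeat.
try:
    re.compile('(?<=/*).*?(?=*/)')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# re.escape adds the backslashes that the literal delimiters need.
opener, closer = re.escape('/*'), re.escape('*/')
# re.S lets "." match newlines, so a comment may span several lines.
comment = re.compile(opener + '.*?' + closer, re.S)

text = 'first /* one\nlong comment */ second /* another */ third'
cleaned = comment.sub('', text)
print(repr(cleaned))  # 'first  second  third'
```

Matching the delimiters themselves, rather than using lookarounds as in the question, is also what removes the /* and */ along with the comment body.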

Answer 3

Score: 1

You could use re.search with the following regular expression:

^(?:(?!\/\*).)+(?<!\s)

(?:(?!\/\*).) matches a single character (.), and the + repeats it starting at the beginning of the string (^). (?!\/\*) is a negative lookahead asserting that the next two characters in the string are not /*. In other words, characters are matched one at a time until a / immediately followed by * is reached. This is called the tempered greedy token technique.

I've added the negative lookbehind (?<!\s) to ensure that the match does not end in whitespace.
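Applied to the list from the question, the expression could be used roughly as follows. This is a sketch: the helper name text_before_comment is hypothetical, and returning an empty string when nothing matches is an assumption about the desired behaviour.

```python
import re

# Tempered greedy token: consume characters until "/*" would come next;
# the lookbehind then makes the match back off any trailing whitespace.
PREFIX = re.compile(r'^(?:(?!/\*).)+(?<!\s)')

def text_before_comment(line):
    match = PREFIX.search(line)
    # There is no match when the line is empty or starts with a comment.
    return match.group(0) if match else ''

lines = ['123456_xxxx /* cccccccccccccc */ /\n', '67890_yyyy /* cccccccccccccc */ /\n']
new_list = [text_before_comment(line) for line in lines]
print(new_list)  # ['123456_xxxx', '67890_yyyy']
```

Unlike the substitution-based answers, this keeps only the text before the first comment, so anything after the closing */ is discarded too.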

Answer 4

Score: 0

Using index of strings

Code

old_list = ['123456_xxxx /* cccccccccccccc */ /\n', '67890_yyyy /* cccccccccccccc */ /\n']

new_list = []

for i in old_list:
    # Slice up to the start of the comment, since the prefixes differ in length
    new_list.append(i[:i.index(' /*')])

print(new_list)

Output

['123456_xxxx', '67890_yyyy']
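As a regex-free sketch that also tolerates lines with no comment at all, str.partition can split each line at the first /*. This assumes that everything after the first /* can be thrown away, which holds for the sample data in the question.

```python
lines = ['123456_xxxx /* cccccccccccccc */ /\n', '67890_yyyy /* cccccccccccccc */ /\n']

# partition splits at the first "/*"; when there is none, the whole line is
# returned as the head. rstrip() then removes trailing spaces and newlines.
new_list = [line.partition('/*')[0].rstrip() for line in lines]
print(new_list)  # ['123456_xxxx', '67890_yyyy']
```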

huangapple
  • Posted on March 7, 2023, 15:48:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75659212.html