Regex split by parentheses, but not all parentheses

Question

I am trying to split a string containing opening and closing parentheses, but I want to exclude the parentheses that have a substring directly before them.
In the following example:

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

I want to have a list like:

['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']

So I want to keep ue(aad) as one piece and not split on its parentheses.

I have tried:

y = [x.strip() for x in re.split(r"[^ue()][()]", a) if x.strip()]
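A quick check of why that attempt fails, as a small Python sketch. The helper `split_on` and the lookbehind pattern `(?<![ue])[()]` are my own illustration, not part of the question. The character class consumes the character before each parenthesis, and even a non-consuming lookbehind still splits at the ")" of ue(aad), because that ")" is preceded by "d".

```python
import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'


def split_on(pattern):
    # Hypothetical helper: split, strip, and drop empty pieces.
    return [x.strip() for x in re.split(pattern, a) if x.strip()]


# The attempt: [^ue()] is part of the match, so re.split removes the "r" of
# "pqr" and the "d" of "asd", and the ")" of ue(aad) is still a split point.
print(split_on(r"[^ue()][()]"))
# ['abc', 'xyz pq', 'qwe ew', 'kjlk as', 'ue(aa', 'kljl']

# A lookbehind tests the previous character without consuming it. That keeps
# "pqr" and "asd" whole, but the ")" of ue(aad) is still preceded by "d".
print(split_on(r"(?<![ue])[()]"))
# ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad', 'kljl']
```

Whether a ")" should be a split point depends on what preceded the matching "(", which is why matching whole "(...)" groups, rather than testing single characters, tends to be the more reliable route.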

Answer 1

Score: 2

Try this:

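import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'
# Split on a space followed by an optional prefix and a "(...)" group; the
# capture group keeps the matched group in the result.
y = [x.strip() for x in re.split(r' (\S*\(.*?\))', a) if x.strip()]
for i in range(len(y)):
    # A bare "(...)" group loses its parentheses; "ue(aad)" stays as it is.
    if y[i][0] == '(' and y[i][-1] == ')':
        y[i] = y[i].strip('()')

print(y)  # => ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']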

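The pattern ' (\S*\(.*?\))' matches a space followed by a parenthesised group, together with any non-whitespace text attached to its front (such as ue). Because the group is captured, re.split keeps it in the result. The loop then removes the surrounding parentheses from matches that have nothing attached in front.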
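A possible variant, sketched under my own assumptions (the name `split_outer_groups` is hypothetical). The answer's pattern requires a space before each group, so a group at the very start of the string, as in '(a b) c', is never split off. Anchoring with `(?:^|\s)` covers that case, and slicing `[1:-1]` removes exactly one pair of parentheses instead of every leading and trailing one.

```python
import re


def split_outer_groups(text):
    # (?:^|\s) lets a group at the very start of the string be split off,
    # not only one that is preceded by a space.
    pieces = re.split(r'(?:^|\s)(\S*\(.*?\))', text)
    result = []
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue  # skip empty and whitespace-only pieces
        # A bare "(...)" group loses exactly one pair of parentheses;
        # a call-like token such as ue(aad) is kept as it is.
        if piece.startswith('(') and piece.endswith(')'):
            piece = piece[1:-1]
        result.append(piece)
    return result


print(split_outer_groups('abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'))
print(split_outer_groups('(a b) c'))
```

Like the original, this still does not handle nested parentheses: the lazy `.*?` stops at the first ")".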

Answer 2

Score: 0

Since the keyword in my case is always known, I was thinking of removing all ue(...) calls, keeping them in a list, splitting by parentheses, and then substituting the calls back.
This way I will also be able to split nested parentheses.
Something like:

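import re

a = "abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl"
# Save every ue(...) call, then remove its "(...)" part, leaving a bare "ue".
ues = re.findall(r"ue\(.*?\)", a)
j = re.sub(r"(?<=ue)\(.*?\)", "", a)
# Split on the parentheses that remain.
y = [x.strip() for x in re.split(r"[()]", j) if x.strip()]
# Put the saved calls back in place of the bare "ue".
# Prints abc, xyz pqr, qwe ew, kjlk asd, then "ue(aad) kljl" on one line.
for i in y:
    if "ue" in i:
        print(re.sub("ue", ues.pop(0), i))
    else:
        print(i)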
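The same remove, split, restore idea can be sketched so that ue(aad) comes out as its own item. Assumptions (mine, not the author's): the keyword is passed in as `keyword`, each call has no parentheses inside it, the text never contains a NUL character, and `split_with_known_keyword` is a hypothetical name. Each saved call is replaced by a numbered placeholder wrapped in parentheses, so splitting on "(" and ")" turns the placeholder into a separate piece that is then swapped back.

```python
import re


def split_with_known_keyword(text, keyword="ue"):
    saved = []

    def stash(match):
        # Remember the whole call, e.g. "ue(aad)", and leave "(\x00<index>)"
        # in its place so the split below isolates it as its own piece.
        saved.append(match.group(0))
        return "(\x00%d)" % (len(saved) - 1)

    # \b stops the keyword from matching inside a longer word such as "fue(".
    masked = re.sub(r"\b" + re.escape(keyword) + r"\([^()]*\)", stash, text)

    result = []
    for piece in re.split(r"[()]", masked):
        piece = piece.strip()
        if not piece:
            continue  # skip empty and whitespace-only pieces
        if piece.startswith("\x00"):
            piece = saved[int(piece[1:])]  # restore the saved call
        result.append(piece)
    return result


print(split_with_known_keyword("abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl"))
```

Because each call gets its own index, several calls in the same stretch of text are restored in the right order.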

Update:
The parentheses that must be ignored always have a substring attached directly to them, like ue(). So only splitting where the opening parenthesis is preceded by whitespace ignores them:

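# Split on a whole "(...)" group whose "(" is at the start or follows whitespace;
# the capture group keeps the inner text in the result.
y = [x.strip() for x in re.split(r"(?<!\S)\(([^()]*)\)", a) if x.strip()]
# => ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad) kljl']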
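Building on the update's idea of splitting only where the "(" is preceded by whitespace, a lookbehind plus a second capture group that grabs call-like tokens such as ue(aad) whole also separates them from the text that follows, which yields exactly the list asked for. This is my own sketch, not the author's code. `(?<!\S)` means "not preceded by a non-whitespace character", so it also accepts a "(" at the very start of the string, which `(?<=\s)` would not.

```python
import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

# Group 1: a bare "(...)" group whose "(" is at the start of the string or
#          follows whitespace; only the inner text is captured.
# Group 2: a call-like token such as ue(aad), captured whole.
pattern = r"(?<!\S)\(([^()]*)\)|([^\s()]+\([^()]*\))"

# re.split yields None for whichever group did not take part in a match,
# so drop None as well as empty and whitespace-only pieces.
y = [x.strip() for x in re.split(pattern, a) if x is not None and x.strip()]
print(y)  # ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
```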

Answer 3

Score: 0

For your example data, you could use capture groups so that the captured text is kept in the result of the split. In the pattern, capture the non-whitespace characters (other than parentheses) that are attached directly before or after a parenthesised part.

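In the list comprehension, first check x itself (re.split returns None for a capture group that did not take part in a match, and None has no strip method), then test x.strip() to drop whitespace-only pieces.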

Note that this does not take nested or balanced parentheses into account.

Explanation

  • ([^\s()]+\([^()]*\)) Capture group 1: match 1+ chars that are neither whitespace nor parentheses, followed by a (...) part
  • | Or
  • (\([^()]*\)[^\s()]+) Capture group 2: match a (...) part followed by 1+ chars that are neither whitespace nor parentheses
  • | Or
  • [()] Match either ( or )
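To see what each alternative accepts on its own, the three parts of the pattern can be tried separately with re.fullmatch. The variable names are mine, chosen for illustration:

```python
import re

# The three alternatives of the answer's pattern, tried one at a time.
before = r"[^\s()]+\([^()]*\)"   # group 1: text glued before "(...)"
after = r"\([^()]*\)[^\s()]+"    # group 2: text glued after "(...)"
paren = r"[()]"                  # a lone "(" or ")"

print(bool(re.fullmatch(before, "ue(aad)")))   # True
print(bool(re.fullmatch(before, "(aad)")))     # False: needs 1+ chars before "("
print(bool(re.fullmatch(after, "(aad)ue")))    # True
print(bool(re.fullmatch(after, "(aad) ue")))   # False: whitespace ends the token
print(bool(re.fullmatch(paren, ")")))          # True
```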

See a Python demo and a regex101 demo.

import re

pattern = r"([^\s()]+\([^()]*\))|(\([^()]*\)[^\s()]+)|[()]"
a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

y = [x.strip() for x in re.split(pattern, a) if x and x.strip()]
print(y)

Output

['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
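Two details of this answer can be checked directly. The first call below shows the None entries that re.split returns for capture groups that did not take part in a match. The second shows the stated limitation with nesting: every parenthesis is a split point, so an inner group is not kept together with its outer one. The helper `clean` is only an illustrative wrapper around the answer's comprehension.

```python
import re

# The answer's pattern, unchanged.
pattern = r"([^\s()]+\([^()]*\))|(\([^()]*\)[^\s()]+)|[()]"

# Each match of the lone "(" or ")" alternative adds two None entries,
# one per capture group that did not participate.
raw = re.split(pattern, 'abc (xyz pqr) qwe')
print(raw)  # ['abc ', None, None, 'xyz pqr', None, None, ' qwe']


def clean(text):
    return [x.strip() for x in re.split(pattern, text) if x and x.strip()]


# Nested groups are not treated as a unit.
print(clean('a (b (c) d) e'))  # ['a', 'b', 'c', 'd', 'e']
```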

Answer 4

Score: -1

This is an unusual way to do it, but it works for the example:

import re

a = "abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl"

# Split on ") " first, then split each piece on " (".
newlist = []
for b in re.split(r"\)\s", a):
    newlist.extend(re.split(r"\s\(", b))

# A piece that still contains "(" lost its ")" in the first split; add it back.
for i, item in enumerate(newlist):
    if "(" in item:
        newlist[i] = item + ")"

print(newlist)  # => ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
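One weak spot of this approach is a ")" at the very end of the string: 'abc (xyz)' comes out as ['abc', 'xyz)'], because no whitespace follows the final ")". A small variant, sketched with a hypothetical function name, also treats the end of the string as a split point and drops empty pieces:

```python
import re


def split_like_answer4(text):
    pieces = []
    # Split on ")" followed by whitespace OR by the end of the string,
    # then split each chunk on whitespace followed by "(".
    for chunk in re.split(r"\)(?:\s+|$)", text):
        pieces.extend(p for p in re.split(r"\s+\(", chunk) if p)
    # A piece that still contains "(" lost its ")" in the first split.
    return [p + ")" if "(" in p else p for p in pieces]


print(split_like_answer4("abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl"))
print(split_like_answer4("abc (xyz)"))
```

This still assumes that every call-like token directly follows a ")" and whitespace; for example, 'x ue(y)' is returned as the single piece 'x ue(y)'.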

huangapple
  • Published on 2023-02-18 08:48:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75490456.html