Regex split by parentheses, but not all parentheses
Question
I am trying to split a string on opening and closing parentheses, but I want to leave alone any parenthesized group that has a substring directly in front of it.
In the following example:
a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'
I want to have a list like:
['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
So I want to keep ue(aad) intact and not split on (aad).
I have tried:
y = [x.strip() for x in re.split(r"[^ue()][()]", a) if x.strip()]
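To see why the attempt above does not give the wanted list, it may help to run it. This is only a diagnostic sketch on the example string; the variable name `attempt` is made up for illustration.

```python
import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

# [^ue()] is a character class: it consumes ONE character that is not
# 'u', 'e', '(' or ')'. It does not mean "not preceded by the word ue",
# so the character in front of each parenthesis is swallowed by the split.
attempt = [x.strip() for x in re.split(r"[^ue()][()]", a) if x.strip()]
print(attempt)
# ['abc', 'xyz pq', 'qwe ew', 'kjlk as', 'ue(aa', 'kljl']
```

The opening parenthesis of ue(aad) survives only because it happens to follow an 'e'; the closing one is still split on, and the last letter of every other group is lost.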
Answer 1
Score: 2
Try this:
import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'
y = [x.strip() for x in re.split(r' (\S*\(.*?\))', a) if x != '']
for i in range(len(y)):
    if y[i][0] == '(' and y[i][-1] == ')':
        y[i] = y[i].strip('()')
print(y)  # => ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
The regex (\S*\(.*?\)) matches each parenthesized group together with any non-whitespace text stuck to the front of it. The loop then removes the surrounding parentheses from the matches that have no preceding text.
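As a rough illustration of the two steps described above, the sketch below prints the raw result of re.split before any cleanup. The names `raw`, `cleaned` and `result` are made up here, and the slicing in the last step is a stand-in for the answer's loop, not the answer's own code.

```python
import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

# Because the pattern contains a capturing group, re.split keeps the
# captured text in the result instead of discarding it.
raw = re.split(r' (\S*\(.*?\))', a)
print(raw)
# ['abc', '(xyz pqr)', ' qwe ew', '(kjlk asd)', '', 'ue(aad)', ' kljl']

# Drop the empty item, trim whitespace, then unwrap only the items that
# are fully enclosed in parentheses; 'ue(aad)' starts with 'u' and is kept.
cleaned = [x.strip() for x in raw if x != '']
result = [x[1:-1] if x.startswith('(') and x.endswith(')') else x for x in cleaned]
print(result)
# ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
```

The empty string appears because the match for (kjlk asd) ends exactly where the match for ue(aad) begins.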
Answer 2
Score: 0
Since the keyword in my case is always known, I was thinking of removing every ue(.*?) group and keeping them in a list, then splitting on parentheses, then substituting the groups back in.
This way I will be able to split nested parentheses.
Something like:
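import re

a = "abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl"
# Remember every ue(...) group, then drop the parenthesized part after each "ue".
ues = re.findall(r"ue\(.*?\)", a)
j = re.sub(r"(?<=ue)\(.*?\)", "", a)
# With the protected groups gone, every remaining parenthesis is a split point.
y = [x.strip() for x in re.split(r"[()]", j) if x.strip()]
for i in y:
    if "ue" in i:
        # Put the saved group back; for this input the item prints as 'ue(aad) kljl'.
        print(re.sub("ue", ues.pop(0), i))
    else:
        print(i)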
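Update:
The parentheses that must be ignored always have a substring stuck to them, like ue(), so the idea is to split only on a parenthesis that has whitespace in front of it:
y = [x.strip() for x in re.split(r"[(?<=\s)][()]", a) if x.strip()]
Note that [(?<=\s)] is parsed as a character class (one of (, ?, <, =, whitespace, or )) and not as a lookbehind, so this pattern only splits where such a character is directly followed by a parenthesis. For the example string it returns ['abc', 'xyz pqr) qwe ew', 'kjlk asd) ue(aad) kljl'].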
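One way the "ignore parentheses with something stuck to them" idea could be written with real lookarounds is sketched below. It is not the author's code: the pattern is an assumption of mine, it only handles non-nested groups, and it treats any non-whitespace prefix (not just ue) as protecting a group.

```python
import re

a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

# Alternative 1: a standalone "(...)" with whitespace (or the string edge)
#                on both sides; capture only its contents.
# Alternative 2: a "(...)" with text stuck to the front, e.g. ue(aad);
#                capture the whole token so it is kept as one item.
pattern = r"(?<!\S)\(([^()]*)\)(?!\S)|(\S+\([^()]*\)\S*)"

# The unused capture group of each match comes back as None, hence "if x".
y = [x.strip() for x in re.split(pattern, a) if x and x.strip()]
print(y)
# ['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
```

Here (?<!\S) and (?!\S) read as "no non-whitespace character on this side", which also covers the start and end of the string.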
Answer 3
Score: 0
For your example data, you could use capture groups so that the matched text is kept in the result of the split. The pattern captures the non-whitespace characters (other than parentheses) that sit directly before or after a parenthesized part.
In the list comprehension, first check x itself, because re.split returns None for a capture group that did not take part in a match, and then test x.strip() to drop whitespace-only items.
Note that this does not take nested or balanced parentheses into account.
Explanation
([^\s()]+\([^()]*\)) Capture group 1: match 1+ non-whitespace chars (excluding parentheses) directly before (...)
| Or
(\([^()]*\)[^\s()]+) Capture group 2: match 1+ non-whitespace chars (excluding parentheses) directly after (...)
| Or
[()] Match either ( or )
import re
pattern = r"([^\s()]+\([^()]*\))|(\([^()]*\)[^\s()]+)|[()]"
a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'
y = [x.strip() for x in re.split(pattern, a) if x and x.strip()]
print(y)
Output
['abc', 'xyz pqr', 'qwe ew', 'kjlk asd', 'ue(aad)', 'kljl']
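To make the reason for the double check in the list comprehension more concrete, the sketch below looks at the unfiltered output of re.split for the same pattern. The names `raw`, `none_count` and `kept` are illustrative only.

```python
import re

pattern = r"([^\s()]+\([^()]*\))|(\([^()]*\)[^\s()]+)|[()]"
a = 'abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl'

raw = re.split(pattern, a)
# Every match adds one slot per capture group. A group that did not take
# part in the match is reported as None, so a bare "(" or ")" adds two Nones.
print(raw[:4])
# ['abc ', None, None, 'xyz pqr']

none_count = sum(x is None for x in raw)
# "x and x.strip()" skips None first (None has no .strip()), then skips
# empty and whitespace-only strings.
kept = [x.strip() for x in raw if x and x.strip()]
print(none_count, kept)
```

Calling x.strip() without the leading x check would raise AttributeError on the first None.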
Answer 4
Score: -1
This is a strange way to accomplish this, but it works:
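import re

a = "abc (xyz pqr) qwe ew (kjlk asd) ue(aad) kljl"
# First split after every ")" that is followed by whitespace.
dissub = re.split(r"\)\s", a)
newlist = []
for b in dissub:
    # Then split each piece at every "(" that is preceded by whitespace.
    dasplit = re.split(r"\s\(", b)
    for c in dasplit:
        newlist.append(c)
i = 0
while i < len(newlist):
    # A piece that still contains "(" lost its ")" in the first split; restore it.
    dacheck = re.search(r"\(", newlist[i])
    if dacheck:
        newlist[i] += ")"
    i += 1
print(newlist)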