Match everything until the whole regex matches again
Question
This is the string I'm trying to match on (real one is much longer).
VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again
I'm trying to get groups that would look like
[VC1000]
[Venture Capital]
[4]
[This is a class about venture capital and more description, that could mention a future course like VC2000 but might not]
I almost got it but I'm not sure how to get the description between class listings. Right now I have:
(^\*?[A-Z]{2}\s?[0-9]{4}) (.*?)([0-9]|[0-9]-[0-9]+)\s?cr\.
But I'm not sure how to proceed. Adding .*
matches too much, and adding .*
followed by the first group from above means the first group is only captured on every other match.
What's the trick I'm missing?
Answer 1
Score: 2
Try (regex101):
import re
pat = r'^([A-Z]{2}\s*\d{4})\s+([^\n]+?)(\d+-?\d*\s+cr\.)$(.*?)(?=^[A-Z]{2}\s*\d{4}\s+[^\n]+?\d+-?\d*\s+cr\.$|\Z)'
pat = re.compile(pat, flags=re.S|re.M)
text = '''\
VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again'''
for a, b, c, d in pat.findall(text):
    print(a)
    print(b)
    print(c)
    # the description group starts and ends with a newline, so strip it
    print(d.strip())
    print('-' * 80)
Prints:
VC1000
Venture Capital
4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
--------------------------------------------------------------------------------
VC2000
venture capital II
4 cr.
Another description about blah
--------------------------------------------------------------------------------
VC 3000
venture capital III
4-6 cr.
back again
--------------------------------------------------------------------------------
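The question asks for the credits as a bare number ([4]) and the description as a single line. The following is a minimal sketch of one way to get there with the same lookahead idea. The names HEADER, COURSE and parse_courses are hypothetical, and using [ \t] instead of \s (so that a header can never span a line break) is an assumption about the input rather than part of the answer above.

```python
import re

# A course header line: code, title, credits, then the literal "cr." at end of line.
# [ \t] is used instead of \s so that no part of a header can cross a newline.
HEADER = r'[A-Z]{2}[ \t]*\d{4}[ \t]+[^\n]+?[ \t]+\d+(?:-\d+)?[ \t]*cr\.$'

COURSE = re.compile(
    r'^(?P<code>[A-Z]{2}[ \t]*\d{4})[ \t]+'
    r'(?P<title>[^\n]+?)[ \t]+'
    r'(?P<credits>\d+(?:-\d+)?)[ \t]*cr\.$'
    r'(?P<desc>.*?)'                   # lazily take everything...
    r'(?=^' + HEADER + r'|\Z)',        # ...up to the next header or end of text
    flags=re.S | re.M,
)

def parse_courses(text):
    """Return one (code, title, credits, description) tuple per course."""
    rows = []
    for m in COURSE.finditer(text):
        # Collapse the multi-line description into a single line.
        desc = ' '.join(m.group('desc').split())
        rows.append((m.group('code'), m.group('title'), m.group('credits'), desc))
    return rows

sample = '''\
VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again'''

for row in parse_courses(sample):
    print(row)
```

Because the lookahead does not consume the next header, that header is still available as the start of the following match. That is why a line such as "VC2000 but might not" stays inside the description: it starts like a header but does not end in a credits field.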