Capturing any character between two specified words including new lines
Question
I need to capture the title between the words TITLE and JOURNAL, excluding the case where the captured string is Direct Submission.
For instance, in the following text,
TITLE The Identification of Novel Diagnostic Marker Genes for the
Detection of Beer Spoiling Pediococcus damnosus Strains Using the
BlAst Diagnostic Gene findEr
JOURNAL PLoS One 11 (3), e0152747 (2016)
PUBMED 27028007
REMARK Publication Status: Online-Only
REFERENCE 2 (bases 1 to 462)
AUTHORS Behr,J., Geissler,A.J. and Vogel,R.F.
TITLE Direct Submission
JOURNAL Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
the captured string should be only
'The Identification of Novel Diagnostic Marker Genes for the Detection of Beer Spoiling Pediococcus damnosus Strains Using the BlAst Diagnostic Gene findEr'
either with or without newline characters (preferably without).
I tried applying regular expressions such as those offered here and here, but couldn't adapt them to my needs.
Thanks.
Answer 1
Score: 3
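(?<=TITLE)[\S\s]*?(?=JOURNAL)
This should work. (?<=TITLE) is a lookbehind that makes sure the match is preceded by TITLE, and (?=JOURNAL) is a lookahead that makes sure it is followed by JOURNAL. [\S\s]*? lazily matches any characters, including newlines, so the match stops at the first JOURNAL.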
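As a rough illustration, here is how that pattern could be applied with Python's re module. The question does not say which regex engine is in use, so Python is an assumption, and the variable names record and raw_titles are made up for this sketch.

```python
import re

# Sample text taken from the question (shortened by two lines).
record = """TITLE The Identification of Novel Diagnostic Marker Genes for the
Detection of Beer Spoiling Pediococcus damnosus Strains Using the
BlAst Diagnostic Gene findEr
JOURNAL PLoS One 11 (3), e0152747 (2016)
REFERENCE 2 (bases 1 to 462)
AUTHORS Behr,J., Geissler,A.J. and Vogel,R.F.
TITLE Direct Submission
JOURNAL Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
"""

# Lookbehind for TITLE, lazy "anything including newlines", lookahead for JOURNAL.
pattern = re.compile(r"(?<=TITLE)[\S\s]*?(?=JOURNAL)")

# Each match keeps its leading space and embedded newlines.
raw_titles = pattern.findall(record)

for title in raw_titles:
    print(repr(title))
```

With this basic pattern both titles are returned, including Direct Submission, and each match still carries its surrounding whitespace and line breaks.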
To exclude Direct Submission, use (?<=TITLE)(?!\s*Direct Submission)[\S\s]*?(?=JOURNAL). However, this approach will also exclude any title that starts with Direct Submission.
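A minimal sketch of the exclusion in Python (the engine is an assumption, since the question does not name one). The helper extract_titles and the constants ANSWER_RE and STRICT_RE are hypothetical names. STRICT_RE is a variation not given in the answer: it rejects a title only when Direct Submission is followed directly by JOURNAL, which might avoid the caveat above. The sketch also collapses line breaks, since the question prefers the title without newline characters.

```python
import re

# Pattern from the answer: skips any title that starts with "Direct Submission".
ANSWER_RE = re.compile(r"(?<=TITLE)(?!\s*Direct Submission)[\S\s]*?(?=JOURNAL)")

# Assumed variation (not from the answer): skip a title only when
# "Direct Submission" is immediately followed by JOURNAL.
STRICT_RE = re.compile(
    r"(?<=TITLE)(?!\s*Direct Submission\s*JOURNAL)[\S\s]*?(?=JOURNAL)"
)


def extract_titles(text, pattern=ANSWER_RE):
    """Return matched titles with newlines and repeated spaces collapsed."""
    return [" ".join(match.split()) for match in pattern.findall(text)]


# Sample text taken from the question (shortened by two lines).
sample = """TITLE The Identification of Novel Diagnostic Marker Genes for the
Detection of Beer Spoiling Pediococcus damnosus Strains Using the
BlAst Diagnostic Gene findEr
JOURNAL PLoS One 11 (3), e0152747 (2016)
REFERENCE 2 (bases 1 to 462)
AUTHORS Behr,J., Geissler,A.J. and Vogel,R.F.
TITLE Direct Submission
JOURNAL Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
"""

titles = extract_titles(sample)
print(titles)
```

On the sample text this yields a single title on one line. Passing pattern=STRICT_RE gives the same result here, and it would additionally keep a title that merely begins with the words Direct Submission.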