Capturing any character between two specified words including new lines

Question

I need to capture the title between the words TITLE and JOURNAL, and to exclude the case in which the captured string is Direct Submission.
For instance, in the following text:

  TITLE     The Identification of Novel Diagnostic Marker Genes for the
            Detection of Beer Spoiling Pediococcus damnosus Strains Using the
            BlAst Diagnostic Gene findEr
  JOURNAL   PLoS One 11 (3), e0152747 (2016)
   PUBMED   27028007
  REMARK    Publication Status: Online-Only
REFERENCE   2  (bases 1 to 462)
  AUTHORS   Behr,J., Geissler,A.J. and Vogel,R.F.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-AUG-2015) Technische Mikrobiologie, Technische

the captured string should be only
'The Identification of Novel Diagnostic Marker Genes for the Detection of Beer Spoiling Pediococcus damnosus Strains Using the BlAst Diagnostic Gene findEr', either with or without newline characters (preferably without).
I tried regular expressions suggested in answers to similar questions, but couldn't adapt them to my needs.
Thanks.

Answer 1

Score: 3

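(?<=TITLE)[\S\s]*?(?=JOURNAL)

should work. The lookbehind (?<=TITLE) ensures that the match is preceded by TITLE, and the lookahead (?=JOURNAL) ensures that it is followed by JOURNAL. In between, [\S\s]*? lazily matches any character, including newlines, so the match stops at the first JOURNAL. Note that the match still contains the leading spaces and the line breaks of the original text.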
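As a rough illustration, here is how that pattern could be applied in Python. The question does not name a language, so the use of Python's `re` module is an assumption, and the helper name `extract_titles` is made up for this sketch. Collapsing whitespace with `split`/`join` is one way to get the title without newline characters.

```python
import re

# Sample record copied from the question.
TEXT = """\
  TITLE     The Identification of Novel Diagnostic Marker Genes for the
            Detection of Beer Spoiling Pediococcus damnosus Strains Using the
            BlAst Diagnostic Gene findEr
  JOURNAL   PLoS One 11 (3), e0152747 (2016)
   PUBMED   27028007
  REMARK    Publication Status: Online-Only
REFERENCE   2  (bases 1 to 462)
  AUTHORS   Behr,J., Geissler,A.J. and Vogel,R.F.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-AUG-2015) Technische Mikrobiologie, Technische
"""

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(?<=TITLE)[\S\s]*?(?=JOURNAL)")


def extract_titles(text):
    """Return every TITLE...JOURNAL span with runs of whitespace collapsed.

    Hypothetical helper for this sketch, not part of the original answer.
    """
    # str.split() with no arguments splits on any whitespace (including
    # newlines) and drops empty pieces, so joining with a single space
    # removes the line breaks and the indentation.
    return [" ".join(match.split()) for match in PATTERN.findall(text)]


titles = extract_titles(TEXT)
for title in titles:
    print(title)
```

At this stage both titles are returned, including Direct Submission, because the basic pattern does no filtering.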

To exclude Direct Submission, use (?<=TITLE)(?!\s*Direct Submission)[\S\s]*?(?=JOURNAL). However, this approach will also exclude any title that merely starts with Direct Submission.
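The limitation mentioned above can be seen in a small Python sketch. The sample text below is made up for illustration (it is not from the question), and `EXACT_PATTERN` is a hypothetical variant that is not part of the original answer: it extends the negative lookahead up to JOURNAL, so that only a title consisting of exactly Direct Submission is rejected.

```python
import re

# Pattern from the answer: rejects any title that starts with "Direct Submission".
ANSWER_PATTERN = re.compile(
    r"(?<=TITLE)(?!\s*Direct Submission)[\S\s]*?(?=JOURNAL)"
)

# Hypothetical variant (an assumption, not from the answer): the lookahead
# only fails when "Direct Submission" is immediately followed by JOURNAL,
# i.e. when it is the whole title.
EXACT_PATTERN = re.compile(
    r"(?<=TITLE)(?!\s*Direct Submission\s*JOURNAL)[\S\s]*?(?=JOURNAL)"
)

# Made-up sample: one exact "Direct Submission" title and one title that
# merely begins with those words.
SAMPLE = (
    "  TITLE     Direct Submission\n"
    "  JOURNAL   Submitted (04-AUG-2015)\n"
    "  TITLE     Direct Submission of Genomes\n"
    "            to Public Databases\n"
    "  JOURNAL   Made-up Journal 1 (1), 1 (2000)\n"
)


def find_titles(pattern, text):
    """Apply a pattern and collapse whitespace runs in each match."""
    return [" ".join(match.split()) for match in pattern.findall(text)]


answer_result = find_titles(ANSWER_PATTERN, SAMPLE)
exact_result = find_titles(EXACT_PATTERN, SAMPLE)
print(answer_result)
print(exact_result)
```

With the answer's pattern both records are dropped, while the variant keeps the longer title. Whether that stricter behavior is wanted depends on the data, so treat the variant as one possible adjustment rather than a required fix.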

Posted by huangapple on March 12, 2023, 16:58:12
When reposting, please keep the link to this article: https://go.coder-hub.com/75712029.html