How to use spaCy Matcher to create a pattern for rule-based matching for a sequence that is only interpreted as a single token
Question
I am new to NLP and spaCy, but I am using it for my project. I am trying to use spaCy's Matcher class to create a pattern that extracts information from clinical summaries, specifically mentions of IQ scores. I want to use Matcher to extract sequences that occur frequently in the clinical summaries, such as "IQ=68". Initially it seemed relatively straightforward, and I created the following pattern:
pattern2 = [{"LOWER": "iq"}, {"TEXT": "=", "OP": "?"}, {"IS_DIGIT": True}]
However, this doesn't match anything. I think this is because spaCy's tokenizer treats the sequence "IQ=68" as a single token, so the three-token pattern2 never lines up with it. Is there a solution to this using Matcher? I have been using Matcher because it has been helpful for extracting IQ from sequences such as "IQ is 68": that is split into three easy-to-identify tokens, so a pattern can be written for it easily.
Any help would be really appreciated! Apologies for any incorrect terms used, I am still learning!
Thanks.
Tarran
Answer 1
Score: 1
Ah, I think I solved it by using a regex. I had a hunch that regex would be useful but didn't know how to incorporate it into the pattern. After some playing around and reading the documentation and the spaCy guide at https://spacy.io/usage/rule-based-matching, I came up with this solution, which appears to work:
pattern2 = [{"LOWER": {"REGEX": r"iq[=><]\d+"}}]
Hope this helps others using spaCy and Matcher class!
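Since the whole mention ends up in one token, the score still has to be pulled out of the matched token's text afterwards. Below is a minimal sketch of that step using only Python's standard `re` module. It assumes the token text looks like "IQ=68", "iq>70", or "IQ<55"; the helper name `parse_iq_token` and the compiled pattern `IQ_TOKEN` are made up for illustration and are not part of spaCy. The regex here is anchored with `^` and `$` so that only tokens consisting entirely of the IQ mention are accepted.

```python
import re

# Hypothetical pattern (not part of spaCy): "iq", one comparison
# operator, then digits, anchored so the whole string must match.
IQ_TOKEN = re.compile(r"^iq(?P<op>[=<>])(?P<score>\d+)$", re.IGNORECASE)


def parse_iq_token(token_text):
    """Return (operator, score) for text like 'IQ=68', else None."""
    m = IQ_TOKEN.match(token_text)
    if m is None:
        return None
    return m.group("op"), int(m.group("score"))


# Example inputs; in practice the strings would be the text of
# tokens returned by the Matcher.
for text in ["IQ=68", "iq>70", "IQ", "IQ=abc"]:
    print(text, "->", parse_iq_token(text))
```

The same anchoring idea can be applied to the regex inside the Matcher pattern if partial matches inside longer tokens turn out to be a problem.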