2023-08-10 23:54:08

How to use spaCy Matcher to create a pattern for rule-based matching for a sequence that is only interpreted as a single token

Question

I am new to NLP and spaCy, but I am using spaCy for my project. I am trying to use spaCy's Matcher class to create a pattern that extracts information from clinical summaries, specifically mentions of IQ scores. I want Matcher to pick up sequences that occur frequently in the summaries, such as "IQ=68". Initially this seemed relatively straightforward, and I created the following pattern:

pattern2 = [{"LOWER": "iq"}, {"TEXT": "=", "OP": "?"}, {"IS_DIGIT": True}]

However, this hasn't worked, and I think that is because spaCy's tokenizer treats the sequence "IQ=68" as a single token, so pattern2 never matches. Is there a way to handle this with Matcher? Matcher has been helpful for extracting IQ from sequences such as "IQ is 68", because that text is split into three easily identified tokens that a pattern can target.

Any help would be really appreciated! Apologies for any incorrect terms used, I am still learning!

Thanks.
Tarran

Answer 1

Score: 1

Ah, I think I solved it by using regex. I had a hunch that regex would be useful but didn't know how to incorporate it into the pattern. After some playing around and looking at the documentation and the spaCy guide at https://spacy.io/usage/rule-based-matching, I came up with this solution, which appears to work:

pattern2 = [{"LOWER": {"REGEX": r"iq[=><]\d+"}}]
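
For anyone who wants to try this end to end, here is a minimal sketch rather than a definitive implementation. It assumes spaCy v3 (where Matcher.add takes a list of patterns) and uses spacy.blank("en"), so no trained model has to be downloaded. The sample sentence, the "IQ_SCORE" label, and the three-token spelled_out pattern are made up for illustration. The regex is written as a raw string so that \d reaches the regex engine unchanged.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline gives the default tokenizer without
# needing a trained model (assumption: spaCy v3 is installed).
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One-token pattern from the answer: the regex is tested against
# the lowercased text of each token.
single_token = [{"LOWER": {"REGEX": r"iq[=><]\d+"}}]

# Illustrative three-token pattern for phrasing such as "IQ is 68".
spelled_out = [{"LOWER": "iq"}, {"LOWER": "is"}, {"IS_DIGIT": True}]

# "IQ_SCORE" is a hypothetical label chosen for this example.
matcher.add("IQ_SCORE", [single_token, spelled_out])

# Made-up sample text containing both styles of mention.
doc = nlp("Assessment: IQ=68 at intake. On review the IQ is 70.")

# Each match is (match_id, start, end); slice the doc to get the span text.
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```

As far as I can tell, the REGEX operator searches within the attribute value, so this pattern would also fire on a longer token that merely contains such a substring. Adding ^ and $ anchors tightens it if that matters for your data.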

Hope this helps others using spaCy and Matcher class!
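
Since the goal is to extract the score itself, the matched token still has to be taken apart afterwards. The following is a rough sketch using only the standard re module. The helper name parse_iq_token, the capture groups, and the ^/$ anchors are my own additions and not part of the pattern above.

```python
import re

# Same character class as the Matcher pattern, plus capture groups and
# anchors so the whole token must match (both are additions for this sketch).
IQ_TOKEN = re.compile(r"^iq([=><])(\d+)$")


def parse_iq_token(token_text):
    """Return (comparator, score) for strings like 'IQ=68', else None."""
    m = IQ_TOKEN.match(token_text.lower())
    if m is None:
        return None
    return m.group(1), int(m.group(2))


# Made-up inputs: two that fit the pattern and two that do not.
for text in ["IQ=68", "iq<70", "IQ>=70", "IQ is 68"]:
    print(text, "->", parse_iq_token(text))
```

Note that [=><] accepts exactly one comparison character, so a two-character operator such as ">=" is not covered; the character class would need to be extended for that.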
