February 14, 2023, 20:56:01

Extracting Named Entity from Input Utterance in Python using regex

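Question

You can use the rules.list file in Python to extract the named entities from the sentences and get outputs such as Youtube, PlayStore, and Call of Duty. The following Python code is an example of one way to do this: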

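import re

# Rule dictionary: entity type -> list of (compiled pattern, group index)
rules = {}

# Read the rules from rules.list (tab-separated: entity type, pattern, group)
with open('rules.list', 'r') as file:
    for line in file:
        parts = line.strip().split('\t')
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entity_type, pattern, group = parts[0], parts[1], int(parts[2])
        # The rules are lowercase but the sentences are not, so ignore case
        compiled = re.compile(pattern, re.IGNORECASE)
        rules.setdefault(entity_type, []).append((compiled, group))

# Sentences to match
sentences = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app"
]

# Extract the named entities: entity type -> list of extracted values
named_entities = {}
for sentence in sentences:
    for entity_type, rule_list in rules.items():
        for rule, group in rule_list:
            match = rule.search(sentence)
            if match:
                # The first matching rule wins, so rule order matters
                named_entities.setdefault(entity_type, []).append(match.group(group))
                break

# Print the extracted named entities
for entity_type, entities in named_entities.items():
    for entity in entities:
        print(f"{entity_type}: {entity}")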

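This code loads the rules from the rules.list file into a dictionary, matches each sentence in the list against them, and stores the extracted named entities in named_entities. Finally, it prints the extracted named entities.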
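The loop above stops at the first rule that matches, so the order of the lines in rules.list decides which capture is returned. As a rough sketch of one alternative (not something from the original thread), the code below tries every rule against the whole sentence and keeps the shortest capture, on the assumption that a shorter capture comes from a more specific rule. The names RULES_TEXT, load_rules, and most_specific_entity are hypothetical, and the three rules are a small sample in the same tab-separated layout as rules.list.

```python
import re

# Hypothetical sample in the same tab-separated layout as rules.list:
# entity type, pattern, capture-group index.
RULES_TEXT = (
    "app\topen (.*)\t1\n"
    "app\topen the (.*) app\t1\n"
    "app\tgo to (.*) app\t1\n"
)

def load_rules(text):
    """Parse tab-separated rule lines into (entity_type, regex, group) tuples."""
    rules = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entity_type, pattern, group = parts
        # The rules are lowercase, so match case-insensitively.
        rules.append((entity_type, re.compile(pattern, re.IGNORECASE), int(group)))
    return rules

def most_specific_entity(sentence, rules):
    """Try every rule against the whole sentence and keep the shortest capture."""
    best = None
    for entity_type, regex, group in rules:
        match = regex.fullmatch(sentence)
        if match:
            candidate = (entity_type, match.group(group))
            if best is None or len(candidate[1]) < len(best[1]):
                best = candidate
    return best  # None when no rule matches

rules = load_rules(RULES_TEXT)
print(most_specific_entity("Open the Maps app", rules))
print(most_specific_entity("Go to Call of Duty app", rules))
```

For "Open the Maps app", both open (.*) and open the (.*) app match; the shortest-capture heuristic prefers the second, which yields Maps instead of the Maps app. Whether that heuristic suits a real rule set is a judgment call.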

English:

Say I have some strings

"Open Youtube"
"Install PlayStore App"
"Go to Call of Duty app"

Now I have a rules.list file that contains all the rules for extracting the named entity from the commands above.

Say the contents of the rules.list file are like this:

app	install (.*)	1
app	install app (.*)	1
app	install the (.*) app	1
app	uninstall the app (.*)	1
app	uninstall app (.*)	1
app	uninstall the (.*) app	1
app	go to (.*) app	1
app	download (.*)	1
app	download (.*) app	1
app	download app (.*)	1
app	download the app (.*)	1
app	download the (.*) app	1
app	install the app (.*)	1
app	open the (.*) app	1
app	open (.*)	1
app	uninstall (.*)	1
app	launch (.*) app	1
app	launch (.*)	1

Is there any way I can use this rules.list file in Python to extract the named entities from my sentences, so that I get Youtube, PlayStore, and Call of Duty as my output?

Answer 1

Score: 1

If you strip the leading "app " and the trailing " 1" from each rule, you are left with a regular expression. Its (.*) group captures the wanted value.

The capitals are a bit tricky: your strings use them but the rules do not.
Because of that, I lowercase each string before applying re.

import re

# The example commands from the question
strings = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app",
]

rules = [
    "app install (.*)    1",
    "app install app (.*)    1",
    "app install the (.*) app    1",
    "app uninstall the app (.*)  1",
    "app uninstall app (.*)  1",
    "app uninstall the (.*) app  1",
    "app go to (.*) app  1",
    "app install the app (.*)    1",
    "app open the (.*) app   1",
    "app open (.*)   1",
    "app launch (.*) 1",
    ]
for rule in rules:
    # Drop the leading "app " and the trailing "1", then trim the whitespace
    rule = rule[4:-1].strip()
    # print(rule)
    for string in strings:
        # The rules are lowercase, so lowercase the string before matching
        result = re.search(rule, string.lower())
        if result:
            print('-----------------------------')
            print(f'rule   - {rule}')
            print(f'string - {string}')
            print(f'result - {result.group(1)}')

Output

-----------------------------
rule   - install (.*)
string - Install PlayStore App
result - playstore app
-----------------------------
rule   - go to (.*) app
string - Go to Call of Duty app
result - call of duty
-----------------------------
rule   - open (.*)
string - Open Youtube
result - youtube

I think this should get you started.
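Lowercasing the input also lowercases the extracted value, which is why the output above shows youtube and call of duty instead of Youtube and Call of Duty. A minimal sketch of one way around that, assuming the same stripped rule strings: pass re.IGNORECASE to re.search so the lowercase rule still matches while the captured group keeps the original casing. The helper name extract_preserving_case is hypothetical.

```python
import re

def extract_preserving_case(rule, string):
    """Return the first captured group of `rule` in `string`, or None.

    re.IGNORECASE lets a lowercase rule match mixed-case input without
    lowercasing the input, so the captured text keeps its original casing.
    """
    match = re.search(rule, string, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(extract_preserving_case("open (.*)", "Open Youtube"))
print(extract_preserving_case("go to (.*) app", "Go to Call of Duty app"))
```

With the rules listed here, "Install PlayStore App" would still come back as PlayStore App, because install (.*) is the only rule that matches it; removing the trailing App would need an extra rule or a post-processing step.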
