Extracting Named Entities from an Input Utterance in Python Using Regex
Question
There is a way to use a rules.list file in Python to extract the named entities from a sentence, giving output such as Youtube, PlayStore, and Call of Duty. The following Python example can help you do this:
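import re

# Rule dictionary: entity type -> list of (compiled pattern, group index)
rules = {}

# Read the rules from rules.list; each line is "<type> <pattern> <group>"
with open('rules.list', 'r') as file:
    for line in file:
        line = line.strip()
        if not line:
            continue
        # Split off the first and last fields so that spaces inside the
        # pattern survive (works for tab- or space-separated files)
        entity_type, rest = line.split(None, 1)
        pattern, group = rest.rsplit(None, 1)
        # The rules are lowercase, so compile them case-insensitively
        compiled = re.compile(pattern, re.IGNORECASE)
        rules.setdefault(entity_type, []).append((compiled, int(group)))

# Sentences to match
sentences = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app"
]

def extract_entity(sentence):
    """Return (entity_type, entity) for the first matching rule, else None."""
    for entity_type, rule_list in rules.items():
        for rule, group in rule_list:
            match = rule.search(sentence)
            if match:
                return entity_type, match.group(group)
    return None

# Extract the named entities, keyed by sentence so results are not overwritten
named_entities = {}
for sentence in sentences:
    found = extract_entity(sentence)
    if found:
        named_entities[sentence] = found

# Print the extracted named entities
for sentence, (entity_type, entity) in named_entities.items():
    print(f"{sentence} -> {entity_type}: {entity}")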
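This code loads the rules from the rules.list file into a dictionary, matches each sentence in the list against them, and stores the extracted named entities in the named_entities dictionary. Finally, it prints the extracted entities.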
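The rule-loading step can be tried without a file on disk. The sketch below parses rule lines from an in-memory string. The helper name parse_rules is hypothetical, and the line format (entity type, then pattern, then group index, separated by whitespace) is an assumption inferred from the sample rules.

```python
import re

def parse_rules(text):
    """Parse rule lines into (entity_type, compiled_pattern, group) tuples.

    Assumes each non-empty line looks like "<type> <pattern> <group>",
    e.g. "app go to (.*) app 1" (format inferred from the sample rules).
    """
    parsed = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First field is the entity type, last field is the group index;
        # everything in between is the regex, which may contain spaces.
        entity_type, rest = line.split(None, 1)
        pattern, group = rest.rsplit(None, 1)
        parsed.append((entity_type, re.compile(pattern, re.IGNORECASE), int(group)))
    return parsed

sample = """app go to (.*) app 1
app open (.*) 1
"""

for entity_type, pattern, group in parse_rules(sample):
    print(entity_type, pattern.pattern, group)
```

Splitting off only the first and last fields keeps patterns such as go to (.*) app intact, whichever whitespace character separates the columns.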
Original question:
Say I have some strings:
"Open Youtube"
"Install PlayStore App"
"Go to Call of Duty app"
Now I have a rules.list file that contains all the rules for extracting the named entity from the commands above.
Say the contents of the rules.list file look like this:
app install (.*) 1
app install app (.*) 1
app install the (.*) app 1
app uninstall the app (.*) 1
app uninstall app (.*) 1
app uninstall the (.*) app 1
app go to (.*) app 1
app download (.*) 1
app download (.*) app 1
app download app (.*) 1
app download the app (.*) 1
app download the (.*) app 1
app install the app (.*) 1
app open the (.*) app 1
app open (.*) 1
app uninstall (.*) 1
app launch (.*) app 1
app launch (.*) 1
Is there any way I can use this rules.list file in Python to extract the named entities from my sentences, so that I get Youtube, PlayStore, and Call of Duty as my output?
Answer 1
Score: 1
If you strip the leading "app " and the trailing " 1" from each rule, what remains is a regular expression. The (.*) group captures the wanted value.
The capitals are a bit tricky: your strings use them but the rules do not.
Because of that, I lowercase each string before applying re.
import re

strings = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app",
]

rules = [
    "app install (.*) 1",
    "app install app (.*) 1",
    "app install the (.*) app 1",
    "app uninstall the app (.*) 1",
    "app uninstall app (.*) 1",
    "app uninstall the (.*) app 1",
    "app go to (.*) app 1",
    "app install the app (.*) 1",
    "app open the (.*) app 1",
    "app open (.*) 1",
    "app launch (.*) 1",
]

for rule in rules:
    # Drop the leading "app " and the trailing "1", then trim whitespace
    rule = rule[4:-1].strip()
    # print(rule)
    for string in strings:
        # The rules are lowercase, so lowercase the string before matching
        result = re.search(rule, string.lower())
        if result:
            print('-----------------------------')
            print(f'rule - {rule}')
            print(f'string - {string}')
            print(f'result - {result.group(1)}')
Output
-----------------------------
rule - install (.*)
string - Install PlayStore App
result - playstore app
-----------------------------
rule - go to (.*) app
string - Go to Call of Duty app
result - call of duty
-----------------------------
rule - open (.*)
string - Open Youtube
result - youtube
I think this should get you started.
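Lowercasing the input also lowercases the result (youtube rather than Youtube). One possible variant, sketched below under my own assumptions, matches case-insensitively with re.IGNORECASE so the original casing survives, requires the rule to match the whole sentence, and tries rules with more literal text first so that go to (.*) app is preferred over a shorter rule. The names compile_rules and extract_entity and the specificity ordering are illustrative choices, not part of the answer above.

```python
import re

# A small subset of the rules from the question, for illustration
RULE_LINES = [
    "app install (.*) 1",
    "app go to (.*) app 1",
    "app open the (.*) app 1",
    "app open (.*) 1",
]

def compile_rules(lines):
    """Turn "<type> <pattern> <group>" lines into (regex, group) pairs."""
    compiled = []
    for line in lines:
        _, rest = line.strip().split(None, 1)   # drop the entity type
        pattern, group = rest.rsplit(None, 1)   # split off the group index
        compiled.append((re.compile(pattern, re.IGNORECASE), int(group)))
    # Heuristic (an assumption): more literal text means a more specific rule
    compiled.sort(key=lambda rule: len(rule[0].pattern.replace("(.*)", "")),
                  reverse=True)
    return compiled

def extract_entity(sentence, rules):
    """Return the entity captured by the first rule matching the whole sentence."""
    for regex, group in rules:
        match = regex.fullmatch(sentence.strip())
        if match:
            return match.group(group)
    return None

rules = compile_rules(RULE_LINES)
for text in ["Open Youtube", "Install PlayStore App", "Go to Call of Duty app"]:
    print(f"{text!r} -> {extract_entity(text, rules)!r}")
```

With these four rules the sketch yields Youtube, PlayStore App, and Call of Duty. The trailing App remains because the rule list contains no rule of the form install (.*) app; adding one would be needed to get PlayStore alone.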