Extracting Named Entities from Input Utterances in Python Using Regex

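Question

The rules.list file can be used in Python to extract the named entities from the sentences, with the goal of obtaining Youtube, PlayStore, and Call of Duty as output. The following Python code is one way to do this: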

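import re

# Map each entity type to its list of (compiled pattern, group index) rules
rules = {}

# Load the rules from rules.list (tab-separated: entity type, pattern, group)
with open('rules.list', 'r') as file:
    for line in file:
        parts = line.strip().split('\t')
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entity_type, pattern, group = parts[0], parts[1], int(parts[2])
        # The rules are lowercase, so compile them case-insensitively
        compiled = re.compile(pattern, re.IGNORECASE)
        rules.setdefault(entity_type, []).append((compiled, group))

# Sentences to match
sentences = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app"
]

# Extract named entities, keeping a list of (entity type, entity) per sentence
named_entities = {}

for sentence in sentences:
    named_entities[sentence] = []
    for entity_type, rule_list in rules.items():
        for rule, group in rule_list:
            match = rule.search(sentence)
            if match:
                named_entities[sentence].append((entity_type, match.group(group)))
                break  # keep only the first matching rule for this entity type

# Print the extracted named entities
for sentence, entities in named_entities.items():
    for entity_type, entity in entities:
        print(f"{sentence} -> {entity_type}: {entity}")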

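This code loads the rules from the rules.list file into a dictionary, matches them against the given list of sentences, and stores the extracted named entities in the named_entities dictionary. Finally, it prints the extracted entities.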
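One thing to watch for: with the rule list from the question, the first rule that matches "Install PlayStore App" case-insensitively is install (.*), which captures "PlayStore App" rather than "PlayStore". The following is only a sketch of one possible workaround, not part of the original code. It assumes a hypothetical extra rule, install (.*) app, that is not in the original rules.list, tries rules with more literal text first, and requires the rule to match the whole sentence. The names RULES_TEXT, load_rules, and extract are made up for this illustration.

```python
import re

# A few rules in the rules.list format (type, pattern, group), tab-separated.
# "install (.*) app" is a hypothetical extra rule, not in the original file.
RULES_TEXT = (
    "app\tinstall (.*)\t1\n"
    "app\tinstall (.*) app\t1\n"
    "app\tgo to (.*) app\t1\n"
    "app\topen the (.*) app\t1\n"
    "app\topen (.*)\t1\n"
)


def load_rules(text):
    """Parse rule lines into (entity_type, compiled pattern, group) tuples."""
    rules = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entity_type, pattern, group = parts[0], parts[1], int(parts[2])
        rules.append((entity_type, re.compile(pattern, re.IGNORECASE), group))
    # Try rules with more literal text first (pattern length without "(.*)")
    rules.sort(key=lambda r: len(r[1].pattern.replace("(.*)", "")), reverse=True)
    return rules


def extract(sentence, rules):
    """Return (entity_type, entity) for the first rule matching the whole sentence."""
    for entity_type, pattern, group in rules:
        match = pattern.fullmatch(sentence.strip())
        if match:
            return entity_type, match.group(group)
    return None  # no rule matched


rules = load_rules(RULES_TEXT)
for sentence in ["Open Youtube", "Install PlayStore App", "Go to Call of Duty app"]:
    print(sentence, "->", extract(sentence, rules))
```

With these assumptions the three sentences yield Youtube, PlayStore, and Call of Duty. Sorting by literal length is a simple heuristic for "most specific rule first"; a real rules file might need an explicit priority column instead.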

Original question:

Say I have some strings:

"Open Youtube"
"Install PlayStore App"
"Go to Call of Duty app"

Now I have a rules.list file that contains all the rules for extracting the named entity from the commands above.

Say the contents of the rules.list file are like this:

app	install (.*)	1
app	install app (.*)	1
app	install the (.*) app	1
app	uninstall the app (.*)	1
app	uninstall app (.*)	1
app	uninstall the (.*) app	1
app	go to (.*) app	1
app	download (.*)	1
app	download (.*) app	1
app	download app (.*)	1
app	download the app (.*)	1
app	download the (.*) app	1
app	install the app (.*)	1
app	open the (.*) app	1
app	open (.*)	1
app	uninstall (.*)	1
app	launch (.*) app	1
app	launch (.*)	1

Is there any way I can use this rules.list file in Python to extract the named entities from my sentences, so that I get Youtube, PlayStore, and Call of Duty as my output?

Answer 1

Score: 1


If you strip the leading "app " and the trailing " 1" from each rule, you are left with a regular expression. The (.*) group then captures the wanted value.

The capitals are a bit tricky: the strings use them but the rules do not.
Because of that, I convert each string to lowercase before applying re.

import re

# Example commands from the question
strings = [
    "Open Youtube",
    "Install PlayStore App",
    "Go to Call of Duty app",
    ]

rules = [
    "app install (.*)    1",
    "app install app (.*)    1",
    "app install the (.*) app    1",
    "app uninstall the app (.*)  1",
    "app uninstall app (.*)  1",
    "app uninstall the (.*) app  1",
    "app go to (.*) app  1",
    "app install the app (.*)    1",
    "app open the (.*) app   1",
    "app open (.*)   1",
    "app launch (.*) 1",
    ]

for rule in rules:
    # Drop the leading "app " (4 characters) and the trailing "1", then trim whitespace
    rule = rule[4:-1].strip()

    for string in strings:
        # The rules are lowercase, so lowercase the string before searching
        result = re.search(rule, string.lower())

        if result:
            print('-----------------------------')
            print(f'rule   - {rule}')
            print(f'string - {string}')
            print(f'result - {result.group(1)}')

Output

-----------------------------
rule   - install (.*)
string - Install PlayStore App
result - playstore app
-----------------------------
rule   - go to (.*) app
string - Go to Call of Duty app
result - call of duty
-----------------------------
rule   - open (.*)
string - Open Youtube
result - youtube

I think this should get you started.
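Lowercasing the string also lowercases the extracted value ("youtube" instead of "Youtube"). If the original capitalization matters, one option is to leave the string untouched and pass re.IGNORECASE instead. This is a minimal sketch of that variation, not the code from the answer above; the patterns list and the results dictionary are illustrative names, and only three of the already-stripped rules are used.

```python
import re

# Rules already stripped of the leading "app " and the trailing " 1"
patterns = ["install (.*)", "go to (.*) app", "open (.*)"]
strings = ["Open Youtube", "Install PlayStore App", "Go to Call of Duty app"]

results = {}
for pattern in patterns:
    for string in strings:
        # re.IGNORECASE lets "open" match "Open" without lowercasing the input
        match = re.search(pattern, string, flags=re.IGNORECASE)
        if match:
            results[string] = match.group(1)

print(results)
```

Here the captured values keep their original case: "Youtube", "PlayStore App", and "Call of Duty".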

Posted by huangapple on February 14, 2023, 20:56:01
When reposting, please keep the link to this article: https://go.coder-hub.com/75448154.html