July 23, 2023, 14:48:32

How to get a name with random prefixes and suffixes

Question

I have a semi-unique problem and no idea where to start. I'm using Python.

I'm trying to get a bunch of information about items from two APIs, and these APIs identify items in two different ways:

Name and ID

The name looks something like: Helmet of Divan

The ID looks like: DIVAN_HELMET

It is easy for me to connect the two in a dictionary. My problem is that the names sometimes have prefixes and suffixes, such as:

Wise Helmet of Divan or Clean Helmet of Divan, or even Unicode characters, like ✪ Helmet of Divan ✪.

I want to get the ID DIVAN_HELMET from these names, but I can't know how many characters the prefix has, or even whether there is a prefix or suffix at all. I need to do this in bulk for over 3,000 items with dozens of suffixes and prefixes.

Answer 1

Score: 0

So you want to get this output: DIVAN_HELMET

From inputs such as: Wise Helmet of Divan, Clean Helmet of Divan, or ✪ Helmet of Divan ✪

First, you can remove all non-ASCII characters, e.g. as in [this answer](https://stackoverflow.com/a/8689826/3706717):

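import string
str_input = "✪ Helmet of Divan ✪"  # example input from the question
printable = set(string.printable)  # ASCII letters, digits, punctuation, and whitespace
str_input = ''.join(filter(lambda x: x in printable, str_input))  # keep only printable ASCII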

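Then convert the string to lowercase: `str_input = str_input.lower()`

Then you need to [tokenize](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) the input. The easiest way is to split it on whitespace, e.g.: `arr_str_input = str_input.split()`. Calling `split()` with no argument also discards the empty strings that `split(" ")` would produce from the spaces left behind once the ✪ characters are removed.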
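The first three steps could be combined roughly like this. It is only a sketch; the helper name `tokenize` is made up for illustration and is not part of any library.

```python
import string

PRINTABLE = set(string.printable)

def tokenize(name: str) -> list[str]:
    """Drop non-ASCII characters, lowercase, and split on whitespace."""
    ascii_only = "".join(ch for ch in name if ch in PRINTABLE)
    return ascii_only.lower().split()

print(tokenize("✪ Helmet of Divan ✪"))  # ['helmet', 'of', 'divan']
```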

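Then you need to remove [stopwords](https://en.wikipedia.org/wiki/Stop_word) such as 'of' or 'the'. For this step you can use a publicly available stopword list like [this one](https://github.com/stopwords-iso/stopwords-en/blob/master/stopwords-en.txt), or simply hardcode the removal of 'of' if that is the only stopword in your input, e.g.: `arr_str_input.remove("of")`. Note that `list.remove` deletes only the first occurrence and raises `ValueError` when the word is absent.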
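If some names contain no stopword, or contain more than one, filtering against a set avoids both problems. A small sketch, where the contents of `STOPWORDS` are an assumption you would adjust to your own item names:

```python
STOPWORDS = {"of", "the"}  # assumption: extend with whatever occurs in your data

def remove_stopwords(tokens):
    """Return the tokens that are not stopwords, preserving order."""
    return [t for t in tokens if t not in STOPWORDS]

print(remove_stopwords(["helmet", "of", "divan"]))  # ['helmet', 'divan']
print(remove_stopwords(["divan", "helmet"]))        # ['divan', 'helmet'], no error
```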

Then you need to remove the prefixes and suffixes. In this step you can either supply the list of all prefixes/suffixes yourself or use a ready-made list like [this one](https://github.com/Rayraegah/adjectives) (be careful, since it can be a very big list).
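If you would rather build the prefix/suffix list yourself, one possible way is to collect every word that never appears in a base name you have already mapped to an ID. This is a sketch with made-up sample data: "Chestplate of Divan" and the variable names are hypothetical, and the result should be reviewed by hand.

```python
# Hypothetical sample data: base names already mapped to IDs,
# and the decorated names seen in the other API.
known_names = ["Helmet of Divan", "Chestplate of Divan"]
observed_names = ["Wise Helmet of Divan", "Clean Chestplate of Divan", "Helmet of Divan"]

def words(names):
    """Collect the lowercase words used across a list of names."""
    return {w for name in names for w in name.lower().split()}

# Any word that never occurs in a base name is a candidate prefix/suffix.
modifiers = words(observed_names) - words(known_names)
print(sorted(modifiers))  # ['clean', 'wise']

tokens = ["wise", "helmet", "divan"]
print([t for t in tokens if t not in modifiers])  # ['helmet', 'divan']
```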

After all that, you should have a list of only two words, like ['helmet', 'divan']. The last step is just to arrange them in the order the ID uses and make them uppercase, e.g.:

result = ['helmet', 'divan']
result.reverse()
print('_'.join(result).upper())
# outputs DIVAN_HELMET
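Putting the steps together, a rough end-to-end sketch might look like the following. The function name `name_to_id`, the contents of `STOPWORDS`, `MODIFIERS`, and `KNOWN_IDS`, and the "Shiny" example are all assumptions for illustration. It also assumes every name follows the "X of Y" pattern, so that reversing the remaining words gives the ID. Checking the candidate against the IDs you already have flags names whose prefix is not yet in your list, instead of silently producing a wrong ID.

```python
import string

PRINTABLE = set(string.printable)
STOPWORDS = {"of", "the"}      # assumption: stopwords in your item names
MODIFIERS = {"wise", "clean"}  # assumption: your own prefix/suffix list
KNOWN_IDS = {"DIVAN_HELMET"}   # assumption: IDs returned by the ID-based API

def name_to_id(name):
    """Return the matching ID, or None if the cleaned name matches no known ID."""
    ascii_only = "".join(ch for ch in name if ch in PRINTABLE)
    tokens = [
        t for t in ascii_only.lower().split()
        if t not in STOPWORDS and t not in MODIFIERS
    ]
    candidate = "_".join(reversed(tokens)).upper()
    return candidate if candidate in KNOWN_IDS else None

for name in ["Wise Helmet of Divan", "✪ Helmet of Divan ✪", "Shiny Helmet of Divan"]:
    print(name, "->", name_to_id(name))
```

The last name returns `None` because "shiny" is not in `MODIFIERS`, which tells you the list needs another entry.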
