July 3, 2023 08:44:32

Calculating similarity score in contexto.me clone

Question

I am trying to clone the popular browser game contexto.me, and I am having trouble working out how to calculate the similarity score between two words (the target word and the word the user guesses). I can get the cosine similarity between the two words, but I am confused about how to turn that score into a clean integer like the one the game shows.

For example, if the target word is 'helicopter' and I guess the word 'plane', contexto will return a similarity score of something like 13, but if I guess a word like 'king', contexto will return a score of something like 2000.

import torch
import torchtext
from flask import Flask, render_template, request

# singularize() is not defined in the post; it presumably comes from an NLP helper library.
app = Flask(__name__)
target_word = "helicopter"
glove = torchtext.vocab.GloVe(name="6B", dim=100)  # pretrained 100-dimensional GloVe vectors

@app.route('/', methods=["GET", "POST"])
def getSimScore():
    if request.method == "POST":
        text = request.form.get("word")
        new_text = singularize(text)
        # Cosine similarity between the target vector and the guess vector
        sim_score = torch.cosine_similarity(glove[target_word].unsqueeze(0), glove[new_text].unsqueeze(0)).numpy()[0]
        print(sim_score)
    return render_template('homepage.html', messageText='sample text', gameNum=1, guessNum=1, wordAccuracy=999)

This is my code so far. sim_score prints as ~0.77 for the input 'truck' and ~0.29 for the input 'king' (the closer to 1, the more similar the word is to the target word).

Answer 1

Score: 0

>For example, if the target word is 'helicopter' and I guess the word plane, contexto will return something like a similarity score of 13, but if I guess a word like 'king' contexto will return a score of '2000' for instance.

This metric is typically called "rank," and you can calculate it with the following algorithm.

1. Compute the similarity score between the target word and every word the user can enter.
2. Sort this list from most similar to least similar.
3. Given a specific guess, find the position where it appears in the list. If it appears at index 0, then it is rank 1. If it appears at index 4, then it is rank 5, and so on.

For speed, steps 1 and 2 can be computed ahead of time, if you want.
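
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny `TOY_EMBEDDINGS` table and the helper names `build_ranks` and `rank_of` are made up for this example, and the 3-dimensional vectors are not real GloVe values. In the real app the vectors would come from the `glove` object in the question, and the vocabulary would be whatever list of words you allow as guesses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_ranks(target, embeddings):
    """Steps 1 and 2: score every allowed word against the target, then sort.

    Returns a dict mapping each word to its 1-based rank (1 = most similar).
    """
    target_vec = embeddings[target]
    scores = {
        word: cosine_similarity(target_vec, vec)
        for word, vec in embeddings.items()
    }
    # Highest similarity first, so index 0 becomes rank 1.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {word: index + 1 for index, word in enumerate(ordered)}


# Hypothetical toy vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "helicopter": [0.9, 0.8, 0.1],
    "plane": [0.8, 0.9, 0.2],
    "truck": [0.6, 0.3, 0.4],
    "king": [0.1, 0.2, 0.9],
}

# Precomputed once, since the target word does not change between guesses.
RANKS = build_ranks("helicopter", TOY_EMBEDDINGS)


def rank_of(guess):
    """Step 3: look up the precomputed rank; None if the word is not allowed."""
    return RANKS.get(guess)


print(rank_of("plane"), rank_of("king"))
```

Because the target word is included in the vocabulary here, it ends up at rank 1 itself and the closest other word gets rank 2. Storing the result as a dictionary makes each lookup a constant-time operation, so the sort only has to happen once per target word.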
