Calculating similarity score in contexto.me clone


Question

I am currently trying to clone the popular browser game contexto.me, and I am having trouble calculating the similarity score between two words (the target word and the word the user guesses). I can get the cosine similarity between the two words, but I am confused about how to turn it into a clean integer score like the one the game shows.

For example, if the target word is 'helicopter' and I guess 'plane', contexto returns a similarity score of about 13, but if I guess a word like 'king', it returns a score of about 2000.

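```python
import torch
import torchtext
from flask import Flask, render_template, request
from inflection import singularize  # assumed source; the question does not say where singularize() comes from

app = Flask(__name__)

target_word = "helicopter"
# Pre-trained 100-dimensional GloVe vectors (6B-token corpus)
glove = torchtext.vocab.GloVe(name="6B", dim=100)


@app.route('/', methods=["GET", "POST"])
def getSimScore():
    if request.method == "POST":
        text = request.form.get("word")
        new_text = singularize(text)
        # Cosine similarity between the target and guess embeddings, as a plain Python float
        sim_score = torch.cosine_similarity(glove[target_word].unsqueeze(0),
                                            glove[new_text].unsqueeze(0)).item()
        print(sim_score)
    return render_template('homepage.html', messageText='sample text', gameNum=1, guessNum=1, wordAccuracy=999)
```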

With this code so far, sim_score prints about 0.77 for the input 'truck' and about 0.29 for the input 'king' (the closer to 1, the more similar the guess is to the target word).
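For reference, sim_score is the cosine of the angle between the two embedding vectors, a·b / (|a| |b|). The minimal pure-Python sketch below computes that quantity for two plain lists; `cosine_similarity` here is a hypothetical helper of our own, not the torch function. It illustrates that the raw value is just a number in [-1, 1]: on its own it says nothing about how many other words are closer to the target.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: a·b / (|a| * |b|), mathematically in [-1, 1]."""
    norm_a = math.hypot(*a)  # Euclidean length of a
    norm_b = math.hypot(*b)  # Euclidean length of b
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zeros vector has no direction; treat it as unrelated
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite direction)
```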

Answer 1

Score: 0

>For example, if the target word is 'helicopter' and I guess the word plane, contexto will return something like a similarity score of 13, but if I guess a word like 'king' contexto will return a score of '2000' for instance.

This metric is typically called "rank," and you can calculate it with the following algorithm.

  1. Compute the similarity score between the target word and every word the user can enter.
  2. Sort this list of scores from most to least similar.
  3. Given a specific score, find its position in the sorted list. If the score appears at index 0, it is rank 1; if it appears at index 4, it is rank 5, and so on.
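The three steps above can be sketched as follows. This is a minimal illustration, assuming each allowed word maps to a plain list of floats; `rank_of`, `cosine_similarity` and `TOY_VECTORS` are hypothetical names, and the 3-dimensional vectors are made up for the example (they are not real GloVe embeddings).

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    # a·b / (|a| * |b|); assumes neither vector is all zeros
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def rank_of(guess: str, target: str, vectors: Dict[str, Vector]) -> int:
    target_vec = vectors[target]
    # Step 1: similarity score of every word the user can enter
    scores = {word: cosine_similarity(target_vec, vec) for word, vec in vectors.items()}
    # Step 2: sort the scores from most to least similar
    ordered = sorted(scores.values(), reverse=True)
    # Step 3: position of the guess's score; index 0 is rank 1, index 4 is rank 5
    return ordered.index(scores[guess]) + 1


# Hypothetical toy "embeddings", invented for illustration only
TOY_VECTORS = {
    "helicopter": [0.9, 0.1, 0.0],
    "plane": [0.8, 0.3, 0.1],
    "truck": [0.5, 0.5, 0.2],
    "king": [0.0, 0.2, 0.9],
}

for word in ["plane", "truck", "king"]:
    print(word, rank_of(word, "helicopter", TOY_VECTORS))
# prints, one per line: plane 2, truck 3, king 4
```

The guess's score is taken from the same dict that was sorted, so `list.index` finds an exact match, and words with tied scores share the best rank. A guess that is not in `vectors` raises `KeyError` in this sketch, so a real handler would need to check for that first.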

For speed, steps 1 and 2 can be computed ahead of time, once per target word.
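One way to do that precomputation, sketched under the same assumptions (hypothetical helper names, made-up toy vectors): sort the words once by similarity to the target and store each word's rank in a dict, so answering a guess becomes a single lookup.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    # a·b / (|a| * |b|); assumes neither vector is all zeros
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def build_rank_table(target: str, vectors: Dict[str, Vector]) -> Dict[str, int]:
    """Steps 1 and 2, run once per target word: map every allowed word to its rank."""
    target_vec = vectors[target]
    ordered = sorted(vectors, key=lambda w: cosine_similarity(target_vec, vectors[w]), reverse=True)
    return {word: position + 1 for position, word in enumerate(ordered)}


# Hypothetical toy "embeddings", invented for illustration only
TOY_VECTORS = {
    "helicopter": [0.9, 0.1, 0.0],
    "plane": [0.8, 0.3, 0.1],
    "truck": [0.5, 0.5, 0.2],
    "king": [0.0, 0.2, 0.9],
}

# Built once, e.g. when the target word is chosen
RANKS = build_rank_table("helicopter", TOY_VECTORS)

# Step 3 for each guess is now a dict lookup; None means the word is not in the vocabulary
print(RANKS.get("plane"))  # 2
print(RANKS.get("king"))   # 4
print(RANKS.get("xyzzy"))  # None
```

Sorting once costs O(n log n); every guess afterwards is an O(1) lookup. In the question's view, `RANKS.get(new_text)` could be what is passed as `wordAccuracy` instead of the hard-coded 999. With the full GloVe 6B vocabulary (400,000 words) you may prefer to restrict `vectors` to a list of allowed guess words, and to compute step 1 as one batched torch operation rather than a Python loop. Because `sorted` is stable, tied words get consecutive ranks here rather than sharing one.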

huangapple
  • Posted on July 3, 2023 08:44:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76601293.html