Could not broadcast input array from shape (300,) into shape when mapping FastText word vectors with word in the dataset

Question

I am working on creating an NLP model for my final year project. I am currently using a TensorFlow Keras LSTM model to train it. I found an online guide for this, as my dataset is in Sinhala, but the code in that tutorial is outdated or has some issues. Currently, whenever I run the following:

import numpy as np
import fasttext
import fasttext.util

ft = fasttext.load_model("cc.si.300.bin")
ft.get_dimension()

# Mapping FastText word vectors with words in the dataset
# (vocab_size, embedding_dim and word_index are defined earlier, not shown)
embeddings_matrix = np.zeros((vocab_size+1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = ft.get_word_vector(word)
    print(word)
    if embedding_vector is not None:
        embeddings_matrix[i] = embedding_vector

I get the error below.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-797ce5392360> in <cell line: 9>()
     11     print(word)
     12     if embedding_vector is not None:
---> 13         embeddings_matrix[i] = embedding_vector

ValueError: could not broadcast input array from shape (300,) into shape (16,)

As I am new to this, I don't know exactly how to fix this issue, and I was not able to find a proper answer on the site either.

Answer 1

Score: 1

You're not showing the values of vocab_size & embedding_dim that you're using to create your embeddings_matrix. (Also, at the time you're operating on it, word_index hasn't yet been established to hold any value.)

But, that seems like an error one might get trying to put a 300-dimensional vector into another array where it doesn't fit.

What are the values of vocab_size & embedding_dim – or equivalently, what's embeddings_matrix.shape just before you get the error? If its 2nd dimension isn't 300, it should be, and you should change your (unshown) code that sets up the controlling values to ensure that embeddings_matrix is the right size.
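
For example, a quick check along these lines would surface the mismatch before the copy loop fails. This is only a minimal sketch: the vocab_size and embedding_dim values below are hypothetical stand-ins for the unshown setup in the question.

import numpy as np
import fasttext

ft = fasttext.load_model("cc.si.300.bin")

# Hypothetical stand-ins for the variables the question does not show.
vocab_size = 5000
embedding_dim = 16   # the suspect value; the traceback suggests it ended up as 16

embeddings_matrix = np.zeros((vocab_size + 1, embedding_dim))

print(ft.get_dimension())        # 300 for cc.si.300.bin
print(embeddings_matrix.shape)   # (5001, 16): a row cannot hold a 300-d vector

# Fail early with a clear message instead of a broadcast error inside the loop.
assert embedding_dim == ft.get_dimension(), "embedding_dim must match the fastText model"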

(More foundationally, are you sure you need to copy each word's vector out of the ft model into your own matrix? You might not, depending on what you're planning to do next.)

Answer 2

Score: 0

I have found the mistake I made. I am not knowledgeable enough to explain this fully, but it seems the embedding_dim I defined earlier in the code doesn't work when it comes to implementing the fastText functionality, so I take the dimension from the model itself instead:

# Mapping FastText word vectors with words in the dataset
embedding_dim = ft.get_dimension()
embeddings_matrix = np.zeros((vocab_size+1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = ft.get_word_vector(word)
    if embedding_vector is not None:
        embeddings_matrix[i] = embedding_vector

Thank you for your help.
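
As a usage note beyond the fix itself: a matrix built this way is typically handed to a Keras Embedding layer as frozen pretrained weights, which matches the TensorFlow Keras LSTM model mentioned in the question. The sketch below shows one common way to do that; vocab_size, max_length and the zero-filled embeddings_matrix are hypothetical placeholders standing in for the real objects built above.

import numpy as np
import tensorflow as tf

# Hypothetical stand-ins: in the real code these come from the tokenizer setup
# and the mapping loop shown above.
vocab_size = 5000
embedding_dim = 300                      # equals ft.get_dimension()
max_length = 100                         # hypothetical padded sequence length
embeddings_matrix = np.zeros((vocab_size + 1, embedding_dim))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),
    tf.keras.layers.Embedding(
        input_dim=vocab_size + 1,
        output_dim=embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embeddings_matrix),
        trainable=False,                 # keep the pretrained fastText vectors fixed
    ),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()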
