Could not broadcast input array from shape (300,) into shape (16,) when mapping FastText word vectors to words in the dataset
Question
I am working on an NLP model for my final-year project. I am currently using a TensorFlow Keras LSTM model for training. I found an online guide for this because my dataset is in Sinhala, but the code in that tutorial is outdated or has some issues. Currently, whenever I run
import fasttext
import fasttext.util

ft = fasttext.load_model("cc.si.300.bin")
ft.get_dimension()

# Mapping FastText word vectors with word in the dataset
embeddings_matrix = np.zeros((vocab_size+1, embedding_dim));
for word, i in word_index.items():
    embedding_vector = ft.get_word_vector(word)
    print(word)
    if embedding_vector is not None:
        embeddings_matrix[i] = embedding_vector;
I get the error below.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-797ce5392360> in <cell line: 9>()
     11     print(word)
     12     if embedding_vector is not None:
---> 13         embeddings_matrix[i] = embedding_vector;

ValueError: could not broadcast input array from shape (300,) into shape (16,)
As I am new to this, I don't know exactly how to fix this issue, and I was not able to find a suitable answer on this site either.
Answer 1
Score: 1
You're not showing the values of vocab_size & embedding_dim that you're using to create your embeddings_matrix. (Also, in the code shown, word_index hasn't yet been assigned any value at the point where you use it.)
But that looks like the error you would get when trying to put a 300-dimensional vector into an array where it doesn't fit.
What are the values of vocab_size & embedding_dim, or equivalently, what is embeddings_matrix.shape just before you get the error? If its 2nd dimension isn't 300, it should be, and you should change your (unshown) code that sets those values to ensure that embeddings_matrix is the right size.
(More foundationally, are you sure you need to copy each word's vector out of the ft model into your own matrix? You might not, depending on what you're planning to do next.)
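To make that check concrete, here is a minimal sketch using NumPy only. The sizes are assumptions, since the question does not show them: embedding_dim = 16 is chosen to mirror the (16,) in the error message, vocab_size = 5 is arbitrary, and the random vector merely stands in for ft.get_word_vector(word).

```python
import numpy as np

# Hypothetical values: the question does not show them. 16 mirrors the
# "(16,)" in the error message; 300 is the size of the FastText vectors.
vocab_size = 5
embedding_dim = 16
fasttext_dim = 300

# Stand-in for ft.get_word_vector(word): any 300-dimensional vector.
embedding_vector = np.random.rand(fasttext_dim).astype(np.float32)

# A matrix whose second dimension is 16 cannot hold a 300-dimensional row.
embeddings_matrix = np.zeros((vocab_size + 1, embedding_dim))
print(embeddings_matrix.shape)  # (6, 16)
try:
    embeddings_matrix[1] = embedding_vector
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With the second dimension set to 300, the same assignment works.
embeddings_matrix = np.zeros((vocab_size + 1, fasttext_dim))
embeddings_matrix[1] = embedding_vector
print(embeddings_matrix.shape)  # (6, 300)
```

Printing embeddings_matrix.shape right before the loop in the real code should reveal the same kind of mismatch.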
Answer 2
Score: 0
I found the mistake I made. I don't know enough to explain it fully, but it seems the embedding_dim I defined earlier in the code did not match the dimension of the FastText vectors, so I now read it from the model instead:
# Mapping FastText word vectors with word in the dataset
embedding_dim = ft.get_dimension()
embeddings_matrix = np.zeros((vocab_size+1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = ft.get_word_vector(word)
    if embedding_vector is not None:
        embeddings_matrix[i] = embedding_vector
Thank you for your help
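For anyone who wants to try the same idea without downloading cc.si.300.bin, here is a rough sketch. FakeFastText and build_embeddings_matrix are illustrative names I made up, not part of the fasttext package; the stand-in only imitates the two methods used above (get_dimension and get_word_vector) and returns random values instead of learned vectors. The example vocabulary is also an assumption.

```python
import numpy as np


class FakeFastText:
    """Hypothetical stand-in exposing the two fasttext methods used here."""

    def __init__(self, dim=300):
        self._dim = dim
        self._rng = np.random.default_rng(0)

    def get_dimension(self):
        return self._dim

    def get_word_vector(self, word):
        # The real model returns a learned vector; this returns random values.
        return self._rng.random(self._dim, dtype=np.float32)


def build_embeddings_matrix(model, word_index):
    """Build a (max index + 1, model dimension) matrix from a word -> index map."""
    embedding_dim = model.get_dimension()  # read from the model, not hard-coded
    # Assumption: indices start at 1 (as with a Keras Tokenizer),
    # so row 0 stays all zeros.
    num_rows = max(word_index.values()) + 1
    matrix = np.zeros((num_rows, embedding_dim), dtype=np.float32)
    for word, i in word_index.items():
        matrix[i] = model.get_word_vector(word)
    return matrix


# Assumed example vocabulary, only for illustration.
word_index = {"hello": 1, "world": 2, "nlp": 3}
embeddings_matrix = build_embeddings_matrix(FakeFastText(), word_index)
print(embeddings_matrix.shape)  # (4, 300)
```

Because the second dimension always comes from the model, the matrix cannot drift out of sync with the vector size the way a separately defined embedding_dim can.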