Could not broadcast input array from shape (300,) into shape when mapping FastText word vectors with words in the dataset

Question

I am working on creating an NLP model for my final-year project. I am currently using a TensorFlow Keras LSTM model to train it. I found an online guide for this, as my dataset is in Sinhala, but the code in that tutorial is outdated or has some issues. Currently, whenever I run the following:

import fasttext
import fasttext.util
import numpy as np

ft = fasttext.load_model("cc.si.300.bin")
ft.get_dimension()

# Map FastText word vectors to the words in the dataset
embeddings_matrix = np.zeros((vocab_size + 1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = ft.get_word_vector(word)
    print(word)
    if embedding_vector is not None:
        embeddings_matrix[i] = embedding_vector

I get the error below.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-797ce5392360> in <cell line: 9>()
     11     print(word)
     12     if embedding_vector is not None:
---> 13         embeddings_matrix[i] = embedding_vector

ValueError: could not broadcast input array from shape (300,) into shape (16,)

As I am new to this, I don't know exactly how to fix this issue, and I was not able to find a proper answer on the site either.

Answer 1

Score: 1

You're not showing the values of vocab_size & embedding_dim that you're using to create your embeddings_matrix. (Also, at the time you're operating on it, word_index hasn't yet been established to hold any value.)

But, that seems like an error one might get trying to put a 300-dimensional vector into another array where it doesn't fit.

What are the values of vocab_size & embedding_dim – or, equivalently, what's embeddings_matrix.shape just before you get the error? If its 2nd dimension isn't 300, it should be, and you should change your (unshown) code that sets up those controlling values to ensure that embeddings_matrix is the right size.
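
For example, a quick check like this (a minimal diagnostic sketch, assuming ft and embeddings_matrix exist as in the question) makes the mismatch visible:

print(embeddings_matrix.shape)   # second dimension is 16 in the failing run
print(ft.get_dimension())        # 300 for cc.si.300.bin
assert embeddings_matrix.shape[1] == ft.get_dimension(), \
    "embedding_dim must equal the FastText vector dimension"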

(More foundationally, are you sure you need to copy each word's vector out of the ft model into your own matrix? You might not, depending on what you're planning to do next.)
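
For instance, if the downstream task only needs a fixed vector per text rather than a trainable Keras Embedding layer, a sketch like the following (the texts list is a hypothetical stand-in for the real Sinhala dataset) avoids building the matrix at all:

import numpy as np

# Hypothetical examples standing in for the real dataset
texts = ["first example sentence", "second example sentence"]

# Ask the loaded FastText model for sentence vectors directly,
# instead of copying per-word vectors into embeddings_matrix
features = np.vstack([ft.get_sentence_vector(t) for t in texts])
print(features.shape)  # (2, 300)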

Answer 2

Score: 0

I have found the mistake I made. I am not knowledgeable enough to explain this fully, but it seems the embedding_dim I defined earlier in the code did not match the dimension of the FastText vectors, so I now take the dimension from the loaded model instead:

# Map FastText word vectors to the words in the dataset
embedding_dim = ft.get_dimension()
embeddings_matrix = np.zeros((vocab_size+1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = ft.get_word_vector(word)
    if embedding_vector is not None:
        embeddings_matrix[i] = embedding_vector
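
In case it helps anyone following the same tutorial: the matrix built this way is usually passed to a Keras Embedding layer. Below is a minimal sketch, assuming vocab_size, embedding_dim, and embeddings_matrix are the values built above; the LSTM and output layers are placeholders, since the rest of the model is not shown in the question:

import tensorflow as tf

# Embedding layer initialised from the FastText matrix and frozen
embedding_layer = tf.keras.layers.Embedding(
    input_dim=vocab_size + 1,
    output_dim=embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(embeddings_matrix),
    trainable=False,
)

model = tf.keras.Sequential([
    embedding_layer,
    tf.keras.layers.LSTM(64),                        # placeholder recurrent layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # placeholder output layer
])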

Thank you for your help
