June 27, 2023 20:43:37

Random forest for classifying, cannot reshape array of size 16884608 into shape (4221152,3)

Question

I am applying a script that uses a random forest for object detection on an optical image. The image format is TIF. I use the script below, but I always get the same error and I don't know how to resolve it. The original script is here (link).

from __future__ import print_function, division
from osgeo import gdal, gdal_array
import numpy as np
import matplotlib.pyplot as plt
gdal.UseExceptions()
gdal.AllRegister()

# Read in our image and ROI image
img_RS = 'image_satellite.TIF'
roi_ds = gdal.Open('landuse_roi.tif')

# load image data
img_ds = gdal.Open(img_RS, gdal.GA_ReadOnly)
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
for b in range(img.shape[2]):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
    
roi = roi_ds.GetRasterBand(1).ReadAsArray().astype(np.uint8)

The ROI raster is encoded with 0 and 1, where 0 marks no data and 1 marks valid data. The raster was created from a shapefile in which 1 identifies the polygons that have to be classified. What is not clear to me is why there are three labels: the result is 0, 1 and 3, although I have only two classes.

labels = np.unique(roi[roi > 0])
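
To see where a third label comes from, it can help to count the pixels per value and locate the ones outside the expected set. The sketch below uses a small synthetic array in place of `roi`; the planted value 3 and the variable names `label_counts` and `unexpected` are assumptions for illustration only.

```python
import numpy as np

# Synthetic stand-in for the ROI raster; the stray value 3 is planted on purpose.
roi = np.array([[0, 0, 1],
                [1, 3, 0],
                [0, 1, 1]], dtype=np.uint8)

# Count how many pixels carry each value, including 0 (no data).
values, counts = np.unique(roi, return_counts=True)
label_counts = {int(v): int(c) for v, c in zip(values, counts)}

# Row/column indices of pixels whose value is outside the expected set {0, 1}.
unexpected = np.argwhere(~np.isin(roi, [0, 1]))

print(label_counts)
print(unexpected)
```

If only a few pixels carry the extra value, they can be inspected or reassigned before building X and y.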

We will need an "X" matrix containing our features and a "y" array containing our labels.
These will have n_samples rows.
In other languages we would need to allocate these and then loop to fill them, but NumPy can be faster.

X = img[roi >= 0, :]  # include 8th band, which is Fmask, for now
y = roi[roi >= 0]
# Mask out clouds, cloud shadows, and snow using Fmask
clear = X[:, 1] <= 4
X = X[clear, :4]  # we can ditch the Fmask band now
y = y[clear]
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, oob_score=True, max_leaf_nodes=20, verbose=1, n_jobs=-1)
X = np.nan_to_num(X)
rf2 = rf.fit(X, y)
df = pd.DataFrame()
df['truth'] = y
df['predict'] = rf.predict(X)

Take our full image, ignore the Fmask band, and reshape it into a long 2D array (nrow * ncol, nband) for classification.

new_shape = (img.shape[0] * img.shape[1], img.shape[2] - 1)
img_as_array = img[:, :, :np.int_(img.shape[2])].reshape(new_shape)

Here I get this error: ValueError: cannot reshape array of size 16884608 into shape (4221152,3)

Answer 1

Score: 0

Reading through the docs of np.reshape(), it says that
> the shape should be compatible with the original shape.

As I understand it, this means the total number of elements has to stay the same. If you have a 3×4 array (12 values), the new shape should also have 12 values, so you can reshape a 3×4 array into 12×1, 4×3 or 6×2, but you cannot reshape it into 3×3 (that has only 9 values, so you'd lose info).
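
The element-count rule can be checked on a small case. This is only a sketch with a toy 3×4 array; the names `ok_shapes`, `reshaped` and `failed` are made up for the illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3x4 array holding 12 values

# Target shapes with exactly 12 elements are accepted.
ok_shapes = [(12, 1), (4, 3), (6, 2)]
reshaped = [a.reshape(shape) for shape in ok_shapes]

# A 3x3 target holds only 9 elements, so NumPy raises ValueError.
try:
    a.reshape(3, 3)
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

The printed message has the same form as the one in the question, with the array size first and the rejected shape after it.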

In your case, the error says
> cannot reshape array of size 16884608 into shape (4221152,3)

If we do the math: 16884608 is not equal to 4221152 * 3 = 12663456.
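
The same numbers also show how many values per pixel the array really holds. The sketch below uses only the figures from the error message; the variable names are chosen for the illustration.

```python
array_size = 16884608    # element count reported in the error message
n_pixels = 4221152       # first dimension of the requested shape (rows * cols)
requested_bands = 3      # second dimension of the requested shape

requested_size = n_pixels * requested_bands
actual_bands, remainder = divmod(array_size, n_pixels)

print(requested_size)           # 12663456
print(actual_bands, remainder)  # 4 0
```

So the array being reshaped still has 4 values per pixel. That is consistent with the slice in the question, `img[:, :, :np.int_(img.shape[2])]`, which keeps every band, while `new_shape` asks for `img.shape[2] - 1` bands.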

The docs also say that one shape dimension can be -1, and the function will fill it in with whatever it needs to be:
> One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

Maybe you can use that to solve the problem.
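
A minimal sketch of two ways this could look, using a small synthetic array in place of the question's `img` and assuming the band to drop is the last one (as the question's comment about the Fmask band suggests). The name `n_keep` is introduced here for illustration.

```python
import numpy as np

# Synthetic stand-in for `img` with shape (rows, cols, bands); 4 bands assumed.
img = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Number of bands to keep: everything except the last (assumed mask) band.
n_keep = img.shape[2] - 1

# Option 1: make the slice agree with the explicit target shape.
new_shape = (img.shape[0] * img.shape[1], n_keep)
img_as_array = img[:, :, :n_keep].reshape(new_shape)

# Option 2: let NumPy infer the number of rows with -1.
img_as_array_inferred = img[:, :, :n_keep].reshape(-1, n_keep)

print(img_as_array.shape)
```

Either way, the slice has to drop the band first. Using -1 on the unsliced array does not fix the mismatch: it either fails again or, when the total size happens to be divisible by the requested band count, runs without error and returns rows that no longer correspond to single pixels.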
