Random forest for classifying, cannot reshape array of size 16884608 into shape (4221152,3)

Question

I am applying a random forest script for object detection on an optical image in TIFF format. I use the script below, but I always get the same error, and I don't know how to resolve it. The original script is here: link.

from __future__ import print_function, division
from osgeo import gdal, gdal_array
import numpy as np
import pandas as pd  # needed for the DataFrame used further down
import matplotlib.pyplot as plt
gdal.UseExceptions()
gdal.AllRegister()

# Read in our image and ROI image
img_RS = 'image_satellite.TIF'
roi_ds = gdal.Open('landuse_roi.tif')

# Load the image data band by band into a (rows, cols, bands) array
img_ds = gdal.Open(img_RS, gdal.GA_ReadOnly)
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
for b in range(img.shape[2]):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()

roi = roi_ds.GetRasterBand(1).ReadAsArray().astype(np.uint8)

The ROI raster contains the values 0 and 1, where 0 means no data and 1 marks valid data. The raster was created from a shapefile in which 1 identifies the polygons that have to be classified. What is not clear to me is why three labels appear here: the result is 0, 1 and 3, although I have only two classes.

labels = np.unique(roi[roi > 0])
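
One way to see where an unexpected label such as 3 comes from is to count how many pixels carry each value. The sketch below uses a small made-up array in place of the real ROI band (`roi_demo` is hypothetical), so the counts are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the ROI band; the real array comes from landuse_roi.tif
roi_demo = np.array([[0, 0, 1],
                     [1, 3, 0],
                     [0, 1, 1]], dtype=np.uint8)

# Unique values greater than 0, together with the number of pixels for each value
labels, counts = np.unique(roi_demo[roi_demo > 0], return_counts=True)
label_counts = dict(zip(labels.tolist(), counts.tolist()))
print(label_counts)
```

If a stray value covers only a handful of pixels, it may have been introduced when the shapefile was rasterized, which is worth checking in the source data.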

We will need an "X" matrix containing our features and a "y" array containing our labels. These will have n_samples rows. In other languages we would need to allocate these and then loop to fill them, but NumPy can be faster.

X = img[roi >= 0, :]  # include 8th band, which is Fmask, for now
y = roi[roi >= 0]
# Mask out clouds, cloud shadows, and snow using Fmask
clear = X[:, 1] <= 4
X = X[clear, :4]  # we can ditch the Fmask band now
y = y[clear]
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, oob_score=True, max_leaf_nodes=20, verbose=1, n_jobs=-1)
X = np.nan_to_num(X)
rf2 = rf.fit(X, y)
df = pd.DataFrame()
df['truth'] = y
df['predict'] = rf.predict(X)
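
The boolean indexing used to build X and y turns a 3D image into a 2D feature matrix. A minimal sketch with synthetic data (the array sizes and the names ending in `_demo` are assumptions, not the real rasters) shows the resulting shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
img_demo = rng.integers(0, 255, size=(5, 6, 4))  # rows, cols, bands
roi_demo = np.zeros((5, 6), dtype=np.uint8)
roi_demo[1:3, 2:5] = 1                           # 6 labelled pixels

mask = roi_demo > 0          # 2D boolean mask over rows and columns
X_demo = img_demo[mask, :]   # shape (n_samples, n_bands)
y_demo = roi_demo[mask]      # shape (n_samples,)
print(X_demo.shape, y_demo.shape)

# With an unsigned ROI, ">= 0" is true everywhere and selects every pixel
all_pixels = int((roi_demo >= 0).sum())
print(all_pixels)
```

Note that with a `uint8` ROI the condition `roi >= 0` is true for every pixel, so it also selects the 0 (no data) pixels, whereas `roi > 0` keeps only the labelled ones.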

Take our full image, ignore the Fmask band, and reshape it into a long 2D array (nrow * ncol, nband) for classification.

new_shape = (img.shape[0] * img.shape[1], img.shape[2] - 1)
img_as_array = img[:, :, :np.int_(img.shape[2])].reshape(new_shape)

Here I get this error: ValueError: cannot reshape array of size 16884608 into shape (4221152,3)

Answer 1

Score: 0

The docs of np.reshape() say that
>The new shape should be compatible with the original shape.

As I understand it, this means that if you have a 3×4 array (12 values), the new shape must also hold 12 values. So you can reshape a 3×4 array to 12×1, 4×3 or 6×2, but you cannot reshape it to 3×3 (that holds only 9 values, so you'd lose information).
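
The compatibility rule can be tried out on a small array. This is only an illustrative sketch with a made-up 3×4 array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 12 values in total

# Any shape whose dimensions multiply to 12 works
print(a.reshape(12, 1).shape)
print(a.reshape(4, 3).shape)
print(a.reshape(6, 2).shape)

# A 3x3 shape holds only 9 values, so NumPy refuses
try:
    a.reshape(3, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```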

In your case, the error says
>cannot reshape array of size 16884608 into shape (4221152,3)

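If we do the math: 16884608 is not equal to 4221152 * 3 = 12663456. In fact 16884608 = 4221152 * 4, so the array being reshaped still holds 4 values per pixel rather than 3.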
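
The same arithmetic can be checked in a few lines, using only the two numbers from the error message (the variable names are made up for this sketch):

```python
import math

array_size = 16884608          # size reported in the error message
target_shape = (4221152, 3)    # shape reported in the error message

# Number of values the target shape can hold
target_size = math.prod(target_shape)
print(target_size, array_size == target_size)

# How many values per row the array actually has
values_per_row, remainder = divmod(array_size, target_shape[0])
print(values_per_row, remainder)
```

A remainder of 0 with 4 values per row suggests that the sliced array still contains all four bands.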

In the docs it says that one shape dimension can be -1 (and the function will fill that up with what it needs to be)
>One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

Maybe you can use that to solve the problem.
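
A possible way to apply this, sketched on a small made-up image (`img_demo` and its 5×6×4 size are assumptions, and treating the last band as the one to drop is also an assumption): slice the array so that it really has the number of bands the target shape expects, or let NumPy infer the row count with -1.

```python
import numpy as np

# Hypothetical small image: 5 rows, 6 columns, 4 bands
img_demo = np.zeros((5, 6, 4), dtype=np.uint8)
n_features = img_demo.shape[2] - 1   # drop the last band (assumed to be the mask band)

# Option 1: slice off the last band, then reshape to an explicit shape
new_shape = (img_demo.shape[0] * img_demo.shape[1], n_features)
img_as_array = img_demo[:, :, :n_features].reshape(new_shape)

# Option 2: let NumPy infer the number of rows with -1
img_as_array_inferred = img_demo[:, :, :n_features].reshape(-1, n_features)
print(img_as_array.shape, img_as_array_inferred.shape)
```

Whether a band should be dropped at all depends on the data. The number of columns has to match the number of features the classifier was trained on, so if all four bands were used for training, reshaping to `(-1, img.shape[2])` without slicing would be the matching choice.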

Posted by huangapple on June 27, 2023, 20:43:37
Source: https://go.coder-hub.com/76564987.html