Random forest for classifying, cannot reshape array of size 16884608 into shape (4221152,3)
Question
I am applying a random forest script for object detection on optical images in TIF format. I use the script below, but I always get the same error and I don't know how to resolve it. The original script is here: link.
from __future__ import print_function, division
from osgeo import gdal, gdal_array
import numpy as np
import matplotlib.pyplot as plt
gdal.UseExceptions()
gdal.AllRegister()
# Read in our image and ROI image
img_RS = 'image_satellite.TIF'
roi_ds = gdal.Open('landuse_roi.tif')
# load image data
img_ds = gdal.Open(img_RS, gdal.GA_ReadOnly)
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
for b in range(img.shape[2]):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
roi = roi_ds.GetRasterBand(1).ReadAsArray().astype(np.uint8)
The ROI raster contains the values 0 and 1, where 0 marks no data and 1 marks valid data. It was rasterized from a shapefile in which 1 identifies the polygons that have to be classified. What is not clear to me is why there are 3 labels: the result is 0, 1 and 3, even though I have only two classes.
labels = np.unique(roi[roi > 0])
We will need an "X" matrix containing our features and a "y" array containing our labels.
Both will have n_samples rows.
In other languages we would need to allocate these and then loop to fill them, but NumPy can be faster.
X = img[roi >= 0, :] # include 8th band, which is Fmask, for now
y = roi[roi >= 0]
# Mask out clouds, cloud shadows, and snow using Fmask
clear = X[:, 1] <= 4
X = X[clear, :4] # we can ditch the Fmask band now
y = y[clear]
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, oob_score=True, max_leaf_nodes=20, verbose=1, n_jobs=-1)
X = np.nan_to_num(X)
rf2 = rf.fit(X, y)
import pandas as pd  # needed for pd.DataFrame below
df = pd.DataFrame()
df['truth'] = y
df['predict'] = rf.predict(X)
Take our full image, ignore the Fmask band, and reshape into long 2d array (nrow * ncol, nband) for classification
new_shape = (img.shape[0] * img.shape[1], img.shape[2] - 1)
img_as_array = img[:, :, :np.int_(img.shape[2])].reshape(new_shape)
At this point I get this error: ValueError: cannot reshape array of size 16884608 into shape (4221152,3)
Answer 1
Score: 0
The docs for np.reshape() say that
>the shape should be compatible with the original shape.
As I understand it, this means that if you have a 3x4 array (12 values), the new shape must also hold 12 values. You can reshape a 3x4 array to 12x1, 4x3 or 6x2, but you cannot reshape it to 3x3 (that holds only 9 values, so you would lose data).
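A minimal sketch of that rule, using a made-up 3x4 array rather than the image from the question:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 x 4 array, 12 values

# Any target shape whose dimensions multiply to 12 is accepted.
shapes_ok = [a.reshape(s).shape for s in [(12, 1), (4, 3), (6, 2)]]

# A 3 x 3 target holds only 9 values, so NumPy raises ValueError.
try:
    a.reshape(3, 3)
    failed = False
except ValueError:
    failed = True
```

Here `shapes_ok` ends up holding the three accepted shapes and `failed` is `True`, which is the same kind of error as the one in the question.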
In your case, the error says
>cannot reshape array of size 16884608 into shape (4221152,3)
If we do the math, 16884608 is not equal to 4221152 * 3 = 12663456.
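The same arithmetic can be checked directly. Dividing the two numbers from the error message suggests that the sliced array still holds 4 values per pixel, while the target shape asks for 3. The variable names below are only illustrative, and reading the quotient as a band count is my interpretation of the script.

```python
size = 16884608          # total number of values, from the error message
n_pixels = 4221152       # first dimension of the target shape (rows * cols)
target_bands = 3         # img.shape[2] - 1 in the question

requested = n_pixels * target_bands   # values the target shape can hold
bands_in_slice = size // n_pixels     # values per pixel actually present

print(requested)         # 12663456
print(bands_in_slice)    # 4
```

If that reading is right, the slice `img[:, :, :np.int_(img.shape[2])]` keeps every band, so it does not match a shape built from `img.shape[2] - 1`.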
The docs also say that one shape dimension can be -1, in which case the function fills it in with whatever value is needed:
>One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
Maybe you can use that to solve the problem.
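Here is a rough sketch of how that might look, on a small zero-filled array standing in for the image (the sizes are hypothetical; in the question `img` is read with GDAL). Whether the last band should be dropped depends on the data. The image in the question appears to have 4 bands and the classifier is fit on `X[:, :4]`, so keeping all bands may be what is wanted. Both variants are shown, and the point in each is that the slice and the target shape use the same band count.

```python
import numpy as np

# Stand-in for the image: 5 rows, 6 columns, 4 bands (hypothetical sizes).
img = np.zeros((5, 6, 4))

# Variant A: keep every band; -1 lets NumPy infer rows * cols.
all_bands = img.reshape(-1, img.shape[2])

# Variant B: drop the last band; slice and target shape use the same count.
n_keep = img.shape[2] - 1
without_last = img[:, :, :n_keep].reshape(-1, n_keep)

print(all_bands.shape)      # (30, 4)
print(without_last.shape)   # (30, 3)
```

Note that -1 alone would not fix the original call: 16884608 is not divisible by 3, so `reshape(-1, 3)` on the unsliced array would still raise the same kind of error.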