Reshape Input for GRU
Question
train_data.shape
(11458167, 10)
# Define the input shape
feature_num = 10  # number of features
timesteps = 1
# Reshape the input to shape (num_instances, timesteps, num_features)
train_data_reshaped = train_data.reshape(train_data.shape[0], timesteps, feature_num)
test_data_reshaped = test_data.reshape(test_data.shape[0], timesteps, feature_num)
I have a dataset with 10 features and I want to try different timesteps values, because a value of 1 does not capture the sequence in my data. However, when I change timesteps to 10, I get this error:
ValueError: cannot reshape array of size 114581670 into shape (11458167,10,10)
Can you explain why this error happens and how I can solve it?
I want to try different timesteps values to find the optimal one.
Answer 1
Score: 2
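You cannot reshape while keeping the dimensions of num_instances and features the same as before. A reshape must preserve the total number of elements: the array holds 11458167 × 10 = 114581670 values, while the shape (11458167, 10, 10) would need ten times as many. The correct call is train_data.reshape(-1, timesteps, features). However, this only works if the number of instances is divisible by timesteps without a remainder.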
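To make the divisibility point concrete, here is a minimal NumPy sketch. The helper name reshape_non_overlapping is made up for illustration and is not part of any library; it assumes that dropping the leftover rows at the end is acceptable for your data.

```python
import numpy as np


def reshape_non_overlapping(data, timesteps):
    """Hypothetical helper: cut 2D data into non-overlapping windows."""
    n_rows, n_features = data.shape
    # Keep only as many rows as fit into whole windows.
    usable_rows = (n_rows // timesteps) * timesteps
    # -1 lets NumPy infer the number of windows from the element count.
    return data[:usable_rows].reshape(-1, timesteps, n_features)


# Small stand-in for train_data: 7 rows, 10 features.
data = np.arange(70).reshape(7, 10)
windows = reshape_non_overlapping(data, timesteps=3)
print(windows.shape)  # (2, 3, 10): the 7th row is dropped
```

The number of instances shrinks by a factor of timesteps, which is exactly why the first dimension cannot stay at train_data.shape[0].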
Furthermore, there are two types of windows that you can create: non-overlapping windows, which simply reshape the data as described above, and sliding (overlapping) windows, where the window slides over the data so that a data point can be contained in multiple windows.
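As a rough guide to what each window type produces, the window counts can be computed up front. The sketch below uses the row count from the question; the function names are made up for illustration, and the sliding-window formula assumes windows are taken every stride rows starting at row 0.

```python
def count_non_overlapping(n_rows, timesteps):
    # Whole windows only; leftover rows at the end are dropped.
    return n_rows // timesteps


def count_sliding(n_rows, timesteps, stride=1):
    # One window per valid start position, taking every stride-th start.
    return (n_rows - timesteps) // stride + 1


n_rows = 11458167  # rows in train_data from the question
print(count_non_overlapping(n_rows, timesteps=10))  # 1145816
print(count_sliding(n_rows, timesteps=10))          # 11458158
```

With timesteps=10, the non-overlapping variant drops the last 7 rows, while the sliding variant with stride 1 keeps almost as many windows as there are rows.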
However, you do not need to do this yourself. I have written a small utility library called mlnext-framework that contains such functionality. The temporalize method is a wrapper around numpy.lib.stride_tricks.sliding_window_view for generating non-overlapping (reshape) and overlapping (sliding) windows.
Given some data:
>>> import numpy as np
>>> import mlnext
>>> i, j = np.ogrid[:6, :3]
>>> data = 10 * i + j
>>> print(data)
[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]
 [50 51 52]]
Non-Overlapping windows:
>>> # Transform 2d data into 3d
>>> mlnext.temporalize(data=data, timesteps=2, verbose=True)
Old shape: (6, 3). New shape: (3, 2, 3).
[[[ 0  1  2]
  [10 11 12]]

 [[20 21 22]
  [30 31 32]]

 [[40 41 42]
  [50 51 52]]]
As you can see, each data point is contained in exactly one window. If the number of rows were not evenly divisible by timesteps, the superfluous data points at the end would be discarded.
Sliding windows:
>>> # Transform 2d into 3d with stride=1
>>> mlnext.temporalize(data, timesteps=3, stride=1, verbose=True)
Old shape: (6, 3). New shape: (4, 3, 3).
[[[ 0  1  2]
  [10 11 12]
  [20 21 22]]

 [[10 11 12]
  [20 21 22]
  [30 31 32]]

 [[20 21 22]
  [30 31 32]
  [40 41 42]]

 [[30 31 32]
  [40 41 42]
  [50 51 52]]]
As you can see, the second window starts with [10 11 12], which is the second data point overall. The step size can be configured with stride.
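If you would rather not add a dependency, the same sliding windows can be sketched with NumPy alone. This is not the mlnext implementation, just an illustration of the idea; the helper name sliding_windows is made up, and it assumes NumPy 1.20 or newer, where sliding_window_view is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(data, timesteps, stride=1):
    """Hypothetical helper: overlapping windows over the rows of 2D data."""
    # sliding_window_view appends the window axis last, so the result
    # has shape (n_windows, n_features, timesteps).
    view = sliding_window_view(data, window_shape=timesteps, axis=0)
    # Reorder to (n_windows, timesteps, n_features), then keep every
    # stride-th window.
    return view.transpose(0, 2, 1)[::stride]


i, j = np.ogrid[:6, :3]
data = 10 * i + j
windows = sliding_windows(data, timesteps=3, stride=1)
print(windows.shape)  # (4, 3, 3)
print(windows[1, 0])  # [10 11 12]
```

The result is a read-only view onto the original array, so it costs almost no extra memory even for millions of rows. Call .copy() on it if you need a writable, independent array.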