Reshape Input for GRU

Question

train_data.shape
(11458167, 10)

# Define the input shape
feature_num = 10  # number of features
timesteps = 1

# Reshape the input to shape (num_instances, timesteps, num_features)
train_data_reshaped = train_data.reshape(train_data.shape[0], timesteps, feature_num)
test_data_reshaped = test_data.reshape(test_data.shape[0], timesteps, feature_num)


I have a dataset with 10 features and I want to try different timesteps values, because a value of 1 will not capture the sequence in my data. However, when I change the timesteps value (here to 10), I get this error:

ValueError: cannot reshape array of size 114581670 into shape (11458167,10,10)

Can you explain why this error happens and how I can solve it?

I tried different timesteps values to find the optimal one.

Answer 1

Score: 2

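You cannot add a timesteps dimension while keeping num_instances and feature_num the same as before, because a reshape must preserve the total number of elements: the array holds 11458167 × 10 = 114581670 values, but the shape (11458167, 10, 10) would need ten times as many. The correct call is train_data.reshape(-1, timesteps, feature_num), which lets NumPy infer the number of windows. This only works if the number of instances is divisible by timesteps without remainder.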
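The element-count argument can be checked with a small sketch. The helper name to_windows and the 27-row stand-in array are assumptions made for illustration, not part of the original post; the helper simply drops trailing rows that do not fill a whole window before reshaping.

```python
import numpy as np


def to_windows(data: np.ndarray, timesteps: int) -> np.ndarray:
    """Group consecutive rows into non-overlapping windows.

    Hypothetical helper (not from the original post): trailing rows that
    do not fill a whole window are dropped before reshaping.
    """
    n_rows, n_features = data.shape
    usable = (n_rows // timesteps) * timesteps  # largest multiple of timesteps
    return data[:usable].reshape(-1, timesteps, n_features)


# Small stand-in for train_data: 27 rows, 10 features.
train_data = np.arange(27 * 10).reshape(27, 10)
windows = to_windows(train_data, timesteps=10)
print(windows.shape)  # (2, 10, 10): the 7 trailing rows were dropped

# The same arithmetic for the shape reported in the question:
n_rows, n_features, timesteps = 11458167, 10, 10
print(n_rows * n_features)              # 114581670 elements available
print(n_rows * timesteps * n_features)  # 1145816700 elements requested
print(n_rows % timesteps)               # 7 rows left over
```

For the data in the question, 11458167 is not divisible by 10, so with timesteps=10 the last 7 rows would have to be dropped (or padded) before reshape(-1, timesteps, feature_num) can succeed.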

Furthermore, there are two types of windows that you can create. Non-overlapping windows simply reshape the data as described above, so each data point lands in exactly one window. Sliding (overlapping) windows are produced by sliding a fixed-size window over the data, so a data point can be contained in multiple windows.

However, you do not need to do this yourself. I have written a small utility library called mlnext-framework that contains such functionality. Its temporalize method is a wrapper around numpy.lib.stride_tricks.sliding_window_view that generates both non-overlapping (reshape) and overlapping (sliding) windows.

Given some data:

>>> import numpy as np
>>> import mlnext

>>> i, j = np.ogrid[:6, :3]
>>> data = 10 * i + j
>>> print(data)
[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]
 [50 51 52]]

Non-overlapping windows:

>>> # Transform 2d data into 3d
>>> mlnext.temporalize(data=data, timesteps=2, verbose=True)
Old shape: (6, 3). New shape: (3, 2, 3).
[[[ 0  1  2]
  [10 11 12]]
 [[20 21 22]
  [30 31 32]]
 [[40 41 42]
  [50 51 52]]]

As you can see, each data point is contained in exactly one window. If the number of rows were not evenly divisible by the number of time steps, the superfluous data points at the end would be discarded.

Sliding windows:

>>> # Transform 2d into 3d with stride=1
>>> mlnext.temporalize(data, timesteps=3, stride=1, verbose=True)
Old shape: (6, 3). New shape: (4, 3, 3).
[[[ 0  1  2]
  [10 11 12]
  [20 21 22]]
 [[10 11 12]
  [20 21 22]
  [30 31 32]]
 [[20 21 22]
  [30 31 32]
  [40 41 42]]
 [[30 31 32]
  [40 41 42]
  [50 51 52]]]

As you can see, the second window starts with [10 11 12], which is the second data point overall. The step size between consecutive windows can be configured with stride.
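For readers who prefer not to add a dependency, sliding windows can also be sketched in plain NumPy (1.20 or newer). This is an illustration under stated assumptions, not mlnext's actual implementation: the helper name sliding_windows is hypothetical, and treating stride as "keep every stride-th window" is an assumption about the intended semantics.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(data: np.ndarray, timesteps: int, stride: int = 1) -> np.ndarray:
    """Hypothetical helper: windows of shape (n_windows, timesteps, n_features)."""
    # sliding_window_view appends the window axis last,
    # giving shape (n_windows, n_features, timesteps).
    view = sliding_window_view(data, window_shape=timesteps, axis=0)
    # Move the window axis to the middle, then keep every `stride`-th window.
    return view.transpose(0, 2, 1)[::stride]


i, j = np.ogrid[:6, :3]
data = 10 * i + j
windows = sliding_windows(data, timesteps=3, stride=1)
print(windows.shape)  # (4, 3, 3)
print(windows[1, 0])  # [10 11 12]
```

The result is a read-only view onto the original array, so it costs no extra memory until it is copied. Setting stride equal to timesteps in this sketch yields the non-overlapping case.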

huangapple
  • Published on June 12, 2023, 19:04:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76456032.html