Meaning of 2D input in Keras LSTM
Question
In Keras, LSTM input is in the shape [batch, timesteps, feature]. What if I specify the input as keras.Input(shape=(20, 1)) and feed a matrix of shape (100, 20, 1) as input? What is the number of batches it is considering in this case? Is the batch size 100, with 20 time steps in each batch?
Answer 1
Score: 1
TL;DR
The batch, timesteps, features in your case is defined as None, 20, 1, where batch stands for the batch_size parameter passed during model.fit. The model does not need to know this beforehand. Therefore, when you define your input layer (or your LSTM layer's input shape), you simply define (timesteps, features), which is (20, 1). A simple model.summary() would show you that this input size is translated to (None, 20, 1) while the computation graph is created.
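To tie this back to the numbers in the question: with keras.Input(shape=(20, 1)) and a (100, 20, 1) array, the 100 is the number of samples, not the batch size. If you call model.fit without a batch_size argument, Keras uses its default of 32, so that array is split into ceil(100/32) = 4 batches per epoch. A minimal sketch of that (the layer sizes are arbitrary, picked only for illustration):
#The 100 in (100, 20, 1) is the sample count, not the batch size
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(20, 1))                   #only (timesteps, features); the batch dim stays None
out = layers.Dense(1, activation='sigmoid')(layers.LSTM(4)(inp))
model = keras.Model(inp, out)
model.compile(loss='binary_crossentropy')

X, y = np.random.random((100, 20, 1)), np.random.random((100,))
model.fit(X, y, epochs=1)                          #no batch_size given -> default 32 -> 4/4 steps per epoch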
Deeper dive into the subject
A good way to understand what's going on is to simply print the summary of your model. Let me take a simple example here and walk you through the steps:
#Creating a simple stacked LSTM model
from tensorflow.keras import layers, Model
import numpy as np
inp = layers.Input((20,1)) #<------
x = layers.LSTM(5, return_sequences=True)(inp)
x = layers.LSTM(4)(x)
out = layers.Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
model.compile(loss='binary_crossentropy')
model.summary()
Model: "model_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 20, 1)] 0
lstm_14 (LSTM) (None, 20, 5) 140
lstm_15 (LSTM) (None, 4) 160
dense_8 (Dense) (None, 1) 5
=================================================================
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________
As you see here, the flow of tensors (more specifically, how the shapes of the tensors change as they flow down the network) is displayed. As you can see, I am using the functional API, which allows me to explicitly create an input layer of shape (20, 1), which I then pass to the LSTM. But interestingly, you can see that the actual shape of this Input layer is (None, 20, 1). This is the batch, timesteps, features that you are also referring to. The timesteps are 20, and there is a single feature, so that's easy to understand. The None, however, is a placeholder for the batch_size parameter, which you define during model.fit.
#Fit model
X_train, y_train = np.random.random((100,20,1)), np.random.random((100,))
model.fit(X_train, y_train, batch_size=10, epochs=2)
Epoch 1/2
10/10 [==============================] - 1s 4ms/step - loss: 0.6938
Epoch 2/2
10/10 [==============================] - 0s 3ms/step - loss: 0.6932
In this example, I set the batch_size to 10. This means that when you train the model, each "step" passes a batch of shape (10, 20, 1) to the model, and there are 10 such steps in each epoch, because the overall size of the training data is (100, 20, 1). This is indicated by the 10/10 that you see in front of the progress bar for each epoch.
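To make the arithmetic explicit, the number of steps per epoch is simply the number of samples divided by batch_size, rounded up. A tiny sanity check (the batch sizes below are arbitrary examples):
#Steps per epoch = ceil(number of samples / batch_size)
import math
num_samples = 100
for batch_size in (10, 32, 64):
    print(batch_size, math.ceil(num_samples / batch_size))   #10 -> 10 steps, 32 -> 4 steps, 64 -> 2 steps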
Another interesting thing to note is that you don't necessarily need to define the dimensions of the input, as long as you obey the basic rules of model training and the batch-size constraints. Here is an example. This time I define the number of timesteps as None, which means that I can now pass variable-length timesteps (variable-length sentences, for example) to be encoded by the LSTM layers.
from tensorflow.keras import layers, Model
import numpy as np
inp = layers.Input((None,1)) #<------
x = layers.LSTM(5, return_sequences=True)(inp)
x = layers.LSTM(4)(x)
out = layers.Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
model.compile(loss='binary_crossentropy')
model.summary()
Model: "model_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_12 (InputLayer) [(None, None, 1)] 0
lstm_18 (LSTM) (None, None, 5) 140
lstm_19 (LSTM) (None, 4) 160
dense_10 (Dense) (None, 1) 5
=================================================================
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________
This means that the model doesn't need to know in advance how many timesteps it will have to work with, just as it doesn't need to know in advance what batch_size it will get. These things can be inferred during model.fit or passed as parameters. Notice that model.summary() simply propagates this lack of information about the timesteps dimension to the subsequent layers.
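As a quick check of that flexibility, the model just defined (reusing model and np from the block above) can be fed batches with different numbers of timesteps at prediction time; something along these lines:
#Feeding the variable-timestep model batches of different sequence lengths
short_batch = np.random.random((5, 20, 1))    #5 sequences, 20 timesteps each
long_batch = np.random.random((5, 25, 1))     #5 sequences, 25 timesteps each
print(model.predict(short_batch).shape)       #(5, 1)
print(model.predict(long_batch).shape)        #(5, 1)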
> An important note though: LSTMs can work with variable-size inputs, because all you have to do is pass the timesteps as None as in the example above. However, you have to ensure that each batch, taken on its own, has the same number of timesteps. In other words, to work with variable-sized sentences, say [(20,1), (25, 1), (20, 1), ...], either use a batch size of 1 so that each batch is trivially consistent, or write a generator that builds batches of equal batch_size out of sentences of the same length. For example, the first batch contains only five (20,1) sentences, the second batch only five (25,1) sentences, and so on (see the sketch after this note). The second method is faster than the first, but may be more painful to set up.
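Here is a rough sketch of that second (bucketing) approach, reusing the variable-timestep model above. It assumes the data is a plain Python list of (timesteps, 1) arrays, and the helper name iterate_buckets is made up for this illustration:
#Bucketing: group sequences by length so every batch has a uniform number of timesteps
from collections import defaultdict

def iterate_buckets(sequences, labels, batch_size=5):
    #Yield (x, y) batches in which all sequences share the same length
    buckets = defaultdict(list)
    for seq, lab in zip(sequences, labels):
        buckets[len(seq)].append((seq, lab))
    for length, items in buckets.items():
        for i in range(0, len(items), batch_size):
            chunk = items[i:i + batch_size]
            x = np.stack([s for s, _ in chunk])   #(batch, length, 1)
            y = np.array([l for _, l in chunk])
            yield x, y

#Example: five (20, 1) sentences and five (25, 1) sentences, trained bucket by bucket
sequences = [np.random.random((20, 1)) for _ in range(5)] + [np.random.random((25, 1)) for _ in range(5)]
labels = np.random.random(10)
for x_batch, y_batch in iterate_buckets(sequences, labels):
    model.train_on_batch(x_batch, y_batch)
Each yielded batch has a fixed length, so the None timestep dimension is resolved batch by batch.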
Bonus
Also, for anyone curious about the effect of batch_size on model training: a large batch_size can be very helpful for speeding up computation, and increasing it is sometimes preferred over decaying the learning rate, but it can cause what is known as a Generalization Gap. This topic is well explored in this awesome paper.
These 2 papers should give a lot of clarity on how to use batch_size as a powerful parameter for your model training, one that is quite often ignored.