Meaning of 2D input in Keras LSTM

Question

In Keras, the LSTM input is of shape [batch, timesteps, features]. What if I specify the input as keras.Input(shape=(20, 1)) and feed a matrix of shape (100, 20, 1) as input? What is the number of batches it considers in this case? Is the batch size 100, with 20 timesteps in each batch?

Answer 1

Score: 1

TL;DR

The batch, timestep, features in your case is defined as None, 20, 1, where the batch represents the batch_size parameter passed during model.fit. The model does not need to know this beforehand. Therefore, when you define your input layer (or your LSTM layer's input shape), you simply define (timesteps, features), which is (20, 1). A simple model.summary() will show you that the input size is translated to (None, 20, 1) while creating the computation graph.
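
To tie this back to the numbers in the question (this small sketch is my own addition, with arbitrary layer sizes, not part of the original answer): the batch size is whatever you pass to model.fit, and if you pass nothing, Keras defaults to 32 rather than 100.

from tensorflow.keras import layers, Model
import numpy as np

inp = layers.Input(shape=(20, 1))           # only (timesteps, features) is declared
x = layers.LSTM(4)(inp)
out = layers.Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
model.compile(loss='binary_crossentropy')

print(model.input_shape)                    # (None, 20, 1) -- None is the batch axis

X = np.random.random((100, 20, 1))
y = np.random.randint(0, 2, size=(100,))
model.fit(X, y, epochs=1)                   # default batch_size=32 -> ceil(100/32) = 4 steps
model.fit(X, y, batch_size=100, epochs=1)   # one single batch of 100 per epoch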


Deeper dive into the subject

A good way to understand what's going on is to simply print the summary of your model. Let me take a simple example here and walk you through the steps -

#Creating a simple stacked LSTM model

from tensorflow.keras import layers, Model
import numpy as np

inp = layers.Input((20,1))                       #<------
x = layers.LSTM(5, return_sequences=True)(inp)
x = layers.LSTM(4)(x)
out = layers.Dense(1, activation='sigmoid')(x)

model = Model(inp, out)
model.compile(loss='binary_crossentropy')
model.summary()
Model: "model_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_10 (InputLayer)       [(None, 20, 1)]           0         
                                                                 
 lstm_14 (LSTM)              (None, 20, 5)             140       
                                                                 
 lstm_15 (LSTM)              (None, 4)                 160       
                                                                 
 dense_8 (Dense)             (None, 1)                 5         
                                                                 
=================================================================
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________

As you can see here, the flow of tensors (more specifically, how the shapes of the tensors change as they flow down the network) is displayed. I am using the functional API, which allows me to explicitly create an input layer of shape (20, 1), which I then pass to the LSTM. But interestingly, you can see that the actual shape of this Input layer is (None, 20, 1). This is the batch, timesteps, features that you are also referring to.

The timesteps are 20 and there is a single feature, so that's easy to understand. The None, however, is a placeholder for the batch_size parameter, which you define during model.fit.
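
As a side note of my own (not part of the original answer), the parameter counts in the summary above also hint at why the feature dimension must be fixed while the batch axis can stay None: the weights depend only on the number of features and units, never on how many samples flow through. A quick hand check of those numbers:

def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * input_dim + units * units + units)

print(lstm_params(5, 1))   # 140 -> first LSTM: 1 feature in, 5 units
print(lstm_params(4, 5))   # 160 -> second LSTM: 5 features in, 4 units
print(4 * 1 + 1)           # 5   -> Dense(1) on 4 features: 4 weights + 1 bias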

#Fit model
X_train, y_train = np.random.random((100,20,1)), np.random.random((100,))
model.fit(X_train, y_train, batch_size=10, epochs=2)
Epoch 1/2
10/10 [==============================] - 1s 4ms/step - loss: 0.6938
Epoch 2/2
10/10 [==============================] - 0s 3ms/step - loss: 0.6932

In this example, I set the batch_size to 10. This means that when you train the model, each "step" will pass a batch of shape (10, 20, 1) to the model, and there will be 10 such steps in each epoch, because the overall size of the training data is (100, 20, 1). This is indicated by the 10/10 that you see in front of the progress bar for each epoch.
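
For anyone who wants to see that slicing concretely, here is a rough illustration of my own of what fit is doing shape-wise (ignoring the shuffling that fit performs by default):

import numpy as np

X_train = np.random.random((100, 20, 1))
batch_size = 10
steps_per_epoch = int(np.ceil(len(X_train) / batch_size))
print(steps_per_epoch)                        # 10 -> the "10/10" next to the progress bar

for step in range(steps_per_epoch):
    batch = X_train[step * batch_size:(step + 1) * batch_size]
    assert batch.shape == (10, 20, 1)         # every step sees a (10, 20, 1) batch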


Another interesting thing to note is that you don't necessarily need to define the dimensions of the input, as long as you obey the basic rules of model training and batch size constraints. Here is an example: I define the number of timesteps as None, which means that I can now pass variable-length timesteps (variable-length sentences, for example) to encode using the LSTM layers.

from tensorflow.keras import layers, Model
import numpy as np

inp = layers.Input((None,1))                       #<------
x = layers.LSTM(5, return_sequences=True)(inp)
x = layers.LSTM(4)(x)
out = layers.Dense(1, activation='sigmoid')(x)

model = Model(inp, out)
model.compile(loss='binary_crossentropy')
model.summary()
Model: "model_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_12 (InputLayer)       [(None, None, 1)]         0         
                                                                 
 lstm_18 (LSTM)              (None, None, 5)           140       
                                                                 
 lstm_19 (LSTM)              (None, 4)                 160       
                                                                 
 dense_10 (Dense)            (None, 1)                 5         
                                                                 
=================================================================
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________

This means that the model doesn't need to know how many timesteps it will have to work with beforehand, similar to the fact that it doesn't need to know what batch_size it will get beforehand. These things can be inferred during model.fit or passed as a parameter. Notice that model.summary() simply propagates this lack of information about the timesteps dimension to the subsequent layers.
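
As a quick illustrative check of my own, continuing with the variable-timestep model defined just above: the very same weights accept sequences of different lengths, as long as each call gets one consistent length.

import numpy as np

# reuses `model` from the (None, 1) input example above
x20 = np.random.random((3, 20, 1)).astype("float32")
x25 = np.random.random((3, 25, 1)).astype("float32")
print(model.predict(x20).shape)   # (3, 1) -- batch of 3, 20 timesteps each
print(model.predict(x25).shape)   # (3, 1) -- batch of 3, 25 timesteps each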

> An important note though - LSTMs can work with variable-sized inputs, because all you have to do is pass the timesteps as None as in the example above. However, you have to ensure that each batch independently has the same number of timesteps. In other words, to work with variable-sized sentences, say [(20,1), (25, 1), (20, 1), ...], either use a batch size of 1 so that every batch has a consistent size, or create a generator that builds batches of equal batch_size by grouping sentences of constant length - for example, the first batch contains only 5 (20,1) sentences, the second batch only 5 (25,1) sentences, and so on (a rough generator along these lines is sketched below). The second method is faster than the first, but may be more painful to set up.
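
A rough sketch of that second approach (my own, with made-up helper names, not from the original answer): bucket the sentences by length and yield fixed-length batches that the None-timestep model above can consume, e.g. via model.train_on_batch(x, y).

import numpy as np
from collections import defaultdict

def bucket_batches(sequences, labels, batch_size=5):
    # group sequences by length so every yielded batch has one consistent length
    buckets = defaultdict(list)
    for seq, lab in zip(sequences, labels):
        buckets[len(seq)].append((seq, lab))
    for length, items in buckets.items():
        for i in range(0, len(items), batch_size):
            chunk = items[i:i + batch_size]
            x = np.stack([s for s, _ in chunk])      # (batch, length, 1)
            y = np.array([l for _, l in chunk])
            yield x, y

# toy data: five (20, 1) sentences and five (25, 1) sentences
seqs = [np.random.random((20, 1)) for _ in range(5)] + \
       [np.random.random((25, 1)) for _ in range(5)]
labs = list(np.random.randint(0, 2, size=10))

for x, y in bucket_batches(seqs, labs):
    print(x.shape, y.shape)          # (5, 20, 1) (5,) then (5, 25, 1) (5,)
    # model.train_on_batch(x, y)     # each call gets a fixed-length batch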


Bonus

Also, for anyone curious about the effect of batch_size on model training: a large batch_size can be very helpful for speeding up computation, and increasing it is sometimes preferred over decaying the learning rate, but it can cause what is known as a Generalization Gap. This topic is well explored in this awesome paper.

These 2 papers should give a lot of clarity on how to use batch_size as a powerful parameter for your model training, something that is quite often ignored.
