July 18, 2023, 09:38:44

LSTM Encoder-Decoder stuck in plateau and not learning

Question

I am testing my LSTM encoder-decoder architecture on a simple task: recognising vowels in random character sequences. My TSV data looks like this:

molteyhpr	010011000
dlkz	0000
fabgovmgg	010010000
qgvowdykl	000100100
kgncpiot	00000110
pisvdf	010000

I've generated 100K samples of it.
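
For reference, data of this shape could be produced with something like the sketch below. The vowel set (including `y`, which the samples above label as 1), the length range, and the function names are assumptions for illustration, not the code actually used.

```python
import random
import string

# Assumption: 'y' counts as a vowel, as the sample rows above suggest.
VOWELS = set("aeiouy")

def label_vowels(word):
    """Return a string of 0/1 flags, one per character, with 1 marking a vowel."""
    return "".join("1" if ch in VOWELS else "0" for ch in word)

def make_samples(n, min_len=4, max_len=9, seed=0):
    """Generate n (word, label) pairs from random lowercase letters."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        length = rng.randint(min_len, max_len)
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        samples.append((word, label_vowels(word)))
    return samples

if __name__ == "__main__":
    # Print a few tab-separated rows in the same layout as the TSV above.
    for word, label in make_samples(5):
        print(f"{word}\t{label}")
```

Each label has the same length as its word, so the task is a per-character binary tagging problem.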

My model code (a slightly modified version of the Keras example):

    self.latent_dim = 256
    enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
    enc_lstm_layer  = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
    enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)
    # We discard 'enc_outputs' and only keep the states.
    enc_states = [state_h, state_c]
    # Set up the decoder, using 'enc_states' as initial state.
    dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))
    # We set up our decoder to return full output sequences,
    # and to return internal states as well. We don't use the
    # return states in the training model, but we will use them in inference.
    dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)
    dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
    dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
    dec_outputs = dec_dense_layer(dec_outputs)
    # Define the model that will turn
    # 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'
    model = Model([enc_input_layer, dec_input_layer], dec_outputs)

All data is turned into equal-length one-hot representations. This is how the batches are generated:

    def _generator(self, enc_data, dec_data, is_training):
        enc_oh_input_batch  = None
        dec_oh_input_batch  = None
        dec_oh_output_batch = None
        enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
        dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]
        current_idx = 0
        samples_len = len(enc_data)
        while True:
            # Create zero batch arrays
            enc_oh_input_batch = np.zeros(
                (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
            dec_oh_input_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
            dec_oh_output_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
            # Compile batch
            for i in range(self.batch_size):
                # when we get to the end of samples - start over
                if i + current_idx >= samples_len:
                    current_idx = 0
                    if is_training:
                        self.epoch += 1
                tokens_in  = enc_data[i + current_idx]
                tokens_out = dec_data[i + current_idx]
                # vectorize encoder input
                for t, token in enumerate(tokens_in):
                    enc_oh_input_batch[i, t, token] = 1
                enc_oh_input_batch[i, t + 1:, enc_space_token] = 1
                # vectorize decoder input and output
                for t, token in enumerate(tokens_out):
                    dec_oh_input_batch[i, t, token] = 1
                    if t > 0:
                        # self.dec_oh_output will be ahead by one timestep
                        # and will not include the start character.
                        dec_oh_output_batch[i, t - 1, token] = 1
                    dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
            current_idx += self.batch_size
            yield [[enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch]
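
A quick way to sanity-check batches like these is to confirm that every timestep has exactly one active class. The sketch below does that on a small hand-built array; the helper name `one_hot_violations` and the array sizes are hypothetical, and it assumes that padded positions are also meant to carry a one-hot token.

```python
import numpy as np

def one_hot_violations(batch):
    """Return (sample, timestep) index pairs whose class vector does not sum to 1."""
    active = batch.sum(axis=-1)  # number of active classes at each timestep
    return np.argwhere(active != 1)

# Tiny hand-built batch: 1 sample, 3 timesteps, 4 classes (hypothetical sizes).
batch = np.zeros((1, 3, 4), dtype="int8")
batch[0, 0, 2] = 1  # valid one-hot step
batch[0, 1, 1] = 1
batch[0, 1, 3] = 1  # two classes active at the same step: invalid
# Timestep 2 is left all-zero, which is also not a valid one-hot vector.

print(one_hot_violations(batch))
```

Running the same check on one batch from each of the three arrays the generator yields would show whether any timestep is empty or has more than one class set.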

I train it like so:

    h = self.model.fit(self.source.train_generator(),
        batch_size       = self.conf.batch_size,
        epochs           = self.conf.epochs,
        initial_epoch    = self.source.epoch,
        steps_per_epoch  = batches_per_epoch,
        validation_steps = batches_per_epoch,
        validation_data  = self.source.validation_generator(),
        validation_freq  = self.conf.validation_freq
    )

With these settings:

epochs           = 10
validation_freq  = 10
validation_split = 0.2
batch_size       = 30
loss             = 'categorical_crossentropy'
metrics          = ['accuracy']
optimizer = {
    'name'          : 'Adam',
    'learning_rate' : 0.0001,
}

I tried different learning rates, batch sizes, and optimizers, but no matter what, training gets stuck:

Training model ...
Epoch 1/10
33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
Epoch 2/10
33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
Epoch 3/10
33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
Epoch 4/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
Epoch 5/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
Epoch 6/10
33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
Epoch 7/10
33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
Epoch 8/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
Epoch 9/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
Epoch 10/10
33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630

What am I doing wrong?

Answer 1

Score: 1

In the original character-level translation example, the decoder target is the decoder input shifted one time step ahead, because the decoder has to predict the next character from the current and previous ones.

However, in your task, the goal is to map each character in the input directly to a character in the output. So, there's no need to shift the target data.

I have changed the for-loop where decoder_input_data and decoder_target_data are pre-processed.

Try using this:

# Input and target are aligned: position t in both arrays encodes the same character.
for t, char in enumerate(target_text):
    decoder_input_data[i, t, target_token_index[char]] = 1.
    decoder_target_data[i, t, target_token_index[char]] = 1.
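
To make the difference concrete, here is a minimal self-contained sketch that builds the decoder arrays both ways. The function name `vectorize_decoder`, the `shift` flag, and the two-token vocabulary are assumptions for illustration; only the loop body mirrors the snippet above.

```python
import numpy as np

def vectorize_decoder(texts, token_index, max_len, shift):
    """Build one-hot decoder input and target arrays.

    shift=True  -> target step t-1 holds the input token of step t (next-token setup)
    shift=False -> target step t holds the same token as the input at step t
    """
    n_tokens = len(token_index)
    dec_in = np.zeros((len(texts), max_len, n_tokens), dtype="float32")
    dec_out = np.zeros((len(texts), max_len, n_tokens), dtype="float32")
    for i, text in enumerate(texts):
        for t, char in enumerate(text):
            k = token_index[char]
            dec_in[i, t, k] = 1.0
            if not shift:
                dec_out[i, t, k] = 1.0      # aligned with the input
            elif t > 0:
                dec_out[i, t - 1, k] = 1.0  # one step ahead of the input
    return dec_in, dec_out

token_index = {"0": 0, "1": 1}  # hypothetical target vocabulary
dec_in, dec_out = vectorize_decoder(["0100"], token_index, max_len=4, shift=False)
print(dec_out[0].argmax(axis=-1))  # class index per timestep
```

With `shift=True` the last target step of each sequence stays all-zero, because there is no next token to place there.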
