LSTM Encoder-Decoder stuck in plateau and not learning

Question

I am testing my LSTM Encoder-Decoder architecture on a simple task: recognising vowels in random character sequences. My TSV data looks like this:

molteyhpr	010011000
dlkz	    0000
fabgovmgg	010010000
qgvowdykl	000100100
kgncpiot	00000110
pisvdf	    010000

I've generated 100K samples in this format.
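For reference, samples in this format can be produced with a short script like the following (a hypothetical sketch, not the author's actual generation script; note that the example rows above label 'y' as a vowel):

    import random
    import string

    # Hypothetical sketch of generating TSV samples like those shown above.
    # The sample rows mark 'y' as a vowel, so it is included here.
    VOWELS = set('aeiouy')

    with open('samples.tsv', 'w') as f:
        for _ in range(100_000):
            word = ''.join(random.choices(string.ascii_lowercase, k=random.randint(4, 9)))
            labels = ''.join('1' if ch in VOWELS else '0' for ch in word)
            f.write(f'{word}\t{labels}\n')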

My model code (a slightly modified version of the Keras example):

    # Imports needed by this snippet (TF 2.x Keras assumed)
    from tensorflow.keras.layers import Input, LSTM, Dense
    from tensorflow.keras.models import Model

    self.latent_dim = 256

    enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
    enc_lstm_layer  = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
    enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)

    # We discard 'enc_outputs' and only keep the states.
    enc_states = [state_h, state_c]

    # Set up the decoder, using 'enc_states' as the initial state.
    dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))

    # We set up our decoder to return full output sequences,
    # and to return internal states as well. We don't use the
    # returned states in the training model, but we will use them for inference.
    dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)

    dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
    dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
    dec_outputs = dec_dense_layer(dec_outputs)

    # Define the model that turns
    # [encoder_input_data, decoder_input_data] into decoder_target_data.
    model = Model([enc_input_layer, dec_input_layer], dec_outputs)
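The comments above refer to reusing the returned states at inference time. For context, here is a minimal sketch of the matching inference models in the style of the well-known Keras seq2seq recipe (this part is not shown in the question; the two state Input layers are new here):

    # Encoder inference model: maps an input sequence to its final states.
    enc_model = Model(enc_input_layer, enc_states)

    # Decoder inference model: steps one timestep at a time, fed its own states.
    dec_state_input_h = Input(shape=(self.latent_dim,))
    dec_state_input_c = Input(shape=(self.latent_dim,))
    dec_states_inputs = [dec_state_input_h, dec_state_input_c]
    dec_outputs, state_h, state_c = dec_lstm_layer(
        dec_input_layer, initial_state=dec_states_inputs)
    dec_outputs = dec_dense_layer(dec_outputs)
    dec_model = Model([dec_input_layer] + dec_states_inputs,
                      [dec_outputs, state_h, state_c])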

All data is turned into equal-length one-hot representations. This is how the batches are generated:

    def _generator(self, enc_data, dec_data, is_training):
        enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
        dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]

        current_idx = 0
        samples_len = len(enc_data)

        while True:
            # Create zeroed batch arrays (assumes numpy is imported as np)
            enc_oh_input_batch = np.zeros(
                (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
            dec_oh_input_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
            dec_oh_output_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')

            # Assemble the batch
            for i in range(self.batch_size):
                # when we get to the end of the samples, start over
                if i + current_idx >= samples_len:
                    current_idx = 0
                    if is_training:
                        self.epoch += 1

                tokens_in  = enc_data[i + current_idx]
                tokens_out = dec_data[i + current_idx]

                # vectorize encoder input, then pad with the empty token
                for t, token in enumerate(tokens_in):
                    enc_oh_input_batch[i, t, token] = 1
                enc_oh_input_batch[i, t + 1:, enc_space_token] = 1

                # vectorize decoder input and output
                for t, token in enumerate(tokens_out):
                    dec_oh_input_batch[i, t, token] = 1
                    if t > 0:
                        # dec_oh_output_batch will be ahead by one timestep
                        # and will not include the start character.
                        dec_oh_output_batch[i, t - 1, token] = 1
                # pad the decoder input with the empty token (outside the loop,
                # otherwise already-filled positions end up two-hot)
                dec_oh_input_batch[i, t + 1:, dec_space_token] = 1

            current_idx += self.batch_size

            yield [[enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch]
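Before training, it can be worth sanity-checking one batch from this generator. A minimal sketch, assuming `gen` is an instance of the generator above:

    # Pull one batch and verify each input timestep is exactly one-hot.
    (enc_batch, dec_in_batch), dec_out_batch = next(gen)
    assert (enc_batch.sum(axis=-1) == 1).all(), "encoder input not one-hot"
    assert (dec_in_batch.sum(axis=-1) == 1).all(), "decoder input not one-hot"
    # Note: dec_oh_output_batch is never padded with the empty token in the
    # code above, so its trailing timesteps are all-zero vectors.
    print(enc_batch.shape, dec_in_batch.shape, dec_out_batch.shape)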

I train it like so:

    h = self.model.fit(self.source.train_generator(),
        batch_size       = self.conf.batch_size,
        epochs           = self.conf.epochs,
        initial_epoch    = self.source.epoch,
        steps_per_epoch  = batches_per_epoch,
        validation_steps = batches_per_epoch,
        validation_data  = self.source.validation_generator(),
        validation_freq  = self.conf.validation_freq
    )

With these settings:

epochs           = 10
validation_freq  = 10
validation_split = 0.2
batch_size       = 30
loss             = 'categorical_crossentropy'
metrics          = ['accuracy']
optimizer = {
	'name'          : 'Adam',
	'learning_rate' : 0.0001,
}
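These settings presumably reach the model through a compile call along these lines (a sketch; the question does not show the actual config-to-compile wiring):

    from tensorflow.keras.optimizers import Adam

    # Sketch: mapping the settings above onto compile().
    model.compile(
        optimizer = Adam(learning_rate=0.0001),
        loss      = 'categorical_crossentropy',
        metrics   = ['accuracy'],
    )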

I tried playing around with the learning rate, batch size, and different optimizers, but no matter what, training gets stuck:

Training model ...
Epoch 1/10
33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
Epoch 2/10
33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
Epoch 3/10
33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
Epoch 4/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
Epoch 5/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
Epoch 6/10
33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
Epoch 7/10
33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
Epoch 8/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
Epoch 9/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
Epoch 10/10
33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630

What am I doing wrong?

Answer 1

Score: 1

In the original character-by-character translation task, the decoder input and target data are shifted by one time step because the decoder needs to predict the next character based on the current and past characters.

However, in your task, the goal is to map each character in the input directly to a character in the output. So, there's no need to shift the target data.

I have changed the for-loop where decoder_input_data and decoder_target_data are pre-processed.

Try using this:

    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.
        decoder_target_data[i, t, target_token_index[char]] = 1.
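Translated into the question's generator variables, the unshifted vectorization would look roughly like this (a sketch, assuming the rest of `_generator` stays as shown in the question):

    # Sketch: unshifted decoder vectorization in the question's own terms.
    # Input and target are identical because the task maps each input
    # character directly to an output character, not to the next character.
    for t, token in enumerate(tokens_out):
        dec_oh_input_batch[i, t, token] = 1
        dec_oh_output_batch[i, t, token] = 1
    dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
    dec_oh_output_batch[i, t + 1:, dec_space_token] = 1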
