LSTM Encoder-Decoder stuck in plateau and not learning

Question

I am testing my LSTM Encoder-Decoder architecture with a simple task: to recognise vowels in random character sequences. My tsv data looks like this:

    molteyhpr 010011000
    dlkz 0000
    fabgovmgg 010010000
    qgvowdykl 000100100
    kgncpiot 00000110
    pisvdf 010000

I've generated 100K samples of it.
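
For reference, data of this shape could be produced along the following lines. This is only a sketch: the function names (`label_vowels`, `make_sample`, `make_tsv_lines`), the length range of 4 to 9 characters, and the treatment of `y` as a vowel are assumptions inferred from the samples above, not the generator actually used.

```python
import random
import string

# Assumption: 'y' counts as a vowel, since it is labelled 1 in the samples above.
VOWELS = set("aeiouy")


def label_vowels(text):
    """Return a string of 0/1 flags, one per character, marking vowels."""
    return "".join("1" if ch in VOWELS else "0" for ch in text)


def make_sample(rng, min_len=4, max_len=9):
    """Build one (characters, labels) pair; the length bounds are hypothetical."""
    length = rng.randint(min_len, max_len)
    chars = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return chars, label_vowels(chars)


def make_tsv_lines(n, seed=0):
    """Return n lines, each holding the characters and labels separated by a tab."""
    rng = random.Random(seed)
    return ["\t".join(make_sample(rng)) for _ in range(n)]


print(label_vowels("molteyhpr"))  # 010011000
```

Calling `make_tsv_lines(100_000)` would give a list of the same size as the dataset described here, ready to be written to a file.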

My model code (a slightly modified version of the Keras example):

    self.latent_dim = 256
    enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
    enc_lstm_layer = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
    enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)
    # We discard 'enc_outputs' and only keep the states.
    enc_states = [state_h, state_c]
    # Set up the decoder, using 'enc_states' as initial state.
    dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))
    # We set up our decoder to return full output sequences,
    # and to return internal states as well. We don't use the
    # return states in the training model, but we will use them in inference.
    dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)
    dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
    dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
    dec_outputs = dec_dense_layer(dec_outputs)
    # Define the model that will turn
    # 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'
    model = Model([enc_input_layer, dec_input_layer], dec_outputs)

All data is turned into equal-length one-hot representations. This is how the batches are generated:

    def _generator(self, enc_data, dec_data, is_training):
        enc_oh_input_batch = None
        dec_oh_input_batch = None
        dec_oh_output_batch = None
        enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
        dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]
        current_idx = 0
        samples_len = len(enc_data)
        while True:
            # Create zero batch arrays
            enc_oh_input_batch = np.zeros(
                (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
            dec_oh_input_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
            dec_oh_output_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
            # Compile batch
            for i in range(self.batch_size):
                # when we get to the end of samples - start over
                if i + current_idx >= samples_len:
                    current_idx = 0
                    if is_training:
                        self.epoch += 1
                tokens_in = enc_data[i + current_idx]
                tokens_out = dec_data[i + current_idx]
                # vectorize encoder input
                for t, token in enumerate(tokens_in):
                    enc_oh_input_batch[i, t, token] = 1
                enc_oh_input_batch[i, t + 1:, enc_space_token] = 1
                # vectorize decoder input and output
                for t, token in enumerate(tokens_out):
                    dec_oh_input_batch[i, t, token] = 1
                    if t > 0:
                        # self.dec_oh_output will be ahead by one timestep
                        # and will not include the start character.
                        dec_oh_output_batch[i, t - 1, token] = 1
                dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
            current_idx += self.batch_size
            yield [[enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch]
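
To make the effect of the decoder loop above concrete, the following sketch replays its per-sample logic on plain token ids instead of one-hot rows. The token ids, the `shifted_pair` helper, and the presence of a start token at the head of `tokens_out` are assumptions for illustration; `None` marks a timestep whose one-hot target row stays all-zero.

```python
# Hypothetical token ids; the real vocabularies are not shown in the question.
ZERO, ONE, PAD, START = 0, 1, 2, 3


def shifted_pair(tokens_out, max_len, pad_token=PAD):
    """Replay the generator's decoder vectorization for one sample.

    Returns (decoder_input, decoder_target) as lists of token ids.
    None stands for a timestep where no class is set in the one-hot row.
    """
    dec_input = [pad_token] * max_len  # positions past the sequence get the padding token
    dec_target = [None] * max_len      # the target batch starts as all zeros
    for t, token in enumerate(tokens_out):
        dec_input[t] = token
        if t > 0:
            # The target is one timestep ahead and skips the first token.
            dec_target[t - 1] = token
    return dec_input, dec_target


# Labels "0100" preceded by an assumed start token, padded to 7 timesteps.
dec_input, dec_target = shifted_pair([START, ZERO, ONE, ZERO, ZERO], max_len=7)
print(dec_input)   # [3, 0, 1, 0, 0, 2, 2]
print(dec_target)  # [0, 1, 0, 0, None, None, None]
```

In this replay the target timesteps past the end of the sequence have no class set, unlike the decoder input, which is padded. That is worth keeping in mind when reading the reported loss and accuracy.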

I train it like so:

    h = self.model.fit(
        self.source.train_generator(),
        batch_size=self.conf.batch_size,
        epochs=self.conf.epochs,
        initial_epoch=self.source.epoch,
        steps_per_epoch=batches_per_epoch,
        validation_steps=batches_per_epoch,
        validation_data=self.source.validation_generator(),
        validation_freq=self.conf.validation_freq,
    )

With these settings:

    epochs = 10
    validation_freq = 10
    validation_split = 0.2
    batch_size = 30
    loss = 'categorical_crossentropy'
    metrics = ['accuracy']
    optimizer = {
        'name': 'Adam',
        'learning_rate': 0.0001,
    }

I tried playing around with the learning rate, the batch size, and different optimizers, but no matter what I do, training gets stuck:

    Training model ...
    Epoch 1/10
    33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
    Epoch 2/10
    33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
    Epoch 3/10
    33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
    Epoch 4/10
    33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
    Epoch 5/10
    33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
    Epoch 6/10
    33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
    Epoch 7/10
    33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
    Epoch 8/10
    33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
    Epoch 9/10
    33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
    Epoch 10/10
    33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630

What am I doing wrong?

Answer 1

Score: 1

In the original character-by-character translation task, the decoder target data is shifted one time step ahead of the decoder input, because the decoder has to predict the next character from the current and past characters.

However, in your task, the goal is to map each character in the input directly to a character in the output. So, there's no need to shift the target data.

I have changed the for-loop where decoder_input_data and decoder_target_data are pre-processed.

Try using this:

    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.
        decoder_target_data[i, t, target_token_index[char]] = 1.
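
For context, a self-contained version of that loop might look like the sketch below. It uses nested Python lists rather than NumPy arrays so that it runs without dependencies. The vocabulary, the sample strings, the `zeros` and `decode` helpers, and the space padding of both arrays are assumptions; only the two assignments inside the inner loop come from the answer.

```python
# Hypothetical vocabulary and samples, chosen to match the label strings in the question.
target_token_index = {"0": 0, "1": 1, " ": 2}
index_to_token = {i: ch for ch, i in target_token_index.items()}
target_texts = ["010011000", "0000"]

max_len = max(len(text) for text in target_texts)
num_tokens = len(target_token_index)


def zeros(n_samples, n_steps, n_tokens):
    """Build an n_samples x n_steps x n_tokens nested list filled with 0.0."""
    return [[[0.0] * n_tokens for _ in range(n_steps)] for _ in range(n_samples)]


def decode(rows):
    """Turn a sequence of one-hot rows back into a string."""
    return "".join(index_to_token[row.index(1.0)] for row in rows)


decoder_input_data = zeros(len(target_texts), max_len, num_tokens)
decoder_target_data = zeros(len(target_texts), max_len, num_tokens)

for i, target_text in enumerate(target_texts):
    for t, char in enumerate(target_text):
        # Unshifted: input and target are set at the same timestep.
        decoder_input_data[i][t][target_token_index[char]] = 1.0
        decoder_target_data[i][t][target_token_index[char]] = 1.0
    # Assumption: pad the remaining timesteps of both arrays with the space token.
    for t in range(len(target_text), max_len):
        decoder_input_data[i][t][target_token_index[" "]] = 1.0
        decoder_target_data[i][t][target_token_index[" "]] = 1.0

print(decode(decoder_target_data[0]))           # 010011000
print(decode(decoder_target_data[1]).rstrip())  # 0000
```

With this change the decoder input and the target hold the same values at every timestep, so the sketch only illustrates the alignment the answer describes.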

huangapple
  • Posted on July 18, 2023, 09:38:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76709033.html