Having trouble correctly importing TensorFlow Tokenizer and TensorFlow pad_sequences

Question

I have a neural network that takes data from a txt file and uses NLP to learn how to speak like a human. But whenever I import Tokenizer and pad_sequences (both of which are needed), they do not import correctly.

I believe there may be a problem with my TensorFlow version or configuration, although I have updated it to the latest version. I may end up having to test my code in a fresh virtual machine to get it working.

Here is my code:

    import tensorflow as tf
    import numpy as np
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # importlib.util.find_spec() only resolves module names, so passing a class or
    # function path raises instead of returning None. Check the imported objects directly.
    if callable(Tokenizer):
        print("The Tokenizer class has been imported successfully.")
    else:
        print("The Tokenizer class has not been imported successfully.")

    if callable(pad_sequences):
        print("The pad_sequences function has been imported successfully.")
    else:
        print("The pad_sequences function has not been imported successfully.")

    # Load the text dataset
    with open('data.txt', 'r') as f:
        data = f.read()

    # Split the data into sentences
    sentences = data.split('.')

    # Create a tokenizer and fit on the sentences
    tokenizer = Tokenizer(filters='')
    tokenizer.fit_on_texts(sentences)
    vocab_size = len(tokenizer.word_index) + 1  # index 0 is reserved for padding

    # Convert the sentences to sequences of integers
    sequences = tokenizer.texts_to_sequences(sentences)

    # Create input and target sequences: each prefix predicts the token that follows it
    input_sequences = []
    target_sequences = []
    for sequence in sequences:
        for i in range(1, len(sequence)):
            input_sequences.append(sequence[:i])
            target_sequences.append(sequence[i])

    # Pad the input sequences
    max_sequence_length = max(len(sequence) for sequence in input_sequences)
    padded_input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length)

    # Convert the target sequences to one-hot vectors
    one_hot_target_sequences = tf.keras.utils.to_categorical(target_sequences, num_classes=vocab_size)

    # Create the neural network
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 100),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(vocab_size, activation='softmax')
    ])

    # Train the neural network
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(padded_input_sequences, one_hot_target_sequences, epochs=10)

    # Generate text
    def generate_text(model, tokenizer, max_sequence_length, start_text):
        # Create a sequence of tokens
        tokens = tokenizer.texts_to_sequences([start_text])[0]
        # '.' is only in the vocabulary if it occurs as a standalone token;
        # .get() avoids a KeyError when it does not
        end_token = tokenizer.word_index.get('.')
        # Generate text until the end of the sentence or the max sequence length is reached
        while len(tokens) < max_sequence_length:
            # Pad the input sequence
            padded_sequence = pad_sequences([tokens], maxlen=max_sequence_length)
            # Get the probability distribution for the next token
            probabilities = model.predict(padded_sequence)[0]
            # Choose the next token with the highest probability
            next_token = int(np.argmax(probabilities))
            # Add the next token to the sequence
            tokens.append(next_token)
            # Check if the end of the sentence has been reached
            if next_token == end_token:
                break
        # Convert the tokens back to text
        return tokenizer.sequences_to_texts([tokens])[0]

    # Generate some text
    generated_text = generate_text(model, tokenizer, max_sequence_length, 'Hello, my name is Bard.')
    print(generated_text)
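
A side note on checking imports with importlib: as far as I know, `importlib.util.find_spec` looks up modules, not classes or functions, so a dotted path that ends in a class name raises an error instead of returning `None`. The sketch below illustrates this with the standard-library `json` module so that it runs without TensorFlow. `spec_status` is a hypothetical helper name chosen for illustration, not part of any library.

```python
import importlib.util


def spec_status(dotted_name):
    """Describe what find_spec() does with a dotted name (hypothetical helper)."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Raised when a parent in the dotted path is missing or is not a package.
        return f"error: {type(exc).__name__}"
    return "found" if spec is not None else "not found"


# A real module resolves to a spec.
print(spec_status("json.decoder"))
# A class inside a module is not itself a module, so the lookup raises.
print(spec_status("json.decoder.JSONDecoder"))
```

The same reasoning would apply to a path such as `tensorflow.keras.preprocessing.text.Tokenizer`: the last component is a class, so this kind of lookup cannot confirm that the class was imported.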

Image: PyCharm underlining the code in red

Answer 1

Score: 1

The following worked with TensorFlow 2.12.0. You can see where the Tokenizer class and the pad_sequences function are defined in the linked GitHub blobs for TensorFlow 2.12.0. The names of these modules and the structure of TensorFlow/Keras have changed several times, so the correct import statements are version-specific.
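
Since the right import depends on the version, it may help to confirm what is actually installed first. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper, and the distribution names `tensorflow` and `keras` are assumptions: some installs use a different distribution name (for example a platform-specific TensorFlow build), in which case the lookup reports it as not installed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names here are assumptions; adjust them to match your environment.
for name in ("tensorflow", "keras"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run this with the same interpreter that PyCharm is configured to use, otherwise the reported versions may not match the environment that shows the error.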

Change:

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

To:

    from keras.preprocessing.text import Tokenizer
    from keras.utils import pad_sequences
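
If the same script has to run under more than one version, one option is to try several import locations in order. The sketch below is only an illustration of that idea, not something taken from the TensorFlow or Keras documentation: `first_importable` is a hypothetical helper, and the candidate paths are simply the ones that appear in the question and in this answer. Whether each path exists depends on the installed version.

```python
import importlib


def first_importable(candidates):
    """Return the first attribute that can be imported from (module, name) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and move on to the next one.
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Candidate locations taken from the question and the answer above (assumptions
# about what may exist in a given install, tried in this order).
TOKENIZER_PATHS = [
    ("keras.preprocessing.text", "Tokenizer"),
    ("tensorflow.keras.preprocessing.text", "Tokenizer"),
]
PAD_SEQUENCES_PATHS = [
    ("keras.utils", "pad_sequences"),
    ("tensorflow.keras.preprocessing.sequence", "pad_sequences"),
]
```

With TensorFlow installed, usage would look like `Tokenizer = first_importable(TOKENIZER_PATHS)` and `pad_sequences = first_importable(PAD_SEQUENCES_PATHS)`. This only affects what happens at runtime; an IDE's static inspection is a separate matter and may still need the interpreter or package configuration fixed.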

  • Posted by huangapple on 2023-04-17 02:47:42
  • Source: https://go.coder-hub.com/76029717.html