April 17, 2023 02:47:42

Having trouble correctly importing TensorFlow Tokenizer and pad_sequences

Question

I have a neural network that reads data from a txt file and uses NLP to learn how to speak like a human. But whenever I import Tokenizer and pad_sequences (both of which are needed), they do not import correctly.

I believe there may be a problem with my TensorFlow version or configuration, although I have updated it to the latest version. I may end up having to test my code in a fresh virtual machine to get it working.

Here is my code:

import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import importlib

if importlib.util.find_spec("tensorflow.keras.preprocessing.text.Tokenizer") is not None:
    print("The Tokenizer class has been imported successfully.")
else:
    print("The Tokenizer class has not been imported successfully.")

if importlib.util.find_spec("tensorflow.keras.preprocessing.text.pad_sequences") is not None:
    print("The pad_sequences class has been imported successfully.")
else:
    print("The pad_sequences class has not been imported successfully.")

# Load the text dataset
with open('data.txt', 'r') as f:
    data = f.read()

# Split the data into sentences
sentences = data.split('.')

# Create a tokenizer and fit on the sentences
tokenizer = Tokenizer(filters='')
tokenizer.fit_on_texts(sentences)

# Convert the sentences to sequences of integers
sequences = tokenizer.texts_to_sequences(sentences)

# Create input and target sequences
input_sequences = []
target_sequences = []
for sequence in sequences:
    for i in range(1, len(sequence)):
        input_sequence = sequence[:i]
        target_sequence = sequence[i]
        input_sequences.append(input_sequence)
        target_sequences.append(target_sequence)

# Pad the input sequences
max_sequence_length = max([len(sequence) for sequence in input_sequences])
padded_input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length)

# Convert the target sequences to one-hot vectors
one_hot_target_sequences = tf.keras.utils.to_categorical(target_sequences, num_classes=len(tokenizer.word_index)+1)

# Create the neural network
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(tokenizer.word_index)+1, 100),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(len(tokenizer.word_index)+1, activation='softmax')
])

# Train the neural network
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(padded_input_sequences, one_hot_target_sequences, epochs=10)

# Generate text
def generate_text(model, tokenizer, max_sequence_length, start_text):
    # Create a sequence of tokens
    tokens = tokenizer.texts_to_sequences([start_text])[0]

    # Generate text until the end of the sentence is reached or max sequence length is reached
    while len(tokens) < max_sequence_length:
        # Pad the input sequence
        padded_sequence = tf.keras.preprocessing.sequence.pad_sequences([tokens], maxlen=max_sequence_length)

        # Get the probability distribution for the next token
        probabilities = model.predict(padded_sequence)[0]

        # Choose the next token with the highest probability
        next_token = np.argmax(probabilities)

        # Add the next token to the sequence
        tokens.append(next_token)

        # Check if the end of the sentence has been reached
        # '.' is never in the vocabulary because the data was split on '.', so use .get() to avoid a KeyError
        if next_token == tokenizer.word_index.get('.') or len(tokens) == max_sequence_length:
            break

    # Convert the tokens back to text
    generated_text = tokenizer.sequences_to_texts([tokens])[0]
    return generated_text

# Generate some text
generated_text = generate_text(model, tokenizer, max_sequence_length, 'Hello, my name is Bard.')
print(generated_text)

Image:
Image from PyCharm underlining the code in red
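
A side note on the import check in the code above: importlib.util.find_spec resolves module names, not classes or functions, so passing it a dotted path that ends in Tokenizer or pad_sequences does not test what the printed messages claim. A minimal sketch of a runtime check is below. The helper name can_import is hypothetical, and the sketch only tells you whether the name loads at runtime; it says nothing about what an IDE such as PyCharm can resolve statically.

```python
import importlib


def can_import(module_path, attr_name):
    """Return True if attr_name can be loaded from module_path at runtime."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Covers ModuleNotFoundError too, e.g. when TensorFlow is not installed.
        return False
    return hasattr(module, attr_name)


# The two import paths used in the question.
checks = [
    ("tensorflow.keras.preprocessing.text", "Tokenizer"),
    ("tensorflow.keras.preprocessing.sequence", "pad_sequences"),
]
for module_path, attr_name in checks:
    status = "importable" if can_import(module_path, attr_name) else "NOT importable"
    print(f"{module_path}.{attr_name}: {status}")
```

If both lines report "importable" while the editor still underlines the imports in red, the problem is more likely on the IDE side than in the installed packages.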

Answer 1

Score: 1

The following worked with TensorFlow 2.12.0. You can see where the Tokenizer class and the pad_sequences function are defined in the linked GitHub blobs for TensorFlow 2.12.0. The names of these modules and the structure of TensorFlow/Keras have changed a few times, so the correct import statements are version specific.

Change:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

To:

from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences
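
Because the right path is version specific, one option is to try several locations in order and use the first that loads. The sketch below is only an illustration: import_first and the two candidate lists are hypothetical names, and the lists contain just the paths mentioned in this thread (the ones that worked with TensorFlow 2.12.0 first, then the ones from the question). Other versions may need other entries.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that loads from a list of (module, name) pairs."""
    errors = []
    for module_path, attr_name in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_path}.{attr_name}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Assumption: only the paths from this thread are listed; extend as needed.
TOKENIZER_CANDIDATES = [
    ("keras.preprocessing.text", "Tokenizer"),
    ("tensorflow.keras.preprocessing.text", "Tokenizer"),
]
PAD_SEQUENCES_CANDIDATES = [
    ("keras.utils", "pad_sequences"),
    ("tensorflow.keras.preprocessing.sequence", "pad_sequences"),
]

if __name__ == "__main__":
    try:
        Tokenizer = import_first(TOKENIZER_CANDIDATES)
        pad_sequences = import_first(PAD_SEQUENCES_CANDIDATES)
        print("Tokenizer from:", Tokenizer.__module__)
        print("pad_sequences from:", pad_sequences.__module__)
    except ImportError as exc:
        print(exc)
```

For a project pinned to a single TensorFlow version, the two plain import lines above are simpler and easier for an IDE to analyze; the fallback is mainly useful when the same script has to run against several installations.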
