January 3, 2020, 17:53:49

Python kernel dies on Jupyter Notebook with tensorflow 2

Question

I installed TensorFlow 2 on my Mac using conda, following these instructions:

conda create -n tf2 tensorflow

Then I installed ipykernel to add this new environment to my Jupyter Notebook kernels, as follows:

conda activate tf2
conda install ipykernel
python -m ipykernel install --user --name=tf2

That seemed to work well: I can see my tf2 environment among my Jupyter Notebook kernels.

Then I tried to run the simple MNIST example to check whether everything was working properly, and when I execute this line of code:

model.fit(x_train, y_train, epochs=5)

The kernel of my Jupyter notebook dies without any further information.

I ran the same code in my terminal via python mnist_test.py and also via ipython (command by command) without any issues, which leads me to assume that TensorFlow 2 is correctly installed in my conda environment.

Any ideas on what went wrong during the install?

Versions:

python==3.7.5
tensorboard==2.0.0
tensorflow==2.0.0
tensorflow-estimator==2.0.0
ipykernel==5.1.3
ipython==7.10.2
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==5.2.0
jupyter-core==4.6.1

Here is the complete script, along with the stdout of a run from the terminal:

import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train, x_test = x_train / 255.0, x_test / 255.0

nn_model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

nn_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

nn_model.fit(x_train, y_train, epochs=5)

nn_model.evaluate(x_test,  y_test, verbose=2)

> (tf2) ➜ tensorflow2 python mnist_test.py
> 2020-01-03 10:46:10.854619: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
> 2020-01-03 10:46:10.854860: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
> Train on 60000 samples
> Epoch 1/5
> 60000/60000 [==============================] - 6s 102us/sample - loss: 0.3018 - accuracy: 0.9140
> Epoch 2/5
> 60000/60000 [==============================] - 6s 103us/sample - loss: 0.1437 - accuracy: 0.9571
> Epoch 3/5
> 60000/60000 [==============================] - 6s 103us/sample - loss: 0.1054 - accuracy: 0.9679
> Epoch 4/5
> 60000/60000 [==============================] - 6s 103us/sample - loss: 0.0868 - accuracy: 0.9729
> Epoch 5/5
> 60000/60000 [==============================] - 6s 103us/sample - loss: 0.0739 - accuracy: 0.9772
> 10000/1 - 1s - loss: 0.0359 - accuracy: 0.9782
> (tf2) ➜ tensorflow2

Answer 1

Score: 16

After trying different things, I ran Jupyter Notebook in debug mode with the command:

jupyter notebook --debug

Then, after executing the cells in my notebook, I got this error message:

> OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
> OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can
> degrade performance or cause incorrect results. The best thing to do
> is to ensure that only a single OpenMP runtime is linked into the
> process, e.g. by avoiding static linking of the OpenMP runtime in any
> library. As an unsafe, unsupported, undocumented workaround you can
> set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the
> program to continue to execute, but that may cause crashes or silently
> produce incorrect results. For more information, please see
> http://www.intel.com/software/products/support/.

Following this discussion, installing nomkl in the conda environment worked for me:

conda install nomkl

Answer 2

Score: 2

Try conda install nomkl. If the problem persists, check your anaconda/lib folder: run ll lib*omp* and see whether an old libiomp5.dylib file is there. If so, remove it.
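
The ll lib*omp* check can also be done from Python, for example from inside the notebook kernel itself. The following is only a minimal sketch: the function name find_openmp_runtimes is made up for illustration, and it assumes the active environment keeps its shared libraries in a lib folder under sys.prefix, which is the usual conda layout on macOS but not guaranteed.

```python
import sys
from pathlib import Path


def find_openmp_runtimes(lib_dir):
    """Return the sorted file names in lib_dir that match lib*omp*."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("lib*omp*"))


if __name__ == "__main__":
    # Assumption: shared libraries of the active environment live in <prefix>/lib.
    for name in find_openmp_runtimes(Path(sys.prefix) / "lib"):
        print(name)
```

Seeing more than one OpenMP runtime in the list does not prove that both get loaded, but it is a hint worth following up before deleting anything.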

Answer 3

Score: 1

I can't tell exactly what problem you are having, but it looks like it has to do with some version clash. Do the following (that's what I did, and it works for me):

conda create -n tf2 python=3.7 ipython ipykernel
conda activate tf2
conda install -c anaconda tensorflow
python -m ipykernel install --user --name=tf2

Run the model again and see whether it works.
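
Since the script runs fine from the terminal but not in the notebook, it may be worth confirming that the registered kernel really starts the interpreter of the tf2 environment. The sketch below reads the interpreter path from a kernel spec file. The helper name kernel_interpreter is hypothetical, and the path used in the example run assumes the default per-user kernel location on macOS (~/Library/Jupyter/kernels), which may differ on your machine.

```python
import json
import sys
from pathlib import Path


def kernel_interpreter(kernel_json_path):
    """Return the interpreter path (argv[0]) recorded in a kernel.json file."""
    with open(kernel_json_path, encoding="utf-8") as handle:
        spec = json.load(handle)
    return spec["argv"][0]


if __name__ == "__main__":
    # Assumption: `--user --name=tf2` on macOS wrote the spec to this location.
    spec_path = Path.home() / "Library" / "Jupyter" / "kernels" / "tf2" / "kernel.json"
    if spec_path.exists():
        print("kernel uses :", kernel_interpreter(spec_path))
    print("this process:", sys.executable)
```

Run it with the tf2 environment activated; if the two paths differ, the notebook is not using the environment you tested in the terminal.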

Answer 4

Score: 1

I found out, from the NVIDIA cuDNN docs, that my problem was with the zlib.dll file.

How I fixed it:

This is for 64-bit, which is the system I'm working on.

Download the file:
64-bit - http://www.winimage.com/zLibDll/zlib123dllx64.zip
Extract the contents of zlib123dllx64\dll_x64 to
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\

For 32-bit systems, the steps should be similar.
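
To check whether the extracted DLL is actually visible to the process, one option is to scan the directories on PATH for it. This is a rough sketch, not an official check: find_on_path is a made-up helper, and the file name zlibwapi.dll is an assumption about what the archive contains (the answer itself only says zlib.dll), so pass whatever name you actually extracted.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every directory listed in path_value that contains filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Assumption: the DLL from the archive is named zlibwapi.dll.
    print(find_on_path("zlibwapi.dll"))
```

An empty list suggests the CUDA bin folder is not on PATH for the process that starts the kernel, or that the file was extracted somewhere else.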

Answer 5

Score: 1

For those who are stuck on this problem: I've read lots of posts and finally found that this is an environment problem. This is what you should do (on a Mac):

1. Change your environment
Go to your terminal (shell) and type conda activate.
This should activate a conda environment; you can tell by the name in parentheses in your bash prompt.

2. Install TensorFlow
Type conda install tensorflow in your terminal (shell).
Restart your kernel in Jupyter Notebook. This should work.

3. Reinstall NumPy (if needed)
If you encounter any problem in step 2, type conda uninstall numpy and then conda install numpy in the terminal (shell).

4. Upgrade NumPy (if needed)
If you still have a problem after step 3, type pip install --upgrade numpy in the terminal (shell).

These steps should fix the problem.

If you want to leave this environment, type conda deactivate in the terminal (shell).

Answer 6

Score: 0

For me, this issue occurred at the point marked by the red arrow in the screenshot.
After debugging in Jupyter, I realized that it happens while serialized data is being streamed from the TensorBoard directory. Now, if I change model_dir="someothername", it works like a charm.
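
Instead of renaming the directory by hand each time, a fresh directory can be created per run. The following is a small sketch of that idea; fresh_model_dir and the base folder name logs are hypothetical, and only model_dir comes from the answer above.

```python
import tempfile
import time
from pathlib import Path


def fresh_model_dir(base_dir, prefix="run"):
    """Create and return a new, empty directory under base_dir."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # mkdtemp guarantees a directory that did not exist before the call.
    return Path(tempfile.mkdtemp(prefix=f"{prefix}-{stamp}-", dir=str(base)))
```

It could then be used as model_dir=str(fresh_model_dir("logs")), so that each run starts with a directory that contains no data from earlier runs.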

Answer 7

Score: 0

Installing nomkl fixed it for me.

Try conda install nomkl, or install it from the Environments tab in Anaconda Navigator.

Answer 8

Score: 0

TensorFlow GPU doesn't support versions 12.0 and higher, so use:

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
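
Note that the OMP error message quoted in Answer 1 describes this variable as an unsafe, unsupported workaround. If you use it anyway, it presumably has to be set before the libraries that bring in the OpenMP runtime are imported. The sketch below wraps the two lines above in a hypothetical helper, allow_duplicate_openmp, that also reports whether TensorFlow was imported already; the check via sys.modules is an illustration, not part of the original answer.

```python
import os
import sys


def allow_duplicate_openmp():
    """Set KMP_DUPLICATE_LIB_OK and report whether the setting came before TensorFlow was imported."""
    os.environ["KMP_DUPLICATE_LIB_OK"] = "True"
    # If TensorFlow is already loaded, the setting probably comes too late.
    return "tensorflow" not in sys.modules


if __name__ == "__main__":
    if not allow_duplicate_openmp():
        print("TensorFlow was imported first; restart the kernel and run this cell earlier.")
```

In a notebook, this would go in the very first cell, before import tensorflow as tf.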
