ModuleNotFoundError for 'sklearn' as subdependency of numpy


Question

I am using Docker combined with virtualenv to run a project for a client, but I am getting a ModuleNotFoundError for sklearn.

In my Pipfile I have added the numpy dependency:

numpy = "==1.21.6"

The error is thrown from the following line

np.load(PATH_TO_NPY_FILE, allow_pickle=True)

with the following stack trace:

development_1  |   File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
development_1  |     pickle_kwargs=pickle_kwargs)
development_1  |   File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/numpy/lib/format.py", line 748, in read_array
development_1  |     array = pickle.load(fp, **pickle_kwargs)
development_1  | ModuleNotFoundError: No module named 'sklearn'

I find this strange because sklearn should be installed as part of the numpy dependency tree, right?

Still, I tried the suggestions I found in other posts, like adding the following command explicitly to my Dockerfile

python -m pip install scikit-learn scipy matplotlib

However, the error still persists.

For completeness, I'll provide some extra info below, although the key question remains why installing numpy does not also install its subdependencies.


Project structure

The project is sort of a bridge between SQS on one hand and the logic of the client on the other. The code from which the error is thrown comes from a git submodule, and the Pipfile is added on the top-level repo. The submodule does not contain a Pipfile. The submodules folder has an __init__.py file because it contains functions that I want to use in my src code.
In the tree below, my code is in main.py, and the error-throwing code is in submodules/module2/bar.py.

|- src/
|  |- main.py
|
|- submodules/
|  |- module1
|  |  |- foo.py
|  |  |- setup.py
|  |
|  |- module2
|  |  |- bar.py
|  |
|  |- __init__.py
|
|- .gitmodules
|- Pipfile
|- Dockerfile

Dockerfile contents

Note that at this point, the Dockerfile is a bit of an aggregate of solutions I took from other posts on the matter. That's why both pip install scikit-learn and apt-get install python3-sklearn are currently included. I will prune it once I have fixed this issue.

FROM python:3.7

WORKDIR code/

COPY Pipfile .
COPY submodules/ submodules/

RUN pip install pipenv && \
    pipenv install --deploy  && \
    python -m pip install scikit-learn scipy matplotlib && \
    apt-get update && \
    apt-get install -y locales ffmpeg libsm6 libxext6 libxrender-dev python3-sklearn && \
    sed -i -e 's/# nl_BE.UTF-8 UTF-8/nl_BE.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales

ENV LANG nl_BE.UTF-8
ENV LC_ALL nl_BE.UTF-8

COPY .env .
COPY src/ .
COPY data/ data

CMD [ "pipenv", "run", "python", "main.py" ]

Pipfile contents

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
python-dotenv = "*"
boto3 = "*"
pySqsListener = "*"
xpress = "==9.0.5"
module1 = {path = "./submodules/module1"}
pandas = "==1.3.4"
numpy = "==1.21.6"

[dev-packages]

[requires]
python_version = "3.7"

Answer 1

Score: 2

> I find this strange, because sklearn should be installed as part of the numpy dependency tree, right?

No, scikit-learn is not a dependency of numpy (it's the other way around).

Since loading that pickle file requires sklearn, just put scikit-learn in the Pipfile (and matplotlib if you need it); it's a dependency of your project.
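
To illustrate why the load fails even though numpy itself is installed, here is a minimal sketch. A pickle stores only the module path and class name of each object, not the class code, so unpickling has to import that module. The module name fake_sklearn and the class Model are hypothetical stand-ins for whatever sklearn object was presumably saved inside the .npy file.

```python
import pickle
import sys
import types

# Hypothetical stand-in for sklearn: a throwaway module holding one class.
fake = types.ModuleType("fake_sklearn")


class Model:
    """Placeholder for an estimator that was pickled into the .npy file."""


# Make the class look as if it were defined inside the stand-in module.
Model.__module__ = "fake_sklearn"
Model.__qualname__ = "Model"
fake.Model = Model
sys.modules["fake_sklearn"] = fake

# The payload records "fake_sklearn.Model" by name; it does not embed the class.
payload = pickle.dumps(Model())

# Simulate an environment where the package is not installed.
del sys.modules["fake_sklearn"]

try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    message = str(exc)
    print(message)
```

The same mechanism applies to np.load(..., allow_pickle=True): numpy hands the object data to pickle, and pickle imports whatever modules the stored objects name.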

Installing them outside the pipenv-generated environment with python -m pip install scikit-learn scipy matplotlib will likely have no effect (nor will the apt-installed python3-sklearn), since virtualenvs are designed to be separate from the ambient environment.
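
One way to confirm this isolation is to ask the running interpreter where it lives and whether it can see sklearn. The following is a rough diagnostic sketch, not part of the original project; the helper names module_available and in_virtualenv are made up for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def in_virtualenv():
    """Return True if this interpreter runs inside a virtual environment."""
    # In a virtual environment, sys.prefix differs from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("in virtualenv:", in_virtualenv())
print("sklearn importable:", module_available("sklearn"))
```

Saving this under a hypothetical name such as check_env.py and running it once with plain python and once with pipenv run python should show two different interpreters, and only the one whose environment actually contains scikit-learn will report it as importable.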

huangapple
  • Posted on April 13, 2023, 16:11:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76003126.html