Execute script that uses my local package - ImportErrors
Question
My project was up and running in a Kubernetes container for a while, until I decided to "clean up" the sys.path calls I had at the top of my modules. This included describing my dependencies in pyproject.toml and ditching setup.py altogether (it imported setuptools and called setup() when run as __main__).
The design intent is not to run anything in /app/tnc as a script; rather, it is a collection of modules, i.e. a package. The only part of the codebase that serves as __main__ is the api.py file, which initializes and fires up Flask.
Implementation
I have a lean deployment setup that consists of the following:
- the core libraries in /opt/venv
- my package in /app/tnc
- the entry point, /app/bin/api
I kick off the Flask app with: python /app/bin/api
The build takes place in the python:3.11-slim Docker image. There I install the recommended gcc and specify the following in the Dockerfile:
# build
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY pyproject.toml pyproject.toml
# aside: better would be to use python -m pip install -e .
RUN pip3 install -e .
I then copy the following from the build into my runtime image.
# runtime
ENV PATH="/opt/venv/bin:$PATH"
ENV PYTHONPATH="/opt/venv/bin:/app/tnc"
COPY --chown=appuser:appuser bin bin
COPY --chown=appuser:appuser tnc tnc
COPY --chown=appuser:appuser config.py config.py
COPY --from=builder /opt/venv/ /opt/venv
As I mentioned, in the Kubernetes deployment I fire up the container with:
command: ["python3"]
args: ["bin/api"]
My observations while working toward a solution
Firing up the container in a way that lets me run the Python REPL:
- import flask generates AttributeError ...replace(' -> None', '')
- after removing /app/tnc from the PYTHONPATH, import flask generates ModuleNotFoundError ... no tnc
AttributeError ...replace(' -> None', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/__init__.py", line 2, in <module>
    from .test import Client as Client
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/test.py", line 35, in <module>
    from .sansio.multipart import Data
  File "/opt/venv/lib/python3.10/site-packages/werkzeug/sansio/multipart.py", line 19, in <module>
    class Preamble(Event):
  File "/usr/local/lib/python3.10/dataclasses.py", line 1175, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "/usr/local/lib/python3.10/dataclasses.py", line 1093, in _process_class
    str(inspect.signature(cls)).replace(' -> None', ''))
AttributeError: module 'inspect' has no attribute 'signature'
ModuleNotFoundError: No module named 'tnc'
appuser@tnc-py-deployment-set-1:/app$ echo $PYTHONPATH
/opt/venv/bin
appuser@tnc-py-deployment-set-1:/app$ echo $PATH
/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
appuser@tnc-py-deployment-set-1:/app$ python -m /app/bin/api
/opt/venv/bin/python: No module named /app/bin/api
appuser@tnc-py-deployment-set-1:/app$ python /app/bin/api
Traceback (most recent call last):
  File "/app/bin/api", line 12, in <module>
    from tnc.s3 import S3Session
ModuleNotFoundError: No module named 'tnc'
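The last failure is consistent with how Python initializes sys.path when it is given a script path: sys.path[0] is the directory containing the script (here /app/bin), not the working directory (/app), so the top-level tnc package is only importable if /app itself is on sys.path, for example via PYTHONPATH. The sketch below reproduces this in a throwaway directory; make_app and run_script are hypothetical helpers, and the layout only mirrors /app/bin/api and /app/tnc.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

# Stand-in for /app/bin/api: report sys.path[0], then import the package.
API_SCRIPT = """\
import sys
print(sys.path[0], flush=True)
import tnc  # needs the parent of tnc/ (i.e. /app) on sys.path
print("imported tnc")
"""


def make_app(root: Path) -> Path:
    """Hypothetical layout: root/bin/api (no .py suffix) and root/tnc/__init__.py."""
    (root / "bin").mkdir()
    (root / "tnc").mkdir()
    (root / "tnc" / "__init__.py").write_text("")
    script = root / "bin" / "api"
    script.write_text(API_SCRIPT)
    return script


def run_script(script: Path, pythonpath: Optional[str]) -> subprocess.CompletedProcess:
    """Run `python <script>` from the app root, like `python /app/bin/api` from /app."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    env.pop("PYTHONSAFEPATH", None)  # keep the default "script directory first" rule
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    return subprocess.run(
        [sys.executable, str(script)],
        cwd=script.parent.parent,
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = make_app(Path(tmp))
        plain = run_script(script, None)
        print("sys.path[0]:", plain.stdout.splitlines()[0])
        print("failed:", "ModuleNotFoundError" in plain.stderr)
        fixed = run_script(script, tmp)  # like PYTHONPATH=/app
        print(fixed.stdout.splitlines()[-1])
```

Running from /app does not help on its own: the working directory is only added to sys.path for the REPL, python -c, and python -m, not for python path/to/script.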
The project structure
├── bin
│   └── api
├── config.py
├── pyproject.toml
└── tnc
    ├── __init__.py
    ├── data
    │   ├── __init__.py
    │   ├── download.py
    │   ├── field_types.py
    │   └── storage_providers
    ├── errors.py
    ├── inspect
    │   ├── __init__.py
    │   └── etl_time_index.py
    ├── test
    │   ├── __init__.py
    │   └── test_end-to-end.py
    ├── utils.py
    └── www
        ├── __init__.py
        └── routes
            ├── __init__.py
            ├── feedback.py
            ├── livez.py
            └── utils.py
pyproject.toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
dependencies = [ ... with version specs ]
[tool.setuptools.packages.find]
where = ["./"]
exclude = [ "res", "notes" ]
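The AttributeError is consistent with a name clash. The poster does not state this; it is offered only as a plausible reading. PYTHONPATH entries are placed ahead of the standard library on sys.path, so with /app/tnc on PYTHONPATH the tnc/inspect subpackage shown in the tree becomes importable as a top-level inspect, shadowing the stdlib module that dataclasses relies on for inspect.signature. The sketch below checks this in fresh interpreters; inspect_is_shadowed and make_fake_app are hypothetical helpers, and the temporary tree only mimics /app/tnc/inspect.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

PROBE = "import inspect; print(hasattr(inspect, 'signature'))"


def inspect_is_shadowed(pythonpath: str) -> bool:
    """Start a fresh interpreter with PYTHONPATH=pythonpath and report whether
    `import inspect` found a module without inspect.signature."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    with tempfile.TemporaryDirectory() as empty_cwd:  # nothing importable in cwd
        result = subprocess.run(
            [sys.executable, "-c", PROBE],
            cwd=empty_cwd,
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip() == "False"


def make_fake_app(root: Path) -> Path:
    """Hypothetical miniature of /app: root/tnc/__init__.py and root/tnc/inspect/__init__.py."""
    pkg = root / "tnc"
    (pkg / "inspect").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "inspect" / "__init__.py").write_text("")
    return pkg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tnc_dir = make_fake_app(Path(tmp))
        # Like PYTHONPATH="/opt/venv/bin:/app/tnc": tnc/inspect is found first.
        print("package dir on PYTHONPATH:", inspect_is_shadowed(str(tnc_dir)))
        # Like PYTHONPATH="/app": only the top-level name "tnc" is exposed.
        print("parent dir on PYTHONPATH: ", inspect_is_shadowed(tmp))
```

Putting the parent directory (the equivalent of /app) on PYTHONPATH exposes only the top-level name tnc, so the subpackage stays reachable as tnc.inspect without hiding the standard library.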
Answer
First, I have to give a shout-out to the pyproject.toml + setuptools team: the documentation and implementation have gotten good. They allowed me to be a lot more specific and "deterministic" :)) about my setup, not to mention a bit more aggressive in the build process.
Fixing the "not found" errors
The fix included the following:
- updated the pyproject.toml with the following:
[tool.setuptools.package-dir]
tnc = "tnc"
bin = "bin"
# entry point (not required, but ergonomic)
[project.scripts]
run-api = "bin.api:main"
I included an __init__.py to mark each subpackage.
- Perhaps not required, but I moved the config.py file into the bin directory; this location captured my design intent. Changes to the api.py file:
import logging
# instantiate the config object using a string ref to the config.py
app.config.from_object("bin.config.DevelopmentConfig")
...
# added a main() to enable the option of specifying an entry point
def main():
    """Used by the run-api entry-point script and when run directly."""
    logging.basicConfig(level=logging.DEBUG)
    app.run(host=app.config['HOST'], port=app.config['PORT'])

if __name__ == '__main__':
    main()
- In the Dockerfile I set the PYTHONPATH env value to "/app", the location of the tnc and bin directories. By no means a best practice, but given my determination to keep bin separate from tnc, it was the only approach that made sense for this use case.
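Both the [project.scripts] entry point (bin.api:main) and the string passed to app.config.from_object (bin.config.DevelopmentConfig) are resolved by importing a module by name and then looking up an attribute on it, which is why the directory that contains bin and tnc has to be importable. The sketch below is a simplified, hypothetical load_object helper (not a setuptools or Flask API) that handles both spellings; the real implementations, importlib.metadata.EntryPoint.load and werkzeug.utils.import_string, cover more edge cases. Standard-library names stand in for bin.api:main so the example runs anywhere.

```python
import importlib
import json
import os.path
from importlib.metadata import EntryPoint


def load_object(spec: str):
    """Resolve "pkg.module:attr" or "pkg.module.attr" to an object.

    Hypothetical helper: import the module part, then walk the attribute path.
    The dotted form assumes the last component is the attribute.
    """
    if ":" in spec:
        # Entry-point style, e.g. "bin.api:main"
        module_name, _, attr_path = spec.partition(":")
    else:
        # Dotted style, e.g. "bin.config.DevelopmentConfig"
        module_name, _, attr_path = spec.rpartition(".")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    # Stand-ins for "bin.api:main" and "bin.config.DevelopmentConfig".
    print(load_object("json:dumps") is json.dumps)
    print(load_object("os.path.join") is os.path.join)
    # The standard library's EntryPoint performs the same lookup for "module:attr".
    ep = EntryPoint(name="run-api", value="json:dumps", group="console_scripts")
    print(ep.load() is json.dumps)
```

With PYTHONPATH=/app, load_object("bin.api:main") would import /app/bin/api.py and return its main function; without it, the import_module call raises the same ModuleNotFoundError seen in the question.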
Improved build process
Finally, while there are a few well-known techniques to maximize cache reuse when building the Docker image, I wanted to call out how easy it was to know precisely what was going on during the build, made possible by the latest setuptools configured with pyproject.toml.
A. It was trivial to first run the build using an empty stub for where the app code would eventually go.
# pyproject.dependencies.toml
[tool.setuptools]
packages = ["tnc"]
... paired with the two-phase build (the image is an official Docker Python image):
# Make sure to use the venv from the python base img:
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# phase 1: dependency build using an empty project dir
COPY pyproject.dependencies.toml pyproject.toml
RUN mkdir tnc
RUN pip3 install .
# phase 2: full and final build
COPY bin bin
COPY tnc tnc
COPY pyproject.toml pyproject.toml
RUN pip3 install .
B. It was clear what to copy from the now-consolidated build artifacts into my image used for distribution:
COPY --from=builder --chown=appuser:appuser /app/build/lib/tnc tnc
COPY --from=builder --chown=appuser:appuser /app/build/lib/bin bin
COPY --from=builder --chown=root:root /opt/venv/ /opt/venv
In the kube deployment, despite being able to call the entry point configured in pyproject.toml, I chose to call api.py as a script.
# in the kube deployment for the image
command: ["python"]
args: ["/app/bin/api.py"]
Conclusion
I have an improved design that no longer includes "ad-hoc" calls to sys.path, nor resorts to "polluting" the PYTHONPATH. The single PYTHONPATH entry I now have, /app, conveys an important design choice: keeping the entry point in a separate root directory.