How to install private repository on Dataflow Worker?

The problem

We're facing issues during Dataflow job deployment.

The error

We use CustomCommands to install a private repository on the workers, but we now see the following error in the `worker-startup` logs of our jobs:

Running command: ['pip', 'install', 'git+ssh://git@github.com/my_private_repo.git@v1.0.0']

Command output: b'Traceback (most recent call last):
File "/usr/local/bin/pip", line 6, in <module>
from pip._internal import main\nModuleNotFoundError: No module named \'pip\'\n'

This code used to work, but it has stopped working since our last deploy of the service on Friday.

Some context

  1. We use a GAE service with a cron job to deploy Dataflow jobs, using the Python SDK.
  2. Our jobs use code stored in a private repository.
  3. To allow the workers to pull private repositories, we use a `setup.py` with CustomCommands, which are run during worker startup. (Code example from the official repo [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py).)
  4. The commands retrieve an encoded SSH key from GCS, decode it with KMS, fetch an SSH config file specifying the key's path and the allowed hosts, then run `pip install git+ssh://git@github.com/my_private_repo.git@v1.0.0` (see the commands below).
CUSTOM_COMMANDS = [
    # Retrieve the SSH key
    ["gsutil", "cp", "gs://{bucket_name}/encrypted_python_repo_ssh_key".format(bucket_name=credentials_bucket), "encrypted_key"],
    [
        "gcloud",
        "kms",
        "decrypt",
        "--location",
        "global",
        "--keyring",
        project,
        "--key",
        project,
        "--plaintext-file",
        "decrypted_key",
        "--ciphertext-file",
        "encrypted_key",
    ],
    ["chmod", "700", "decrypted_key"],

    # Install git & ssh
    ["apt-get", "update"],
    ["apt-get", "install", "-y", "openssh-server"],
    ["apt-get", "install", "-y", "git"],

    # Add the SSH config specifying the key location & the host
    [
        "gsutil",
        "cp",
        "gs://{bucket_name}/ssh_config_gcloud".format(bucket_name=credentials_bucket),
        "~/.ssh/config",
    ],
    [
        "pip",
        "install",
        "git+ssh://git@github.com/my_private_repo.git@v1.0.0",
    ],
]
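For context, this command list is typically wired into `setup.py` following the pattern of the Beam juliaset example linked above. The sketch below is an illustrative reconstruction of that pattern, not our actual `setup.py`; the package name and the placeholder command are assumptions:

```python
# Illustrative sketch of how CUSTOM_COMMANDS plug into setup.py, following
# the pattern of the official Beam juliaset example. Names are placeholders.
import subprocess

import setuptools

try:
    # setuptools >= 62 ships its own build command
    from setuptools.command.build import build as _build
except ImportError:
    from distutils.command.build import build as _build

# Placeholder command; the real list pulls the SSH key, installs git, etc.
CUSTOM_COMMANDS = [["echo", "custom command executed"]]


class CustomCommands(setuptools.Command):
    """Runs each command in CUSTOM_COMMANDS during worker startup."""

    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            print("Running command: %s" % command)
            # check=True aborts worker startup if a command fails
            subprocess.run(command, check=True)


class build(_build):
    # Chain the custom commands into the standard build step
    sub_commands = _build.sub_commands + [("CustomCommands", None)]


# In setup.py you would then register both classes:
# setuptools.setup(..., cmdclass={"build": build, "CustomCommands": CustomCommands})
```

Dataflow runs the `build` command on each worker at startup, which is what triggers the `Running command: ...` lines visible in the `worker-startup` logs.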

What we tried

  • Following pip issue [#5599](https://github.com/pypa/pip/issues/5599), it seems there is a conflict between several versions of pip.
    We tried reinstalling it in the CustomCommands by adding `apt-get --reinstall install -y python-setuptools python-wheel python-pip` (and other variations such as `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3 get-pip.py --force-reinstall`), with no real improvement.
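A related variant worth noting (our assumption, not something discussed in the pip issue): invoking pip as a module bypasses the `/usr/local/bin/pip` console script, which is exactly the file raising `ModuleNotFoundError` in the traceback above. In CUSTOM_COMMANDS this would mean replacing `["pip", "install", ...]` with `["python3", "-m", "pip", "install", ...]`:

```shell
# The module form does not go through the (possibly stale) pip wrapper script,
# so it can keep working even when /usr/local/bin/pip points at a removed module.
python3 -m pip --version

# In CUSTOM_COMMANDS, the install step would then become (private repo, shown only):
# python3 -m pip install "git+ssh://git@github.com/my_private_repo.git@v1.0.0"
```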

To note

  • Jobs started locally work (how? I'm quite curious how this can work, since the CustomCommands are not run)
  • Logging into the compute instance, connecting to the Docker process, and running the commands manually shows no error
  • The service is deployed with a custom Dockerfile defined as follows:
FROM gcr.io/google-appengine/python
RUN apt-get update && apt-get install -y openssh-server
RUN virtualenv /env -p python3.7

# Setting these environment variables is the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH

# Set credentials for git, run pip to install all
# dependencies into the virtualenv.
... specify SSH key, host, to allow private git repo pull

# Add the application source code.
ADD . /app
RUN pip install -r /app/requirements.txt && python /app/setup.py install && python /app/setup.py build
CMD gunicorn -b :$PORT main:app

Any idea how to solve this issue, or any workaround available?

Thanks for your help!

Edit

This seems mostly due to the local state of the machine, or of our computers.

After running commands like `python setup.py install` or `python setup.py build`, I'm now unable to deploy jobs anymore (facing the same `worker-startup` error as when deployed by the service), but my colleague can still deploy jobs that run (same code, same branch, except for directories excluded via .gitignore such as `build`, `dist`, etc.). In his case, the CustomCommands are not run at job deployment (but the workers can still use the locally packaged pipeline).

Is there a way to specify a compiled package for the workers to use? I couldn't find documentation on that...

Workaround

As we were unable to pull private code from the Dataflow workers, we used the following workaround:

  • Build a wheel of our private repo using `python setup.py sdist bdist_wheel`.
  • Embed this wheel in our Dataflow package under `lib/my-package-1.0.0-py3-none-any.whl`.
  • Pass the wheel to the Dataflow options as an extra package (see the beam code [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L879)).
Commands used

pipeline_options = PipelineOptions()
pipeline_options.view_as(SetupOptions).setup_file = "./setup.py"
pipeline_options.view_as(SetupOptions).extra_packages = ["./lib/my-package-1.0.0-py3-none-any.whl"]



# Answer 1
**Score**: 2

For anything beyond trivial, public dependencies, I would recommend using [custom containers][1] and installing all the dependencies ahead of time.

[1]: https://cloud.google.com/dataflow/docs/guides/using-custom-containers
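As a sketch of that approach (the base-image tag and wheel name below are illustrative, not from the answer): a custom SDK container bakes the private package in at image-build time, so no CustomCommands need to run at worker startup:

```dockerfile
# Start from the Beam SDK base image matching the SDK version used to
# submit the job (2.35.0 is only an example).
FROM apache/beam_python3.7_sdk:2.35.0

# Bake a pre-built wheel of the private repo into the image.
COPY lib/my-package-1.0.0-py3-none-any.whl /tmp/
RUN pip install /tmp/my-package-1.0.0-py3-none-any.whl
```

The image is then pushed to a registry and passed to the job via the `sdk_container_image` pipeline option, as described in the linked guide.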




huangapple
  • Posted on 2020-01-07 00:17:09
  • Please retain this link when reposting: https://go.coder-hub.com/59615460.html