How to download NLTK package with proper security certificates inside docker container?

Question

I have tried all combinations mentioned here and other places, but I keep getting the same error.

Here is my Dockerfile:

FROM python:3.9

RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt

RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2

# Install dependencies
RUN apt-get update && apt-get install libgl1 -y
RUN pip install -U nltk
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]

COPY . /app

# Run the application:
CMD ["python", "-u", "app.py"]

The Docker image builds fine. (I pass the platform argument because the image is meant to run on Linux, while the local machine where I build it runs Windows, and the detectron library does not install on Windows.)

>>> docker buildx build --platform=linux/amd64 -t my_app .
[+] Building 23.2s (16/16) FINISHED
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => [internal] load build definition from Dockerfile                                                               0.0s
 => => transferring dockerfile: 634B                                                                               0.0s
 => [internal] load metadata for docker.io/library/python:3.9                                                      0.9s
 => [internal] load build context                                                                                  0.0s
 => => transferring context: 1.85kB                                                                                0.0s
 => [ 1/11] FROM docker.io/library/python:3.9@sha256:6ea9dafc96d7914c5c1d199f1f0195c4e05cf017b10666ca84cb7ce8e269  0.0s
 => CACHED [ 2/11] RUN pip install virtualenv && virtualenv venv -p python3                                        0.0s
 => CACHED [ 3/11] WORKDIR /app                                                                                    0.0s
 => CACHED [ 4/11] COPY requirements.txt ./                                                                        0.0s
 => CACHED [ 5/11] RUN pip install -r requirements.txt                                                             0.0s
 => CACHED [ 6/11] RUN git clone https://github.com/facebookresearch/detectron2.git                                0.0s
 => CACHED [ 7/11] RUN python -m pip install -e detectron2                                                         0.0s
 => CACHED [ 8/11] RUN apt-get update && apt-get install libgl1 -y                                                 0.0s
 => CACHED [ 9/11] RUN pip install -U nltk                                                                         0.0s
 => [10/11] RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]   22.1s
 => [11/11] COPY . /app                                                                                            0.0s
 => exporting to image                                                                                             0.1s
 => => exporting layers                                                                                            0.1s
 => => writing image sha256:83e2495addbc4cdf9b0885e1bb4c5b0fb0777177956eda56950bbf59c095d23b                       0.0s
 => => naming to docker.io/library/my_app

But I keep getting the error below when trying to run the image:

>>> docker run -p 8080:8080 my_app
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data]     violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data]     violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     EOF occurred in violation of protocol (_ssl.c:1129)>
Traceback (most recent call last):
  File "/app/app.py", line 16, in <module>
    index = VectorstoreIndexCreator().from_loaders(loaders)
  File "/venv/lib/python3.9/site-packages/langchain/indexes/vectorstore.py", line 72, in from_loaders
    docs.extend(loader.load())
  File "/venv/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
    elements = self._get_elements()
  File "/venv/lib/python3.9/site-packages/langchain/document_loaders/pdf.py", line 37, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 75, in partition_pdf
    return partition_pdf_or_image(
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 137, in partition_pdf_or_image
    return _partition_pdf_with_pdfminer(
  File "/venv/lib/python3.9/site-packages/unstructured/utils.py", line 43, in wrapper
    return func(*args, **kwargs)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 248, in _partition_pdf_with_pdfminer
    elements = _process_pdfminer_pages(
  File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 293, in _process_pdfminer_pages
    _elements = partition_text(text=text)
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text.py", line 89, in partition_text
    elif is_possible_narrative_text(ctext):
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
  File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
    sentences = sent_tokenize(text)
  File "/venv/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
    return _sent_tokenize(text)
  File "/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/root/nltk_data'
    - '/venv/nltk_data'
    - '/venv/share/nltk_data'
    - '/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Answer 1

Score: 0

I disconnected my machine from WiFi and connected it to my phone's hotspot, then it runs without any error, as it is now able to download the NLTK package. Extremely weird (and silly) issue. I wonder if there's a better solution, as nothing else worked for me.
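
A note on the traceback in the question, offered as a possibility rather than a confirmed fix: the Dockerfile downloads punkt into /usr/local/nltk_data at build time, but that directory does not appear in the "Searched in" list of the LookupError. If that is the cause, NLTK never sees the copy baked into the image and tries to download it again at run time, which is when the network error shows up. The sketch below is a minimal, standard-library-only check of that reasoning. The helper name is_on_search_path is hypothetical, the directory list is copied from the error message above, and the handling of the NLTK_DATA environment variable (extra directories separated by os.pathsep) reflects my understanding of how NLTK extends its search path.

```python
import os

# Directories copied from the "Searched in" section of the LookupError
# in the question (the trailing empty entry is omitted).
SEARCHED_DIRS = [
    "/root/nltk_data",
    "/venv/nltk_data",
    "/venv/share/nltk_data",
    "/venv/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def is_on_search_path(download_dir, searched=SEARCHED_DIRS, env=None):
    """Hypothetical helper: report whether download_dir is a directory
    that NLTK would search, given the listed defaults plus NLTK_DATA."""
    env = os.environ if env is None else env
    # Assumption: NLTK_DATA may hold several directories joined by os.pathsep.
    extra = [p for p in env.get("NLTK_DATA", "").split(os.pathsep) if p]
    candidates = {os.path.normpath(p) for p in list(searched) + extra}
    return os.path.normpath(download_dir) in candidates


# The directory used by the Dockerfile in the question.
print(is_on_search_path("/usr/local/nltk_data", env={}))
# A directory that does appear in the "Searched in" list.
print(is_on_search_path("/usr/local/share/nltk_data", env={}))
# The original directory, with NLTK_DATA pointing at it.
print(is_on_search_path("/usr/local/nltk_data",
                        env={"NLTK_DATA": "/usr/local/nltk_data"}))
```

If the check matches what happens in your image, pointing download_dir at one of the listed directories (for example /usr/local/share/nltk_data), or adding ENV NLTK_DATA=/usr/local/nltk_data to the Dockerfile, might let the app find the data without any download at run time. The traceback also shows averaged_perceptron_tagger being requested, so that resource would likely need to be downloaded at build time as well.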

huangapple
  • Published on May 17, 2023, 20:15:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76272002.html