How to download NLTK package with proper security certificates inside docker container?

Question
I have tried all combinations mentioned here and other places, but I keep getting the same error.
Here is my Dockerfile
:
FROM python:3.9
RUN pip install virtualenv && virtualenv venv -p python3
ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
RUN git clone https://github.com/facebookresearch/detectron2.git
RUN python -m pip install -e detectron2
# Install dependencies
RUN apt-get update && apt-get install libgl1 -y
RUN pip install -U nltk
RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
COPY . /app
# Run the application:
CMD ["python", "-u", "app.py"]
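As a side note, since the download runs at build time, a broken or misplaced download only surfaces when the app starts. A hedged addition (not in the original post): a sanity-check step using `nltk.data.find`, which raises `LookupError` when the resource is not on NLTK's search path, so the build fails early instead of the container:

```dockerfile
# Optional sanity check (an addition, not in the original post):
# nltk.data.find raises LookupError if punkt is missing or not on the
# search path, so a bad download fails the build instead of the app.
RUN python3 -c "import nltk; nltk.data.find('tokenizers/punkt')"
```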
The Docker image gets built fine (I'm using the platform argument because I'm building the image to run inside Linux, but my local machine, where I'm building the image, is Windows, and the detectron2 library doesn't install on Windows):
>>> docker buildx build --platform=linux/amd64 -t my_app .
[+] Building 23.2s (16/16) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 634B 0.0s
=> [internal] load metadata for docker.io/library/python:3.9 0.9s
=> [internal] load build context 0.0s
=> => transferring context: 1.85kB 0.0s
=> [ 1/11] FROM docker.io/library/python:3.9@sha256:6ea9dafc96d7914c5c1d199f1f0195c4e05cf017b10666ca84cb7ce8e269 0.0s
=> CACHED [ 2/11] RUN pip install virtualenv && virtualenv venv -p python3 0.0s
=> CACHED [ 3/11] WORKDIR /app 0.0s
=> CACHED [ 4/11] COPY requirements.txt ./ 0.0s
=> CACHED [ 5/11] RUN pip install -r requirements.txt 0.0s
=> CACHED [ 6/11] RUN git clone https://github.com/facebookresearch/detectron2.git 0.0s
=> CACHED [ 7/11] RUN python -m pip install -e detectron2 0.0s
=> CACHED [ 8/11] RUN apt-get update && apt-get install libgl1 -y 0.0s
=> CACHED [ 9/11] RUN pip install -U nltk 0.0s
=> [10/11] RUN [ "python3", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ] 22.1s
=> [11/11] COPY . /app 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:83e2495addbc4cdf9b0885e1bb4c5b0fb0777177956eda56950bbf59c095d23b 0.0s
=> => naming to docker.io/library/my_app
But I keep getting the error below when trying to run the image:
>>> docker run -p 8080:8080 my_app
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data] violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading punkt: <urlopen error EOF occurred in
[nltk_data] violation of protocol (_ssl.c:1129)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data] EOF occurred in violation of protocol (_ssl.c:1129)>
Traceback (most recent call last):
File "/app/app.py", line 16, in <module>
index = VectorstoreIndexCreator().from_loaders(loaders)
File "/venv/lib/python3.9/site-packages/langchain/indexes/vectorstore.py", line 72, in from_loaders
docs.extend(loader.load())
File "/venv/lib/python3.9/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
elements = self._get_elements()
File "/venv/lib/python3.9/site-packages/langchain/document_loaders/pdf.py", line 37, in _get_elements
return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 75, in partition_pdf
return partition_pdf_or_image(
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 137, in partition_pdf_or_image
return _partition_pdf_with_pdfminer(
File "/venv/lib/python3.9/site-packages/unstructured/utils.py", line 43, in wrapper
return func(*args, **kwargs)
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 248, in _partition_pdf_with_pdfminer
elements = _process_pdfminer_pages(
File "/venv/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 293, in _process_pdfminer_pages
_elements = partition_text(text=text)
File "/venv/lib/python3.9/site-packages/unstructured/partition/text.py", line 89, in partition_text
elif is_possible_narrative_text(ctext):
File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 76, in is_possible_narrative_text
if exceeds_cap_ratio(text, threshold=cap_threshold):
File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 273, in exceeds_cap_ratio
if sentence_count(text, 3) > 1:
File "/venv/lib/python3.9/site-packages/unstructured/partition/text_type.py", line 222, in sentence_count
sentences = sent_tokenize(text)
File "/venv/lib/python3.9/site-packages/unstructured/nlp/tokenize.py", line 38, in sent_tokenize
return _sent_tokenize(text)
File "/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = load(f"tokenizers/punkt/{language}.pickle")
File "/venv/lib/python3.9/site-packages/nltk/data.py", line 750, in load
opened_resource = _open(resource_url)
File "/venv/lib/python3.9/site-packages/nltk/data.py", line 876, in _open
return find(path_, path + [""]).open()
File "/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/root/nltk_data'
- '/venv/nltk_data'
- '/venv/share/nltk_data'
- '/venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
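Apart from the SSL failure, note that the search list in the traceback above does not include the `/usr/local/nltk_data` directory passed as `download_dir`, so even a successful download would not be found at runtime. One way to address that (an assumption, not from the original post) is to set the `NLTK_DATA` environment variable in the Dockerfile; NLTK puts paths from this variable ahead of its defaults:

```dockerfile
# Make NLTK search the same directory the build step downloads into;
# paths listed in NLTK_DATA are searched before the defaults shown above.
ENV NLTK_DATA=/usr/local/nltk_data
```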
Answer 1 (score: 0)
I disconnected my machine from WiFi and connected it to my phone's hotspot, and then it ran without any error, as it was now able to download the NLTK package. An extremely weird (and silly) issue. I wonder if there's a better solution, as nothing else worked for me.
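The fact that switching networks fixed it suggests something on the original network (e.g. a proxy or firewall) was interfering with the TLS handshake. A commonly suggested workaround sketch for the `EOF occurred in violation of protocol` error, not from the original post and at the cost of disabling certificate verification for the download step, is to patch Python's default HTTPS context before calling `nltk.download`:

```dockerfile
# Workaround sketch: patch the default HTTPS context so nltk.download()
# skips certificate verification. Only use this on trusted build networks.
RUN python3 -c "import ssl, nltk; \
    ssl._create_default_https_context = ssl._create_unverified_context; \
    nltk.download('punkt', download_dir='/usr/local/nltk_data')"
```

If the real cause is a proxy injecting its own certificate, a cleaner fix is to install the proxy's CA certificate into the image instead of disabling verification.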