Building Apache Nutch Docker container

Question

I am following the instructions for installing Apache Nutch on the pages below and building the Dockerfile shown there:

https://hub.docker.com/r/apache/nutch
https://hub.docker.com/r/apache/nutch/dockerfile

# NOTE TO DEVELOPERS: Make sure this file passes linting tests
# by running https://github.com/replicatedhq/dockerfilelint

# BUILD_MODE can be either
#  0 == Nutch master branch source install with 'crawl' and 'nutch' scripts on PATH
#  1 == Same as mode 0 with addition of Nutch REST Server
#  2 == Same as mode 1 with addition of Nutch WebApp
ARG BUILD_MODE=0

FROM alpine:3.13 AS base

ARG SERVER_PORT=8081
ARG SERVER_HOST=0.0.0.0
ARG WEBAPP_PORT=8080

LABEL maintainer="Apache Nutch Developers <dev@nutch.apache.org>"
LABEL org.opencontainers.image.authors="Apache Nutch Developers <dev@nutch.apache.org>"
LABEL org.opencontainers.image.description="Docker image for running Apache Nutch, a highly extensible and scalable open source web crawler software project. Visit the project website at https://nutch.apache.org"
LABEL org.opencontainers.image.documentation="https://hub.docker.com/r/apache/nutch"
LABEL org.opencontainers.image.licenses="Apache-2.0"
LABEL org.opencontainers.image.source="https://raw.githubusercontent.com/apache/nutch/master/docker/Dockerfile"
LABEL org.opencontainers.image.title="Apache Nutch 1.x Docker Image"
LABEL org.opencontainers.image.url="https://hub.docker.com/r/apache/nutch"
LABEL org.opencontainers.image.vendor="Apache Nutch https://nutch.apache.org"

WORKDIR /root/

# Install dependencies
RUN apk update
RUN apk add apache-ant bash git openjdk11 supervisor

# Establish environment variables
RUN echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> $HOME/.bashrc
ENV JAVA_HOME='/usr/lib/jvm/java-11-openjdk'
ENV NUTCH_HOME='/root/nutch_source/runtime/local'

# Checkout and build the Nutch master branch (1.x)
RUN git clone https://github.com/apache/nutch.git nutch_source && \
     cd nutch_source && \
     ant runtime && \
     rm -rf build/ && \
     rm -rf /root/.ivy2/

# Create symlinks for runtime/local/bin/nutch and runtime/local/bin/crawl
RUN ln -sf $NUTCH_HOME/bin/nutch /usr/local/bin/
RUN ln -sf $NUTCH_HOME/bin/crawl /usr/local/bin/

FROM base AS branch-version-0

RUN echo "Nutch master branch source install with 'crawl' and 'nutch' scripts on PATH"

FROM base AS branch-version-1

RUN echo "Nutch master branch source install with 'crawl' and 'nutch' scripts on PATH and Nutch REST Server on $SERVER_HOST:$SERVER_PORT"
ARG SERVER_PORT=8081
ARG SERVER_HOST=0.0.0.0

ENV SERVER_PORT=$SERVER_PORT
ENV SERVER_HOST=$SERVER_HOST

# Arrange necessary setup for supervisord
RUN mkdir -p /var/log/supervisord
COPY ./config/supervisord_startserver.conf /etc/supervisord.conf

# Expose port for server which can only be accessed if 
# the same port is published when the container is run.
EXPOSE $SERVER_PORT

ENTRYPOINT [ "supervisord", "--nodaemon", "--configuration", "/etc/supervisord.conf" ]

FROM base AS branch-version-2

RUN echo "Nutch master branch source install with 'crawl' and 'nutch' scripts on PATH, Nutch REST Server on $SERVER_HOST:$SERVER_PORT and WebApp on this container port $WEBAPP_PORT"
ARG SERVER_PORT=8081
ARG SERVER_HOST=0.0.0.0
ARG WEBAPP_PORT=8080

ENV SERVER_PORT=$SERVER_PORT
ENV SERVER_HOST=$SERVER_HOST
ENV WEBAPP_PORT=$WEBAPP_PORT

# Install the webapp
RUN apk add maven
RUN git clone https://github.com/apache/nutch-webapp.git nutch_webapp && cd nutch_webapp && mvn package

# Arrange necessary setup for supervisord
RUN mkdir -p /var/log/supervisord
COPY ./config/supervisord_startserver_webapp.conf /etc/supervisord.conf

# Expose ports for server and webapp, these can only be accessed if 
# the same ports are published when the container is run.
EXPOSE $SERVER_PORT
EXPOSE $WEBAPP_PORT

ENTRYPOINT [ "supervisord", "--nodaemon", "--configuration", "/etc/supervisord.conf" ]

FROM branch-version-$BUILD_MODE AS final
RUN echo "Successfully built image, see https://s.apache.org/m5933 for guidance on running a container instance."

The build fails at the COPY step of the branch-version-2 stage:

> => ERROR [branch-version-2 5/5] COPY ./config/supervisord_startserver_webapp.conf /etc/supervisord.conf  0.0s
> ------
> > [branch-version-2 5/5] COPY ./config/supervisord_startserver_webapp.conf /etc/supervisord.conf:
> ------
> failed to compute cache key: failed to walk /var/lib/docker/tmp/buildkit-mount3360673970/config: lstat /var/lib/docker/tmp/buildkit-mount3360673970/config: no such file or directory

Answer 1

Score: 1

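I had the same problem. The Dockerfile's COPY instructions read from a config directory inside the build context, and that directory does not exist if you only have the Dockerfile.

I downloaded the https://github.com/apache/nutch project to my working dir. Inside it there is a docker dir that holds the Dockerfile together with the config dir, and the config dir has the files that are missing for you to build the image. Run the build from that docker dir.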
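As a rough sketch of that fix, the snippet below checks that a directory holds the Dockerfile and the two supervisord config files named in the COPY instructions before you build from it. The helper name check_build_context and the image tag nutch-local are made up for illustration; they are not part of Nutch or Docker. The commented-out commands assume the repository layout described above (a docker dir at the top of the clone).

```shell
#!/bin/sh
# Hypothetical helper: verify that a build context contains the files
# the Nutch Dockerfile copies, so the COPY step cannot fail on a missing path.
check_build_context() {
    context_dir="$1"
    missing=0
    for f in Dockerfile \
             config/supervisord_startserver.conf \
             config/supervisord_startserver_webapp.conf; do
        if [ ! -f "$context_dir/$f" ]; then
            echo "missing: $context_dir/$f" >&2
            missing=1
        fi
    done
    # 0 when every file is present, 1 otherwise
    return "$missing"
}

# Typical use (needs network access and Docker, so left commented out;
# the tag nutch-local is an arbitrary example):
#   git clone https://github.com/apache/nutch.git
#   check_build_context nutch/docker &&
#       docker build -t nutch-local --build-arg BUILD_MODE=2 nutch/docker
```

If the check prints a "missing:" line, the build is being pointed at the wrong directory, which is the situation the error in the question describes.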

Posted on 2023-02-06 07:17:32