Flink, Kubernetes, and Linkerd
Question
I am deploying some Flink jobs that need to access services inside a service mesh implemented with Linkerd, and I'm running into this error:
java.lang.NoClassDefFoundError: Could not initialize class foo.bar.Job
I can confirm that the jar file contains the class that apparently cannot be found, so it's not a problem with the jar itself; it seems to be related to Linkerd. In particular, I'm using the following pod annotations for both the jobmanager and the taskmanager pods (taken from my Helm chart values file):
podAnnotations:
  linkerd.io/inject: enabled
  config.linkerd.io/skip-outbound-ports: 6123,6124
  config.linkerd.io/proxy-await: enabled
For what it's worth, I'm using the Ververica Platform (Community Edition) for deploying my jobs to Kubernetes, although I don't think the issue is VVP-specific:
{{- define "vvp.deployment" }}
kind: Deployment
apiVersion: v1
metadata:
  name: my-job
spec:
  template:
    spec:
      artifact:
        kind: jar
        flinkImageRegistry: {{ .Values.flink.imageRegistry }}
        flinkVersion: "1.15.1"
        flinkImageTag: 1.15.1-stream1-scala_2.12-java11-linkerd
        entryClass: foo.bar.Job
      kubernetes:
        jobManagerPodTemplate:
          metadata:
            {{- with .Values.flink.podAnnotations }}
            annotations:
              {{- toYaml . | nindent 14 }}
            {{- end }}
          spec:
            containers:
              - name: flink-jobmanager
                command:
                  - linkerd-entrypoint.sh
        taskManagerPodTemplate:
          metadata:
            {{- with .Values.flink.podAnnotations }}
            annotations:
              {{- toYaml . | nindent 14 }}
            {{- end }}
{{- end }}
where the contents of linkerd-entrypoint.sh are:
#!/bin/bash
set -e
exec linkerd-await --shutdown -- "$@"
For extra context, the VVP and the Flink jobs are deployed into different namespaces. Also, I'm not using any Linkerd annotations whatsoever on the VVP pods.
Has anyone encountered similar problems? The closest troubleshooting resource/guide that I've found so far is this one, which targets Istio instead of Linkerd.
Answer 1
Score: 2
Answering my own question after determining the root cause of the issue.
Regarding Linkerd, everything was set up correctly. The main precaution is to add the linkerd-await binary to the Flink image and to override the entrypoint for the jobmanager; otherwise you will run into issues when upgrading your jobs. The jobmanager won't kill the Linkerd proxy, so the pod hangs around with NotReady status. That is easily solved by wrapping the main command in a linkerd-await call. So, first add the linkerd-await binary to your Docker image:
# Add linkerd-await and linkerd-entrypoint.sh
USER root
RUN apt-get update && apt-get install -y wget
RUN wget https://github.com/linkerd/linkerd-await/releases/download/release%2Fv0.2.7/linkerd-await-v0.2.7-amd64 -O ./linkerd-await && chmod +x ./linkerd-await
COPY scripts/flink/linkerd-entrypoint.sh ./linkerd-entrypoint.sh
Then, for the jobmanager only, override the entrypoint like this:
spec:
  containers:
    - name: flink-jobmanager
      command:
        - linkerd-entrypoint.sh # defined above
Alternatively, one could use the LINKERD_DISABLED or LINKERD_AWAIT_DISABLED env vars to bypass the linkerd-await wrapper. For more info on using jobs with Linkerd, consult the following resources:
- https://itnext.io/three-ways-to-use-linkerd-with-kubernetes-jobs-c12ccc6d4c7c (solution #3 is the one explained here)
- https://github.com/linkerd/linkerd-await
Also, the annotation config.linkerd.io/proxy-await: enabled only does the waiting, not the shutdown. So if we are going to run linkerd-await --shutdown -- "$@" manually anyway, that annotation is redundant and can be safely removed.
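To make the difference between "wait only" and "wait, then shut down" concrete, here is a rough sketch of the sequence a wrapper such as linkerd-await --shutdown follows. It is not the real implementation: the class name, method names, and the fake readiness and shutdown callbacks are all made up for illustration, and the real binary talks to the Linkerd proxy rather than to in-memory lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

public class AwaitShutdownSketch {

    /**
     * Waits until proxyReady reports true, runs mainCommand, and then always
     * asks the proxy to shut down. Returns the exit code of mainCommand.
     * All three parameters are hypothetical stand-ins, not Linkerd APIs.
     */
    static int runWrapped(BooleanSupplier proxyReady,
                          IntSupplier mainCommand,
                          Runnable requestProxyShutdown) {
        while (!proxyReady.getAsBoolean()) {
            try {
                Thread.sleep(10); // short pause between readiness checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for proxy", e);
            }
        }
        try {
            return mainCommand.getAsInt();
        } finally {
            // Without this step the sidecar keeps running after the main
            // process exits, which is what leaves the pod in NotReady.
            requestProxyShutdown.run();
        }
    }

    /** Runs the wrapper against fake components and returns the event order. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        int[] checks = {0};
        int exit = runWrapped(
            () -> { events.add("ready-check"); return ++checks[0] >= 2; }, // ready on 2nd check
            () -> { events.add("main"); return 0; },
            () -> events.add("proxy-shutdown"));
        events.add("exit=" + exit);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints [ready-check, ready-check, main, proxy-shutdown, exit=0]
    }
}
```

The proxy-await annotation covers only the loop at the top of runWrapped; the finally block is the part that needs the wrapper around the main command.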
Finally, regarding:
java.lang.NoClassDefFoundError: Could not initialize class foo.bar.Job
let me clarify that this had nothing to do with Linkerd. It was mostly a config error: essentially (the specific details are irrelevant), some env vars were missing in the taskmanager pods. Note that the exception message says "Could not initialize class foo.bar.Job", which is different from "Could not find class...": the class was found, but its static initialization failed.
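For anyone who hits the same message, here is a minimal sketch of how a missing env var can produce it. This is not my actual job: the nested Job class, the variable name DEMO_SERVICE_URL_THAT_IS_NOT_SET, and the helper methods are invented for illustration, and the example assumes that variable is not set in your environment.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for foo.bar.Job: a static field is initialized
    // from an env var that is assumed to be missing.
    static class Job {
        static final String SERVICE_URL = requireEnv("DEMO_SERVICE_URL_THAT_IS_NOT_SET");

        private static String requireEnv(String name) {
            String value = System.getenv(name);
            if (value == null) {
                throw new IllegalStateException("missing env var: " + name);
            }
            return value;
        }

        static void run() {
            System.out.println("using " + SERVICE_URL);
        }
    }

    /** Touches the Job class and reports which throwable, if any, came back. */
    static String attempt() {
        try {
            Job.run();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First use: the static initializer throws, wrapped by the JVM.
        System.out.println(attempt()); // ExceptionInInitializerError
        // Later uses: the class is marked as failed and is never retried.
        System.out.println(attempt()); // NoClassDefFoundError
    }
}
```

Only the first failure carries the real cause (here, the IllegalStateException naming the missing variable). Every later use of the class reports "Could not initialize class ...", so it is worth searching the logs for an earlier ExceptionInInitializerError.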
Sorry for the confusion!