Error: k8s doesn't do anything after executing "kubectl create -f mypod.yaml"
Question
I am a beginner in Kubernetes and have been using the kubectl command to create pods for several months. However, I recently encountered a problem where Kubernetes did not create a pod after I executed the kubectl create -f mypod.yaml command. When I run kubectl get pods, mypod does not appear in the list of pods and I cannot access it by name, as if it did not exist. However, if I try to create it again, I receive a message saying that the pod already exists.
To illustrate my point, let me give you an example. I frequently create pods using a YAML file called tpcds-25-query.yaml. The contents of this file are as follows:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: tpcds-25-query
namespace: default
spec:
type: Scala
mode: cluster
image: registry.cn-beijing.aliyuncs.com/kube-ai/ack-spark-benchmark:1.0.1
imagePullPolicy: Always
sparkVersion: 2.4.5
mainClass: com.aliyun.spark.benchmark.tpcds.BenchmarkSQL
mainApplicationFile: "local:///opt/spark/jars/ack-spark-benchmark-assembly-0.1.jar"
arguments:
# TPC-DS data localtion
- "oss://spark/data/tpc-ds-data/150g"
# results location
- "oss://spark/result/tpcds-25-query"
# Path to kit in the docker image
- "/tmp/tpcds-kit/tools"
# Data Format
- "parquet"
# Scale factor (in GB)
- "150"
# Number of iterations
- "1"
# Optimize queries
- "false"
# Filter queries, will run all if empty - "q70-v2.4,q82-v2.4,q64-v2.4"
- "q1-v2.4,q11-v2.4,q14a-v2.4,q14b-v2.4,q16-v2.4,q17-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q28-v2.4,q29-v2.4,q4-v2.4,q49-v2.4,q5-v2.4,q51-v2.4,q64-v2.4,q74-v2.4,q75-v2.4,q77-v2.4,q78-v2.4,q80-v2.4,q9-v2.4"
# Logging set to WARN
- "true"
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
restartPolicy:
type: Never
timeToLiveSeconds: 86400
hadoopConf:
# OSS
"fs.oss.impl": "OSSFileSystem"
"fs.oss.endpoint": "oss.com"
"fs.oss.accessKeyId": "DFDSMGDNDFMSNGDFMNGCU"
"fs.oss.accessKeySecret": "secret"
sparkConf:
"spark.kubernetes.allocation.batch.size": "200"
"spark.sql.adaptive.join.enabled": "true"
"spark.eventLog.enabled": "true"
"spark.eventLog.dir": "oss://spark/spark-events"
driver:
cores: 4
memory: "8192m"
labels:
version: 2.4.5
spark-app: spark-tpcds
role: driver
serviceAccount: spark
nodeSelector:
beta.kubernetes.io/instance-type: ecs.g6.13xlarge
executor:
cores: 48
instances: 1
memory: "160g"
memoryOverhead: "16g"
labels:
version: 2.4.5
role: executor
nodeSelector:
beta.kubernetes.io/instance-type: ecs.g6.13xlarge
After I executed the kubectl create --validate=false -f tpcds-25-query.yaml command, k8s returned this:
sparkapplication.sparkoperator.k8s.io/tpcds-25-query created
which means the pod has been created. However, when I executed kubectl get pods, it gave me this:
No resources found in default namespace.
When I created the pod again, it gave me this:
Error from server (AlreadyExists): error when creating "tpcds-25-query.yaml": sparkapplications.sparkoperator.k8s.io "tpcds-25-query" already exists
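As a cross-check, the object here is a SparkApplication custom resource rather than a plain Pod, so it can also be queried under its own resource kind. A minimal sketch, assuming the default namespace and the names from the YAML above:
# List SparkApplication objects (the kind named in the AlreadyExists error)
kubectl get sparkapplications -n default
# Show the status and events of this particular object
kubectl describe sparkapplication tpcds-25-query -n default
# Confirm which Spark-related resource kinds the cluster actually serves
kubectl api-resources | grep -i spark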
I know the option -v=8 can print more detailed logs, so I executed the command kubectl create --validate=false -f tpcds-25-query.yaml -v=8; its output is:
I0219 05:50:17.121661 2148722 loader.go:372] Config loaded from file: /root/.kube/config
I0219 05:50:17.124735 2148722 round_trippers.go:432] GET https://172.16.0.212:6443/apis/metrics.k8s.io/v1beta1?timeout=32s
I0219 05:50:17.124747 2148722 round_trippers.go:438] Request Headers:
I0219 05:50:17.124753 2148722 round_trippers.go:442] Accept: application/json, */*
I0219 05:50:17.124759 2148722 round_trippers.go:442] User-Agent: kubectl/v1.22.3 (linux/amd64) kubernetes/9377577
I0219 05:50:17.132864 2148722 round_trippers.go:457] Response Status: 503 Service Unavailable in 8 milliseconds
I0219 05:50:17.132876 2148722 round_trippers.go:460] Response Headers:
I0219 05:50:17.132881 2148722 round_trippers.go:463] X-Kubernetes-Pf-Prioritylevel-Uid: e75a0286-dd47-4533-a65c-79d95dac5bb1
I0219 05:50:17.132890 2148722 round_trippers.go:463] Content-Length: 20
I0219 05:50:17.132894 2148722 round_trippers.go:463] Date: Sun, 19 Feb 2023 05:50:17 GMT
I0219 05:50:17.132898 2148722 round_trippers.go:463] Audit-Id: 3ab06f73-0c88-469a-834d-54ec06e910f1
I0219 05:50:17.132902 2148722 round_trippers.go:463] Cache-Control: no-cache, private
I0219 05:50:17.132906 2148722 round_trippers.go:463] Content-Type: text/plain; charset=utf-8
I0219 05:50:17.132909 2148722 round_trippers.go:463] X-Content-Type-Options: nosniff
I0219 05:50:17.132913 2148722 round_trippers.go:463] X-Kubernetes-Pf-Flowschema-Uid: 7f136704-82ad-4f6c-8c86-b470a972fede
I0219 05:50:17.134365 2148722 request.go:1181] Response Body: service unavailable
I0219 05:50:17.135255 2148722 request.go:1372] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0219 05:50:17.135265 2148722 cached_discovery.go:78] skipped caching discovery info due to the server is currently unable to handle the request
I0219 05:50:17.136050 2148722 request.go:1181] Request Body: {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"name":"tpcds-25-query","namespace":"default"},"spec":{"arguments":["oss://lfpapertest/spark/data/tpc-ds-data/150g","oss://lfpapertest/spark/result/tpcds-runc-150g-48core-160g-1pod-25-query","/tmp/tpcds-kit/tools","parquet","150","1","false","q1-v2.4,q11-v2.4,q14a-v2.4,q14b-v2.4,q16-v2.4,q17-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q28-v2.4,q29-v2.4,q4-v2.4,q49-v2.4,q5-v2.4,q51-v2.4,q64-v2.4,q74-v2.4,q75-v2.4,q77-v2.4,q78-v2.4,q80-v2.4,q9-v2.4","true"],"dnsPolicy":"ClusterFirstWithHostNet","driver":{"cores":4,"labels":{"role":"driver","spark-app":"spark-tpcds","version":"2.4.5"},"memory":"8192m","nodeSelector":{"beta.kubernetes.io/instance-type":"ecs.g6.13xlarge"},"serviceAccount":"spark"},"executor":{"cores":48,"instances":1,"labels":{"role":"executor","version":"2.4.5"},"memory":"160g","memoryOverhead":"16g","nodeSelector":{"beta.kubernetes.io/instance-type":"ecs.g6.13xlarge"}},"hadoopConf":{"fs.oss.acce [truncated 802 chars]
I0219 05:50:17.136091 2148722 round_trippers.go:432] POST https://172.16.0.212:6443/apis/sparkoperator.k8s.io/v1beta2/namespaces/default/sparkapplications?fieldManager=kubectl-create
I0219 05:50:17.136098 2148722 round_trippers.go:438] Request Headers:
I0219 05:50:17.136104 2148722 round_trippers.go:442] Accept: application/json
I0219 05:50:17.136108 2148722 round_trippers.go:442] Content-Type: application/json
I0219 05:50:17.136113 2148722 round_trippers.go:442] User-Agent: kubectl/v1.22.3 (linux/amd64) kubernetes/9377577
I0219 05:50:17.144313 2148722 round_trippers.go:457] Response Status: 201 Created in 8 milliseconds
I0219 05:50:17.144327 2148722 round_trippers.go:460] Response Headers:
I0219 05:50:17.144332 2148722 round_trippers.go:463] X-Kubernetes-Pf-Prioritylevel-Uid: e75a0286-dd47-4533-a65c-79d95dac5bb1
I0219 05:50:17.144337 2148722 round_trippers.go:463] Content-Length: 2989
I0219 05:50:17.144341 2148722 round_trippers.go:463] Date: Sun, 19 Feb 2023 05:50:17 GMT
I0219 05:50:17.144345 2148722 round_trippers.go:463] Audit-Id: 8eef9d08-04c0-44f7-87bf-e820853cd9c6
I0219 05:50:17.144349 2148722 round_trippers.go:463] Cache-Control: no-cache, private
I0219 05:50:17.144352 2148722 round_trippers.go:463] Content-Type: application/json
I0219 05:50:17.144356 2148722 round_trippers.go:463] X-Kubernetes-Pf-Flowschema-Uid: 7f136704-82ad-4f6c-8c86-b470a972fede
I0219 05:50:17.144396 2148722 request.go:1181] Response Body: {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"creationTimestamp":"2023-02-19T05:50:17Z","generation":1,"managedFields":[{"apiVersion":"sparkoperator.k8s.io/v1beta2","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:arguments":{},"f:driver":{".":{},"f:cores":{},"f:labels":{".":{},"f:role":{},"f:spark-app":{},"f:version":{}},"f:memory":{},"f:nodeSelector":{".":{},"f:beta.kubernetes.io/instance-type":{}},"f:serviceAccount":{}},"f:executor":{".":{},"f:cores":{},"f:instances":{},"f:labels":{".":{},"f:role":{},"f:version":{}},"f:memory":{},"f:memoryOverhead":{},"f:nodeSelector":{".":{},"f:beta.kubernetes.io/instance-type":{}}},"f:hadoopConf":{".":{},"f:fs.oss.accessKeyId":{},"f:fs.oss.accessKeySecret":{},"f:fs.oss.endpoint":{},"f:fs.oss.impl":{}},"f:image":{},"f:imagePullPolicy":{},"f:mainApplicationFile":{},"f:mainClass":{},"f:mode":{},"f:restartPolicy":{".":{},"f:type":{}},"f:sparkConf":{".":{},"f:spark.eventLog.dir":{},"f:spark.eventLog.enabled":{},"f:spark.kubernetes. [truncated 1965 chars]
sparkapplication.sparkoperator.k8s.io/tpcds-25-query created
From the logs, the only error we can see is "Response Status: 503 Service Unavailable in 8 milliseconds", and I don't know what it means.
So I want to ask what may cause this, and how would I diagnose the problem? Any help is appreciated!
Answer 1
Score: 1
There might be multiple reasons for this; initially, let's check whether the pod has really been created or not. As ehmad11 suggested, use kubectl get pods --all-namespaces to list the pods in all namespaces. However, in your case it might not help, because your application is deployed directly in the default namespace. Regarding the error "Response Status: 503 Service Unavailable in 8 milliseconds": once you are able to locate the pod, use kubectl describe <pod> to find logs specific to your pod, and follow the troubleshooting steps provided in this document to rectify it.
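A concrete sketch of those two steps for this case (the pod name is a placeholder, and the default namespace is assumed):
# Step 1: list pods in every namespace, in case the driver pod landed somewhere unexpected
kubectl get pods --all-namespaces
# Step 2: once a pod shows up, inspect its status and events
kubectl describe pod <pod-name> -n default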
Note: The reference document is from the komodor site; they walk through each troubleshooting step in a detailed and understandable manner.
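A side note on the 503: in the verbose output it is returned by the discovery call to /apis/metrics.k8s.io/v1beta1, while the create request itself returned 201 Created. A hedged way to check that aggregated API, assuming it is served by a metrics-server deployment:
# Check whether the metrics API service reports as Available
kubectl get apiservices v1beta1.metrics.k8s.io
# If not, look at the backing pods (commonly metrics-server in kube-system)
kubectl get pods -n kube-system | grep -i metrics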