How to properly kill a running batch Dataproc job?

Question
I had run a long-running batch job on Dataproc Serverless. After it had been running for a while, I realized that letting it run any longer was a waste of time and money, and I wanted to stop it.
I couldn't find a way to kill the job directly. However, there were two other options:
- Cancel the batch
- Delete the batch
Initially, I used the first option, and I cancelled the job using:
gcloud dataproc batches cancel BATCH --region=REGION
On the Dataproc batch console, the job showed as cancelled, and I could also see the DCU and shuffle storage usage.
But surprisingly, the job still showed as running on the Spark History Server a day later.
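The state that Dataproc itself records for the batch can be checked with a describe call (a minimal check, reusing the BATCH and REGION placeholders from above):
# Print only the service-side state of the batch (e.g. RUNNING, CANCELLED).
gcloud dataproc batches describe BATCH --region=REGION --format="value(state)"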
After this, I decided to try the second option and delete the batch job, so I ran:
gcloud dataproc batches delete BATCH --region=REGION
This removed the batch entry from the Dataproc batch console, but the job still shows as running on the Spark History Server.
My questions are:
- What is the best way to kill the job?
- Am I still being charged once I canceled the running job?
Answer 1
Score: 2
What you are observing is a known shortcoming of Spark and the Spark History Server. Spark marks only successfully finished Spark applications as completed and leaves failed/cancelled Spark applications in the in-progress/incomplete state (https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options):
> 3. Applications which exited without registering themselves as completed will be listed as incomplete —even though they are no longer running. This can happen if an application crashes.
To monitor the batch job state, you need to use the Dataproc API - if the Dataproc API/UI shows that the state of the batch job is CANCELLED, it means that it is no longer running, regardless of the Spark application status in the Spark History Server.
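As a concrete illustration of monitoring through the Dataproc API rather than the Spark History Server, here is a minimal shell sketch, reusing the BATCH and REGION placeholders from the question (the state names are the Batch.State values from the Dataproc API):
# Poll the Dataproc API until the batch reaches a terminal state.
while true; do
  STATE=$(gcloud dataproc batches describe BATCH --region=REGION --format="value(state)")
  echo "Batch state: ${STATE}"
  case "${STATE}" in
    SUCCEEDED|FAILED|CANCELLED) break ;;
  esac
  sleep 30
done
A batch in PENDING, RUNNING, or CANCELLING is still active; SUCCEEDED, FAILED, and CANCELLED are terminal.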
Comments