Bigquery.go export job much slower than WebGUI
Question
I'm using the bigquery.go library.
While investigating performance, I found that my export job (.csv to GCS), and only the export job, takes about 60 seconds on average when started from the client, while the same job started from the WebGUI takes about 20 seconds. What could be the reason for this?
The code is the following:
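time1 := time.Now() // start of the client-side measurement
job_extract, err := extractor.Run(ctx) // submit the extract job
if err != nil {
    return err
}
status, err = job_extract.Wait(ctx) // block until the job has completed
if err != nil {
    return err
}
if status.Err() != nil {
    // log.Fatalf would exit the process and make the return below unreachable
    log.Printf("Job failed with error %v", status.Err())
    return status.Err()
}
time2 := time.Since(time1) // wall-clock time elapsed on the client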
Answer 1
Score: 3
The Web UI usually has a polling mechanism to check when a job has finished, so you might see longer times. Usually the files exported to GCS appear sooner than the job actually finishes in the Web UI.
To see the exact timing, use the bq CLI tool to list the most recent jobs:
bq ls -j -a --max_results=15
Running this displays a table with job IDs and their respective timings:
jobId                             Job Type   State     Start Time        Duration
--------------------------------- ---------- --------- ----------------- ----------
bquijob_1864e679_15a84d8878a      query      SUCCESS   28 Feb 07:11:06   0:00:04
bquijob_770b512_15a84d8122c       query      FAILURE   28 Feb 07:10:35   0:00:00
bquijob_de0df03_15a84d6a4fa       query      FAILURE   28 Feb 07:09:02   0:00:00
bquijob_52c4f7d7_15a84d660e6      query      FAILURE   28 Feb 07:08:44   0:00:00
bquijob_76a2c1be_15a84d5e769      query      FAILURE   28 Feb 07:08:13   0:00:00
bquijob_7f51dde5_15a84d55afb      query      SUCCESS   28 Feb 07:07:41   0:00:08
bquijob_34f25864_15a84d50503      query      SUCCESS   28 Feb 07:07:18   0:00:08
job_Ca0cuRTAjY7MEHAs7vTJMxtVYTs   query      SUCCESS   28 Feb 07:00:47   0:00:09
job_hHfmcdwyBsPsYF5dDvvOdR1Rmd0   load       SUCCESS   28 Feb 07:00:26   0:00:20
job_mkiLf_mFHLKSplGJOtg-XDKzvv4   load       SUCCESS   28 Feb 02:52:50   0:00:02
job_3RsPvttxWwv3SzVoOI9Cv_2yWtA   query      SUCCESS   27 Feb 21:18:40   0:00:08
job_JLsqJO0NEIlKNac6jkDWbwneGMg   extract    SUCCESS   27 Feb 11:35:04   0:00:17
job_KOS7vKX4aX0FNbK6dibE7cxzcQA   query      SUCCESS   27 Feb 11:33:44   0:00:37
bquijob_44046bec_15a802f703a      query      SUCCESS   27 Feb 09:27:48   0:00:07
job_2qQ6YSWeXaP2y2doONQJsIoga3c   query      SUCCESS   27 Feb 08:53:20   0:00:06
This way you can check the Duration of the extract job. If you confirm that it really is a problem, please post such a table in your question, since sooner or later a Google engineer will look at it. Without those details, we can only assume your measurement is wrong.