
using time partitioning for bigquery load doesn't upload every row

Question


I'm attempting to use the BigQuery Python API client to upload a large dataframe. The upload works; however, when a time partition is specified, only some rows are uploaded. When time partitioning is omitted, all rows are uploaded.

JOB_CONFIG:

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE", autodetect=True)
job_config.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="date"
)

The dataframe has 120,532 rows, but when the table's rows are checked in BigQuery, only 62,433 rows were added. I've checked the date column in the DataFrame to make sure every row has a date, and I believe that is the case.
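For reference, a quick way to double-check the date column in pandas (the toy frame and its values here are made up; the real dataframe has 120,532 rows and a column named "date"):

```python
import pandas as pd

# Toy frame standing in for the real 120,532-row dataframe.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-05-30", None]),
    "value": [1, 2, 3],
})

missing = int(df["date"].isna().sum())                        # rows with no date at all
distinct_days = df["date"].dropna().dt.normalize().nunique()  # distinct partition days
print(missing, distinct_days)  # → 1 2
```

If `missing` is 0 on the real data, every row has a date, and `distinct_days` tells you how many daily partitions the load would create.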

Any other ideas on what could be causing an incomplete upload?

Answer 1

Score: 0


In BigQuery sandbox mode there is a 60-day partition expiration limit. I believe that when partitioning is added, the table is loaded but partitions older than 60 days expire, without any message that this happens.
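To illustrate the effect, here is a hedged sketch of how a 60-day expiration window would silently shrink the row count (the dates and the assumed load date are invented for illustration; the real cutoff depends on when the load runs):

```python
import pandas as pd

# Hypothetical illustration of the sandbox's 60-day partition expiration:
# rows landing in partitions older than ~60 days are silently dropped.
df = pd.DataFrame({"date": pd.to_datetime(["2023-03-01", "2023-05-25", "2023-05-31"])})

today = pd.Timestamp("2023-06-01")        # assumed load date
cutoff = today - pd.Timedelta(days=60)    # oldest partition that survives
surviving = int((df["date"] >= cutoff).sum())
print(surviving)  # → 2: only rows within the 60-day window remain
```

Running a count like this against the real dataframe's date column would show whether ~62,433 rows fall inside the most recent 60 days, which would confirm expiration as the cause.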

huangapple
  • Posted on 2023-06-01 02:18:53
  • When reposting, please keep this link: https://go.coder-hub.com/76376309.html