Using time partitioning for BigQuery load doesn't upload every row
Question
I'm attempting to use the BigQuery Python API client to upload a large DataFrame. The upload works; however, when a time partition is specified, only some rows are uploaded. When time partitioning is omitted, all rows are uploaded.
JOB_CONFIG:
from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE", autodetect=True)
job_config.time_partitioning = bigquery.TimePartitioning(type_=bigquery.TimePartitioningType.DAY, field="date")
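For reference, a minimal sketch of how this config would be passed to a load job; the client, table_id, and stand-in DataFrame names are assumptions, not from the original post:

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()  # assumed: default credentials and project
table_id = "my-project.my_dataset.my_table"  # hypothetical destination table
df = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-01-02"])})  # stand-in for the real DataFrame

# load_table_from_dataframe returns a LoadJob; result() blocks until the load finishes.
load_job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
load_job.result()
print(client.get_table(table_id).num_rows, "rows now in the table")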
The DataFrame has 120,532 rows, but when the table's rows are checked in BigQuery, only 62,433 rows have been added to the table. I've checked the date column in the DataFrame to make sure every row has a date, and I believe that is the case.
Any other ideas on what could be causing an incomplete upload?
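One way to sanity-check the date column before loading; a minimal sketch, with a tiny stand-in DataFrame in place of the real one:

import pandas as pd

# Stand-in for the real DataFrame; the actual one has 120,532 rows.
df = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-03-15", None])})

print("rows with no date:", df["date"].isna().sum())
print("date range:", df["date"].min(), "to", df["date"].max())

If no dates are missing but the range spans many months, the date range itself is the clue, as the answer below explains.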
Answer 1

Score: 0
In BigQuery sandbox mode there is a 60-day partition expiration limit. I believe that when partitioning is added, the table is loaded with its partitions limited to the most recent 60 days, without any message indicating that this happens.
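If this is the cause, two quick checks can confirm it; a rough sketch, where the table id and the stand-in DataFrame are assumptions:

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")  # hypothetical table id

# A forced partition expiration shows up here in milliseconds; 60 days is 5184000000.
print("partition expiration_ms:", table.time_partitioning.expiration_ms)

# Stand-in for the source DataFrame; count rows older than the 60-day window.
df = pd.DataFrame({"date": pd.to_datetime(["2022-01-01", "2023-06-01"])})
cutoff = pd.Timestamp.now() - pd.Timedelta(days=60)
print("rows older than 60 days:", (df["date"] < cutoff).sum())

If the second count matches the 58,099 missing rows (120,532 − 62,433), the expiration limit is the likely culprit.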
Comments