BigQueryCreateEmptyTableOperator does not use the correct mode for table columns
Question
As part of a DAG in Airflow, I have a task that uses BigQueryCreateEmptyTableOperator to create a new, empty table. I pass the table's schema in the 'schema_fields' argument as a Python variable that holds the schema as a list. Some columns in the schema have the mode 'REQUIRED'.
The DAG runs fine and the table is created, but when I check it in BigQuery, every column in the table has its mode set to 'NULLABLE'; the 'REQUIRED' mode has been ignored for the columns that should have it.
Here is my code:
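from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyTableOperator
from airflow.utils.trigger_rule import TriggerRule

create_table = BigQueryCreateEmptyTableOperator(
    trigger_rule=TriggerRule.NONE_FAILED,
    task_id="create_table",
    dataset_id="my_dataset",
    table_id="my_table",
    project_id="my_project",
    schema_fields=my_table_schema,  # list of field dicts, shown below
    exists_ok=True,
)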
My schema looks like this:
my_table_schema = [
    {
        "name": "ID_NO",
        "type": "INTEGER",
        "mode": "REQUIRED"
    },
    {
        "name": "PROD_NAME",
        "mode": "NULLABLE",
        "type": "STRING"
    },
    {
        "name": "DESC",
        "mode": "NULLABLE",
        "type": "STRING"
    }
]
So the table gets created with the columns from the schema above, but they've all got NULLABLE mode, even the ID_NO column.
Why is the mode in the schema being replaced by NULLABLE?
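Before looking at BigQuery itself, it can help to confirm which columns the schema variable actually declares as REQUIRED at the point where it is handed to the operator. The following is a minimal sketch: `required_columns` is a hypothetical helper, not part of Airflow, and it assumes that a field without a "mode" key defaults to NULLABLE, as it does in BigQuery.

```python
# Hypothetical helper for illustration; not part of Airflow or the BigQuery client.
VALID_MODES = {"NULLABLE", "REQUIRED", "REPEATED"}


def required_columns(schema_fields):
    """Return the names of top-level fields whose mode is REQUIRED."""
    required = []
    for field in schema_fields:
        # Assumption: a missing "mode" key means NULLABLE.
        mode = field.get("mode", "NULLABLE").upper()
        if mode not in VALID_MODES:
            raise ValueError(f"{field['name']}: unknown mode {mode!r}")
        if mode == "REQUIRED":
            required.append(field["name"])
    return required


my_table_schema = [
    {"name": "ID_NO", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "PROD_NAME", "mode": "NULLABLE", "type": "STRING"},
    {"name": "DESC", "mode": "NULLABLE", "type": "STRING"},
]

# Log this from the DAG file to see what the operator will receive.
print(required_columns(my_table_schema))
```

If the printed list is empty, the problem is in the schema variable rather than in the operator or BigQuery.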
Answer 1
Score: 1
I tried your setup and it works fine for me. This might be a transient issue; check that your Airflow environment is actually using the correct DAG file.
Here is what I tried:
from datetime import datetime

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyTableOperator

# DAG_ID is the DAG's id string, defined elsewhere in the DAG file.
with models.DAG(
    DAG_ID,
    schedule="@once",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=["example", "bigquery"],
) as dag:
    createtable1 = BigQueryCreateEmptyTableOperator(
        task_id="createtable",
        dataset_id="my-dataset",
        table_id="mytablecomposer",
        project_id="my-project",
        schema_fields=[
            {"name": "ID_NO", "type": "INTEGER", "mode": "REQUIRED"},
            {"name": "PROD_NAME", "mode": "NULLABLE", "type": "STRING"},
            {"name": "DESC", "mode": "NULLABLE", "type": "STRING"},
        ],
        exists_ok=True,
    )
BigQuery table: [screenshot not reproduced]
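To make the advice about checking which DAG file and table you are really looking at more concrete, one option is to compare the schema the DAG declares with the schema BigQuery reports. This is only a sketch: `mode_mismatches` is a hypothetical helper, and `actual_schema` below is made-up sample data standing in for whatever `bq show --schema --format=prettyjson my_project:my_dataset.my_table` returns for the real table.

```python
# Hypothetical helper for illustration; not part of Airflow or the BigQuery client.
def mode_mismatches(expected, actual):
    """Return (column, expected_mode, actual_mode) for every column whose mode differs.

    actual_mode is None when the column is missing from the actual schema.
    """
    # Assumption: a missing "mode" key means NULLABLE, the BigQuery default.
    actual_modes = {f["name"]: f.get("mode", "NULLABLE") for f in actual}
    mismatches = []
    for field in expected:
        want = field.get("mode", "NULLABLE")
        got = actual_modes.get(field["name"])
        if got != want:
            mismatches.append((field["name"], want, got))
    return mismatches


# Schema as declared in the DAG (taken from the question).
expected_schema = [
    {"name": "ID_NO", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "PROD_NAME", "type": "STRING", "mode": "NULLABLE"},
    {"name": "DESC", "type": "STRING", "mode": "NULLABLE"},
]

# Sample data only: what the symptom in the question would look like.
actual_schema = [
    {"name": "ID_NO", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "PROD_NAME", "type": "STRING", "mode": "NULLABLE"},
    {"name": "DESC", "type": "STRING", "mode": "NULLABLE"},
]

for name, want, got in mode_mismatches(expected_schema, actual_schema):
    print(f"{name}: DAG declares {want}, table has {got}")
```

One cause worth ruling out (an assumption, since the question does not say whether the table existed before the run): with exists_ok=True the operator leaves an already existing table as it is, so a table that was first created with NULLABLE columns keeps them until it is dropped and recreated.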