How to schedule or automate dataset refresh in AWS QuickSight
Question
What options are available to schedule or automate the refresh of a QuickSight SPICE dataset?
Are there any APIs available to automate a SPICE dataset refresh, preferably using Python?
Answer 1
Score: 14
You have two options:
- Use the API services available in the latest version of boto3
Use the 'create_ingestion' method to start a dataset refresh, and use 'describe_ingestion' to check the status of the refresh.
    import boto3
    import time
    import sys

    client = boto3.client('quicksight')

    # Start the refresh; the IngestionId must be unique for this dataset.
    response = client.create_ingestion(DataSetId='<dataset-id>', IngestionId='<ingestion-id>', AwsAccountId='<aws-account-id>')

    # Poll until the ingestion leaves its in-progress states.
    while True:
        response = client.describe_ingestion(DataSetId='<dataset-id>', IngestionId='<ingestion-id>', AwsAccountId='<aws-account-id>')
        status = response['Ingestion']['IngestionStatus']
        if status in ('INITIALIZED', 'QUEUED', 'RUNNING'):
            time.sleep(10)  # change sleep time according to your dataset size
        elif status == 'COMPLETED':
            print("refresh completed. RowsIngested {0}, RowsDropped {1}, IngestionTimeInSeconds {2}, IngestionSizeInBytes {3}".format(
                response['Ingestion']['RowInfo']['RowsIngested'],
                response['Ingestion']['RowInfo']['RowsDropped'],
                response['Ingestion']['IngestionTimeInSeconds'],
                response['Ingestion']['IngestionSizeInBytes']))
            break
        else:
            # Any other status (for example FAILED or CANCELLED) is treated as a failure.
            print("refresh failed! - status {0}".format(status))
            sys.exit(1)
DataSetId: this can be found in the dataset's AWS URI, or you can call the 'list_data_sets' method to list all datasets and read the 'DataSetId' field of each entry in response['DataSetSummaries'].
IngestionId: set a unique ID. I used the current epoch time, str(int(time.time())).
- Schedule the refresh using the schedule option on the QuickSight dataset
You can schedule refreshes at an 'hourly', 'daily', 'weekly' or 'monthly' cadence using the schedule option on the QuickSight dataset.
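For the API option, the dataset lookup and the polling loop can be wrapped in small helpers. The following is only a sketch: the function names find_dataset_id and refresh_and_wait, the lookup by dataset name, and the timeout_seconds parameter are my own assumptions for illustration and are not part of boto3. Only 'list_data_sets', 'create_ingestion' and 'describe_ingestion' are the client methods mentioned above.

```python
import time


def find_dataset_id(client, account_id, name):
    """Return the DataSetId of the first dataset called `name`, or None.

    Hypothetical helper: follows NextToken so that datasets beyond the
    first page of list_data_sets results are also checked.
    """
    token = None
    while True:
        kwargs = {"AwsAccountId": account_id}
        if token:
            kwargs["NextToken"] = token
        page = client.list_data_sets(**kwargs)
        for summary in page.get("DataSetSummaries", []):
            if summary.get("Name") == name:
                return summary["DataSetId"]
        token = page.get("NextToken")
        if not token:
            return None


def refresh_and_wait(client, account_id, dataset_id,
                     poll_seconds=10, timeout_seconds=3600):
    """Start a SPICE refresh and return its final IngestionStatus.

    Hypothetical helper: raises TimeoutError if the ingestion is still
    in progress once timeout_seconds have elapsed.
    """
    # Epoch seconds as a unique ingestion ID, as suggested in the answer.
    ingestion_id = str(int(time.time()))
    client.create_ingestion(DataSetId=dataset_id,
                            IngestionId=ingestion_id,
                            AwsAccountId=account_id)
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_ingestion(DataSetId=dataset_id,
                                             IngestionId=ingestion_id,
                                             AwsAccountId=account_id)
        status = response["Ingestion"]["IngestionStatus"]
        if status not in ("INITIALIZED", "QUEUED", "RUNNING"):
            return status  # e.g. COMPLETED, FAILED or CANCELLED
        if time.monotonic() >= deadline:
            raise TimeoutError("ingestion {0} still {1}".format(ingestion_id, status))
        time.sleep(poll_seconds)
```

To use it, create the client with boto3.client('quicksight') and pass it in together with your account ID, for example refresh_and_wait(client, '<aws-account-id>', find_dataset_id(client, '<aws-account-id>', '<dataset-name>')). Passing the client as an argument also makes the helpers easy to exercise with a stub instead of a real AWS account.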