2023年3月7日 13:52:34go评论118阅读模式

英文:

create view from athena table using program

问题

我有100多张Athena表，例如table1，table2等等。

是否有任何方法可以在Athena中创建一个通用视图，以便在自动创建任何表时（使用任何程序或脚本）也创建视图？

例如，如果我在Athena中创建任何表MyTable，那么也应创建视图v_MyTable。

英文:

I've 100 plus tables in Athena, for example table1, table2 and so on.

Is there any way I could create generic view once I have any table in Athena automatically (using any program or script)?

For example, If I create any table MyTable in Athena, so v_MyTable should be created as view too.

答案1

得分: 1

你可以使用AWS-Glue或AWS-Lambda来实现这个目的。我更倾向于使用AWS-Glue，类似下面的Glue作业示例一样进行每日调度，但你也可以设计自己的解决方案。AWS-Lambda支持文件上传触发器，但不支持文件夹上传触发器，所以无论如何你都应该使用调度。

1- 创建一个Delta表，用于保存所有同时具有其自己视图的表名。但当开始时没有表-视图对时，它当然是空的。（顺便说一句，你可以使用普通的Glue Catalog表而不是Delta表，每次覆盖它。但是，Delta允许在S3上使用插入-更新-删除）。

2- 在你的Glue作业上创建一个sqlContext。

spark_context = SparkContext()
glue_context = GlueContext(spark_context)
spark_session = glue_context.spark_session
sql_context = SQLContext(spark_session.sparkContext, spark_session)

3- 从所需的数据库中获取所有表名，类似下面的方式；

table_names_df = sqlContext.sql("show tables in <database_name>")

4- 从Delta表中获取具有相关视图的表名；

tables_with_view_df = sqlContext.sql("select * from <database_name.delta_table_name>")

5- 左连接这些表，以查找哪些表没有自己的视图。

tables_wo_view_df = (table_names_df.join(tables_with_view_df,
              table_names_df.table_name==tables_with_view_df.table_name2,
"leftouter")
.where(col("table_name2").isNull())
.drop("table_name2"))

6- 创建视图，但最好使用循环，不要像下面这样（不要忘记将tables_wo_view_df转换为列表。例如，你可以使用rdd.flatMap()）。

create_views_sql_script = """
CREATE VIEW <{database_name}.{table_name_wo_view}_view> AS
SELECT *
FROM <{database_name}.{table_name_wo_view}>
"""
ath = boto3.client('athena')
ath.start_query_execution(
        QueryString=create_views_sql_script,
        ResultConfiguration={'OutputLocation': 's3://.../'})

7- 最后使用tables_wo_view_df更新Delta表。

谢谢。

英文:

You can use AWS-Glue or AWS-Lambda for this purpose.
I'd prefer using AWS-Glue with daily schedule like below Glue Job example but you can design your own solution.
AWS-Lambda supports file-upload trigger but not folder-upload trigger so you should use a schedule anyway.

1- Create a delta-table which will keep all table names which have also its own view. But of course it will be empty at the beginning since there is no table-view pair at the beginning. (By the way you could use normal Glue Catalog table instead Delta Table by overwriting at each time. However, Delta provides of using Insert-Update-Delete on S3).

2- Create a sqlContext on your Glue Job.

spark_context = SparkContext()
glue_context = GlueContext(spark_context)
spark_session = glue_context.spark_session
sql_context = SQLContext(spark_session.sparkContext, spark_session)

3- Get all the table names from the database desired like below;

table_names_df = sqlContext.sql(&quot;show tables in &lt;database_name&gt;&quot;)

4- Get the table names which have its relevant view from the Delta table;

tables_with_view_df = sqlContext.sql(&quot;select * from &lt;database_name.delta_table_name&gt;&quot;)

5- Left join these tables to find out which tables don't have its own view.

tables_wo_view_df = (table_names_df.join(tables_with_view_df,
              table_names_df.table_name==tables_with_view_df.table_name2,
&quot;leftouter&quot;)
.where(col(&quot;table_name2&quot;).isNull())
.drop(&quot;table_name2&quot;))

6- Create the views but you'd better use a loop
unlike below (do not forget to convert tables_wo_view_df to a list. You can use rdd.flatMap() for example.

create_views_sql_script = &quot;&quot;&quot;
CREATE VIEW &lt;{database_name}.{table_name_wo_view}_view&gt; AS
SELECT *
FROM &lt;{database_name}.{table_name_wo_view}&gt;
&quot;&quot;&quot;
ath = boto3.client(&#39;athena&#39;)
ath.start_query_execution(
        QueryString=create_views_sql_script,
        ResultConfiguration={&#39;OutputLocation&#39;: &#39;s3://.../&#39;})

7- Finally update the Delta Table with tables_wo_view_df.

Thanks.

答案2

得分: 0

AWS Glue数据目录通过API调用可访问。

例如，可以调用get_tables()(其中包括Views)和create_table()。

您可以创建一个按计划运行的AWS Lambda函数。Lambda函数将调用API以查询现有的表（包括视图），并在必要时创建新视图。

英文:

The AWS Glue Data Catalog is accessible via API calls.

For example, there are calls to get_tables()(which includes Views) and create_table().

You could create an AWS Lambda function that runs on a schedule. The Lambda function would call the APIs to query the existing tables (including views) and create new views when necessary.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

创建视图从Athena表使用程序

问题

答案1

答案2

How to test api gateway endpoints without deploying in Java.

AWS CDK: VPC子网与另一个子网冲突

如何在使用AWS SES时在电子邮件中添加标签？

Golang和AWS：部署到AWS Elastic Beanstalk不起作用

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论