英文:
Executing query (multiple/ subquery) in HIVE is giving Vertex Failed
问题
我对HIVE和SQL查询有基本的了解。因此,我有多个单独的查询或者可以说子查询的查询。但是,如果我同时运行5-6个查询,它可以正常工作,但如果我运行整个查询(99个单独的查询),它会给我一个"Vertex Failed Error"错误。
查询:
Select "TABLE_NAME_1" as TBL_NAME, (Select count(*) from a_t1 where BDATE='2023-05-31') as a, (Select count(*) from b_t1 where BDATE='2023-05-31') as b, (Select count(*) from c_t1 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_2" as TBL_NAME, (Select count(*) from a_t2 where BDATE='2023-05-31') as a, (Select count(*) from b_t2 where BDATE='2023-05-31') as b, (Select count(*) from c_t2 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_3" as TBL_NAME, (Select count(*) from a_t3 where BDATE='2023-05-31') as a, (Select count(*) from b_t3 where BDATE='2023-05-31') as b, (Select count(*) from c_t3 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_4" as TBL_NAME, (Select count(*) from a_t4 where BDATE='2023-05-31') as a, (Select count(*) from b_t4 where BDATE='2023-05-31') as b, (Select count(*) from c_t4 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_5" as TBL_NAME, (Select count(*) from a_t5 where BDATE='2023-05-31') as a, (Select count(*) from b_t5 where BDATE='2023-05-31') as b, (Select count(*) from c_t5 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_6" as TBL_NAME, (Select count(*) from a_t6 where BDATE='2023-05-31') as a, (Select count(*) from b_t6 where BDATE='2023-05-31') as b, (Select count(*) from c_t6 where BDATE='2023-05-31') as c UNION ALL
.
.
.
.
.
Select "TABLE_NAME_33" as TBL_NAME, (Select count(*) from a_t33 where BDATE='2023-05-31') as a, (Select count(*) from b_t33 where BDATE='2023-05-31') as b, (Select count(*) from c_t33 where BDATE='2023-05-31') as c
不确定这是否是HIVE数据库的限制,或者我是否遗漏了什么。
注意:我还使用了以下设置:
set tez.queue.name = APPNAME;
set hive.execution.engine = tez;
已编辑
我想要的输出是:
TBL_NAME a b c
TABLE_NAME1 10 20 15
TABLE_NAME2 10 20 15
TABLE_NAME3 10 20 15
.
.
.
TABLE_NAME33 10 20 15
谢谢。
英文:
I have a basic understanding of HIVE and SQL Query. So the query that i have has multiple individual query or you can say sub query. However, if i run 5-6 query together it is working fine but if i run the whole query (99 individual query) it is giving me Vertex Failed Error
Query:
Select "TABLE_NAME_2" as TBL_NAME, (Select count(*) from a_t2 where BDATE='2023-05-31') as a, (Select count(*) from b_t2 where BDATE='2023-05-31') as b, (Select count(*) from c_t2 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_3" as TBL_NAME, (Select count(*) from a_t3 where BDATE='2023-05-31') as a, (Select count(*) from b_t3 where BDATE='2023-05-31') as b, (Select count(*) from c_t3 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_4" as TBL_NAME, (Select count(*) from a_t4 where BDATE='2023-05-31') as a, (Select count(*) from b_t4 where BDATE='2023-05-31') as b, (Select count(*) from c_t4 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_5" as TBL_NAME, (Select count(*) from a_t5 where BDATE='2023-05-31') as a, (Select count(*) from b_t5 where BDATE='2023-05-31') as b, (Select count(*) from c_t5 where BDATE='2023-05-31') as c UNION ALL
Select "TABLE_NAME_6" as TBL_NAME, (Select count(*) from a_t6 where BDATE='2023-05-31') as a, (Select count(*) from b_t6 where BDATE='2023-05-31') as b, (Select count(*) from c_t6 where BDATE='2023-05-31') as c UNION ALL
.
.
.
.
.
Select "TABLE_NAME_33" as TBL_NAME, (Select count(*) from a_t33 where BDATE='2023-05-31') as a, (Select count(*) from b_t33 where BDATE='2023-05-31') as b, (Select count(*) from c_t33 where BDATE='2023-05-31') as c
Not sure if this is the limitation of HIVE DB or if i am missing something.
Note: I have also used
set tez.queue.name = APPNAME;
set hive.execution.engine = tez;
EDITED
Output i want is:
TBL_NAME a b c
TABLE_NAME1 10 20 15
TABLE_NAME2 10 20 15
TABLE_NAME3 10 20 15
.
.
.
TABLE_NAME33 10 20 15
Thanks
答案1
得分: 2
我可以给你一些解决方案。就像 @nbk 说的那样,语法是错误的,这种内联 SQL 在 Hive 中不起作用。但这将是一个昂贵的解决方案 - 尝试在这个表中寻找一些分区,并查看是否可以统计分区行数。
以下是您可以尝试的内容 -
WITH
TABLE_NAME_2_a as Select "TABLE_NAME_2" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from a_t2 ,
TABLE_NAME_2_b as Select "TABLE_NAME_2" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from b_t2 ,
TABLE_NAME_2_c as Select "TABLE_NAME_2" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from c_t2 ,
TABLE_NAME_3_a as Select "TABLE_NAME_3" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from a_t3 ,
TABLE_NAME_3_b as Select "TABLE_NAME_3" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from b_t3 ,
TABLE_NAME_3_c as Select "TABLE_NAME_3" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from c_t3
--UNION for Table 2
select
TABLE_NAME_2_a.TBL_NAME, TABLE_NAME_2_a.a, TABLE_NAME_2_b.b,TABLE_NAME_2_c.c
from
TABLE_NAME_2_a,TABLE_NAME_2_b,TABLE_NAME_2_c
where
TABLE_NAME_2_a.TBL_NAME=TABLE_NAME_2_b.TBL_NAME and TABLE_NAME_2_a.TBL_NAME=TABLE_NAME_2_c.TBL_NAME
union all
--UNION for Table 3
select
TABLE_NAME_3_a.TBL_NAME, TABLE_NAME_3_a.a, TABLE_NAME_3_b.b,TABLE_NAME_3_c.c
from
TABLE_NAME_3_a,TABLE_NAME_3_b,TABLE_NAME_3_c
where
TABLE_NAME_3_a.TBL_NAME=TABLE_NAME_3_b.TBL_NAME and TABLE_NAME_3_a.TBL_NAME=TABLE_NAME_3_c.TBL_NAME
英文:
i can give you some solution. Like @nbk said syntax is wrong and this kind of inline SQL doesnt wotk in hive. But this is going to be an expensive solution - seek for some partitions in this table and see if you can count partitioned rows.
Here is something you can try -
WITH
TABLE_NAME_2_a as Select "TABLE_NAME_2" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from a_t2 ,
TABLE_NAME_2_b as Select "TABLE_NAME_2" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from b_t2 ,
TABLE_NAME_2_c as Select "TABLE_NAME_2" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from c_t2 ,
TABLE_NAME_3_a as Select "TABLE_NAME_3" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from a_t3 ,
TABLE_NAME_3_b as Select "TABLE_NAME_3" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from b_t3 ,
TABLE_NAME_3_c as Select "TABLE_NAME_3" as TBL_NAME, SUM(case when BDATE='2023-05-31' then 1 else 0 end) as a from c_t3
--UNION for Table 2
select
TABLE_NAME_2_a.TBL_NAME, TABLE_NAME_2_a.a, TABLE_NAME_2_b.b,TABLE_NAME_2_c.c
from
TABLE_NAME_2_a,TABLE_NAME_2_b,TABLE_NAME_2_c
where
TABLE_NAME_2_a.TBL_NAME=TABLE_NAME_2_b.TBL_NAME and TABLE_NAME_2_a.TBL_NAME=TABLE_NAME_2_c.TBL_NAME
union all
--UNION for Table 3
select
TABLE_NAME_3_a.TBL_NAME, TABLE_NAME_3_a.a, TABLE_NAME_3_b.b,TABLE_NAME_3_c.c
from
TABLE_NAME_3_a,TABLE_NAME_3_b,TABLE_NAME_3_c
where
TABLE_NAME_3_a.TBL_NAME=TABLE_NAME_3_b.TBL_NAME and TABLE_NAME_3_a.TBL_NAME=TABLE_NAME_3_c.TBL_NAME
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论