创建表,如在Impala数据库中的百分比子查询。

huangapple go评论62阅读模式
英文:

Create table as select percentage subquery in Impala DB

问题

我是Impala的新手,我需要创建一个带有选择结果集的表,而且这个SQL语句是在使用JDBC的Java中运行的,请看下面的查询:

create table if not exists my_temp_table as select 
41 as rule_id, 49 as record_id,
(select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$' )/(select count(1) from dirty_table);

我需要创建表my_temp_table并将数据插入到这个表中,这是一个我需要运行的SQL语句。但是运行失败并显示以下错误:

[HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error

经过检查,我知道Impala不支持SELECT子查询,我们只能在FROMWHERE子句中使用子查询,请参阅Impala文档:https://impala.apache.org/docs/build/html/topics/impala_subqueries.html

所以对于这个问题,我该如何解决呢?

我的想法:

  1. 将SQL更新为可执行的样式,我尝试了类似下面的带有WITH的SQL,它可以工作,但不能在CREATE TABLE ... AS ...中使用。
WITH q1 AS (
  select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
  select count(1) val2 from dirty_table
)
SELECT 100 * q1.val / q2.val2  result
FROM q1, q2
  1. 或者,是否有类似于MySQL或Oracle中的BEGIN ... END语句,然后我可以单独运行这个SQL。
英文:

I'm newbie of Impala, i need to create table with select resultset, also, this sql is run in Java using JDBC, see my below query:

create table if not exists my_temp_table as select 
41 as rule_id,49 as record_id,
(select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$' )/(select count(1) from dirty_table);

I need to create table my_temp_table and insert data into this table, this is one SQL that i need to run. But it runs failed and gives errer as below:

[HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error

After checking, i know Impala doesn't support SELECT clause subquery, we can only use subquery
in FROM or WHERE clause, see Impala docs: https://impala.apache.org/docs/build/html/topics/impala_subqueries.html.

So for this question how can i do to solve this problem.

My thought:

  1. update sql to let it execute, I tried WITH like below sql, it works but can't be used in
    CREATE TABLE ... AS ....
    WITH q1 AS (
      select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$'
    ),
    q2 AS (
      select count(1) val2 from dirty_table
    )
    SELECT 100 * q1.val / q2.val2  result
    FROM q1, q2
  1. or, is there any statement like BEGIN ... END in MySQL or Oracle, then i can run this sql separately.

答案1

得分: 1

使用您提供的示例,我会尝试以下几种方法,我认为这些方法可能会很好地工作。

我通过Impala检查了解决方案。

第一个示例:

CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
     (SELECT COUNT(1) AS val2 FROM dirty_table) b;
SELECT * FROM my_temp_table;

第二个示例:

DROP TABLE my_temp_table;

CREATE TABLE IF NOT EXISTS my_temp_table AS 
SELECT result FROM
    (WITH q1 AS (
      SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
    ),
    q2 AS (
      SELECT COUNT(1) val2 FROM dirty_table
    )
    SELECT 100 * q1.val / q2.val2 AS result
    FROM q1, q2) t;
SELECT * FROM my_temp_table;

请注意,我只翻译了代码部分,不包括任何其他内容。

英文:

With your examples, I would try these approaches that, I believe, could work fine.
I checked the solution with Impala

CREATE TABLE dirty_table (
 id INT,
 msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED  BY ','
STORED AS TEXTFILE;


[localhost.localdomain:21000] > SELECT * FROM dirty_table;
Query: SELECT * FROM dirty_table
Query submitted at: 2020-07-28 17:05:24 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=5441d6a46ce61e7b:8e49432600000000
+----+-------------+
| id | msg         |
+----+-------------+
| 1  | 13321512121 |
| 2  | 13121212121 |
| 3  | 03121212121 |
| 4  | 13321512121 |
| 5  | 13121212121 |
| 6  | 03121212121 |
| 7  | 13121212121 |
+----+-------------+
Fetched 7 row(s) in 0.14s

First example

CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
     (SELECT COUNT(1) AS val2 FROM dirty_table) b;

[localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS
                              > SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
                              > FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
                              >      (SELECT COUNT(1) AS val2 FROM dirty_table) b;
Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
     (SELECT COUNT(1) AS val2 FROM dirty_table) b
+-------------------+
| summary           |
+-------------------+
| Inserted 0 row(s) |
+-------------------+
Fetched 1 row(s) in 0.21s

[localhost.localdomain:21000] > invalidate metadata;

[localhost.localdomain:21000] > SELECT * FROM my_temp_table;
Query: select * from my_temp_table
Query submitted at: 2020-07-28 17:03:44 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=47370bf793a09b:29c4dfa000000000
+---------+-----------+--------------------+
| rule_id | record_id | result             |
+---------+-----------+--------------------+
| 41      | 49        | 0.7142857142857143 |
+---------+-----------+--------------------+
Fetched 1 row(s) in 0.13s

Second example

DROP TABLE my_temp_table;

CREATE TABLE IF NOT EXISTS my_temp_table AS 
SELECT result FROM
    (WITH q1 AS (
      SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
    ),
    q2 AS (
      SELECT COUNT(1) val2 FROM dirty_table
    )
    SELECT 100 * q1.val / q2.val2 AS result
    FROM q1, q2) t;

[localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS 
                              > SELECT result FROM
                              >     (WITH q1 AS (
                              >       SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
                              >     ),
                              >     q2 AS (
                              >       SELECT COUNT(1) val2 FROM dirty_table
                              >     )
                              >     SELECT 100 * q1.val / q2.val2 AS result
                              >     FROM q1, q2) t;
Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT result FROM
    (WITH q1 AS (
      SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
    ),
    q2 AS (
      SELECT COUNT(1) val2 FROM dirty_table
    )
    SELECT 100 * q1.val / q2.val2 AS result
    FROM q1, q2) t
+-------------------+
| summary           |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
Fetched 1 row(s) in 0.40s

[localhost.localdomain:21000] > invalidate metadata;

[localhost.localdomain:21000] > SELECT * FROM my_temp_table;
Query: SELECT * FROM my_temp_table
Query submitted at: 2020-07-28 17:08:17 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=3447684ef59d0c4:f70779200000000
+-------------------+
| result            |
+-------------------+
| 71.42857142857143 |
+-------------------+
Fetched 1 row(s) in 0.74s

答案2

得分: 0

以下是您要求的翻译部分:

我认为一个条件平均值可以简单高效地实现您想要的结果,只需进行一次表扫描:

select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table

您可以将其转换为create table语句:

create table my_temp_table as
select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table
英文:

I think a conditional average can do what you want simply and efficiently, with a single table scan:

select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table

You can turn this to a create table statement:

create table my_temp_table as
select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table

huangapple
  • 本文由 发表于 2020年7月27日 21:27:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/63116374.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定