MySQL query stuck in "Sending data" state when executed from a Java client (JdbcTemplate)

Question

When I run the SQL query below on a MySQL server, the server stays in the "Sending data" state forever:

SELECT a.cust
     , a.job_num
     , a.fund_num
     , a.fund_type
     , a.process_type
     , a.code
     , a.mail_type
     , a.rec_date
     , a.mail_date
     , a.acc_num
     , a.add1
     , a.add2
     , a.add3
     , a.add4
     , a.add5
     , a.add5
     , a.add7
     , a.tax_num
     , a.data_source
     , a.sec_dec
     , a.serv_flag
     , a.add_ind
     , a.cons_id
     , a.cust_num
     , a.cust_name
     , (CASE when a.process_type='DAILY' 
            THEN (SELECT b.roll_num 
                    FROM db_daily_report b 
                   WHERE b.acc_num = a.acc_num 
                     AND b.cust_num = a.cust_num 
                     AND b.cons_id = a.cons_id 
                     AND b.WRITTEN_TO = 'OUTPUT') 
            When a.PROCESS_TYPE='ANNUAL' 
            THEN (SELECT c.roll_num 
                    FROM db_annual_report c 
                   WHERE c.acc_num = a.acc_num 
                     AND c.cust_num = a.cust_num 
                     AND c.cons_id = a.cons_id 
                     AND c.WRITTEN_TO = 'OUTPUT') 
                END) roll_num
     , a.pref
     , a.cons_id
     , a.cntry 
  FROM audit_customer a  
 WHERE DATE(a.mail_date) BETWEEN '2020-10-05' and '2020-10-05' 
   AND a.process_type = 'Annual' 
   AND a.fund_type IN (2) 
   AND a.mail_type IN ('F') 
 ORDER 
    BY a.created_ts DESC

Running EXPLAIN on the query above returns the following result:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | a | index_merge | idx_audit_customer_fund_type, idx_audit_customer_process_type, idx_audit_customer_rec_date | idx_audit_customer_rec_date, idx_audit_customer_fund_type, idx_audit_customer_process_type | 3,4,23 | NULL | 196048 | Using intersect(idx_audit_customer_rec_date,idx_audit_customer_fund_type,idx_audit_customer_process_type); Using where; Using filesort |
| 3 | DEPENDENT SUBQUERY | c | ref | idx_db_annual_report_cust_name, idx_db_annual_report_acc_num, idx_db_annual_report_cons_id | idx_db_annual_report_acc_num | 63 | cmhdb.a.acc_num | 16 | Using where |
| 2 | DEPENDENT SUBQUERY | b | ref | idx_db_daily_report_cust_name, idx_db_daily_report_acc_num, idx_db_daily_report_cons_id | idx_db_daily_report_acc_num | 63 | db.a.acc_num | 16 | Using where |

When I run

> SHOW FULL PROCESSLIST

in MySQL Workbench, the query is shown as stuck in the "Sending data" state, and I can't understand why.

Answer 1

Score: 1

Your CASE/WHEN construct is off. Your second WHEN also uses the alias "b", but the join condition uses the alias "c", which probably produces a Cartesian result, with the query choking on matching every record to every record.

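Also, without knowing your data: the query will probably fail if either of those CASE/WHEN subqueries returns more than one record, because MySQL requires a scalar subquery to return at most one row (otherwise it raises error 1242, "Subquery returns more than 1 row").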
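A quick way to check whether that second concern applies is to look for keys that have more than one OUTPUT row. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with made-up rows; only the table and column names come from the question. Note that SQLite does not raise an error here (it silently uses the first row of the subquery), so this only shows the diagnostic query, not MySQL's error 1242 itself.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL. Table and column names come from
# the question; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_annual_report (
        acc_num TEXT, cust_num TEXT, cons_id TEXT,
        written_to TEXT, roll_num TEXT
    );
    INSERT INTO db_annual_report VALUES
        ('A1', 'C1', 'K1', 'OUTPUT', 'R100'),
        ('A1', 'C1', 'K1', 'OUTPUT', 'R101'),   -- same key again: 2 matches
        ('A2', 'C2', 'K2', 'OUTPUT', 'R200'),
        ('A3', 'C3', 'K3', 'ARCHIVE', 'R300');  -- not OUTPUT, so ignored
""")

# Every key listed here would make the correlated scalar subquery return
# more than one row for the matching audit_customer record.
duplicate_keys = conn.execute("""
    SELECT acc_num, cust_num, cons_id, COUNT(*) AS n
      FROM db_annual_report
     WHERE written_to = 'OUTPUT'
     GROUP BY acc_num, cust_num, cons_id
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicate_keys)  # [('A1', 'C1', 'K1', 2)]
```

Running the same GROUP BY ... HAVING COUNT(*) > 1 query in MySQL against both db_annual_report and db_daily_report should list any keys that would trip the subqueries.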

FEEDBACK per request

Here is what I would change it to. You are running for annual data, so I removed the CASE/WHEN and used a direct JOIN to your annual table instead. If running for daily data, I would just change the table in the JOIN from the annual table to your daily one.

Also, to help optimize the query, I would add an index on (process_type, fund_type, mail_type, mail_date, created_ts).
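To see whether the planner picks up a composite index like this, you can inspect the query plan. This is a sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the index name idx_audit_customer_lookup and the trimmed-down table are hypothetical. The equality columns come first and the range column (mail_date) after them, because an index can generally only narrow the search on columns up to and including the first range condition.

```python
import sqlite3

# SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN here; the output
# format differs, but both show whether an index is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit_customer (
        process_type TEXT, fund_type INTEGER, mail_type TEXT,
        mail_date TEXT, created_ts TEXT, acc_num TEXT
    );
    -- Hypothetical index name; equality columns first, range column after.
    CREATE INDEX idx_audit_customer_lookup
        ON audit_customer (process_type, fund_type, mail_type, mail_date, created_ts);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT acc_num
      FROM audit_customer
     WHERE process_type = 'Annual'
       AND fund_type = 2            -- IN (2) with a single value means = 2
       AND mail_type = 'F'
       AND mail_date >= '2020-10-05'
       AND mail_date < '2020-10-06'
     ORDER BY created_ts DESC
""").fetchall()

# Each plan row is a 4-tuple whose last field is the human-readable detail.
details = [row[3] for row in plan]
for detail in details:
    print(detail)
```

The plan should show a SEARCH using idx_audit_customer_lookup. Because mail_date is filtered by a range, the trailing created_ts column probably cannot supply the ORDER BY order by itself, so a separate sort step is likely to remain (a temp B-tree in SQLite, "Using filesort" in MySQL); on a single day's rows that sort should be cheap.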

The date condition in the WHERE clause is done in two parts instead of with BETWEEN. Because you were using the DATE() function to strip off the time component, that condition cannot be optimized against an index. So my index above puts the other fields first and moves the date to the end. Using GREATER THAN OR EQUAL TO '2020-10-05' gets everything from midnight at the start of that date onward. Adding AND LESS THAN '2020-10-06' limits it to everything on 10/5 up to 11:59:59 PM, that is, anything before 10/6. Now the index can be used, including the date portion.
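The half-open range [start of day, start of next day) can be checked against the DATE() version directly. The sketch below uses SQLite with made-up rows and a hypothetical helper, day_bounds(), that computes both bounds from a date so they can be passed as bind parameters instead of hard-coded strings (with JdbcTemplate, the equivalent would be two ? placeholders). It relies on ISO-8601 text sorting chronologically in SQLite; in MySQL, comparing a DATETIME column with '2020-10-05' treats the string as 2020-10-05 00:00:00.

```python
import sqlite3
from datetime import date, timedelta


def day_bounds(day: date) -> tuple[str, str]:
    """Hypothetical helper: [day 00:00:00, next day 00:00:00) as ISO strings."""
    return day.isoformat(), (day + timedelta(days=1)).isoformat()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit_customer (id INTEGER, mail_date TEXT);
    INSERT INTO audit_customer VALUES
        (1, '2020-10-04 23:59:59'),  -- day before: excluded
        (2, '2020-10-05 00:00:00'),  -- midnight: included by >=
        (3, '2020-10-05 13:30:00'),
        (4, '2020-10-05 23:59:59'),  -- last second: included by <
        (5, '2020-10-06 00:00:00');  -- next midnight: excluded by <
""")

start, end = day_bounds(date(2020, 10, 5))

# Function-wrapped column (not index-friendly) vs. plain half-open range.
by_function = [r[0] for r in conn.execute(
    "SELECT id FROM audit_customer WHERE date(mail_date) = ? ORDER BY id",
    (start,))]
by_range = [r[0] for r in conn.execute(
    "SELECT id FROM audit_customer"
    " WHERE mail_date >= ? AND mail_date < ? ORDER BY id",
    (start, end))]

print(start, end)   # 2020-10-05 2020-10-06
print(by_function)  # [2, 3, 4]
print(by_range)     # [2, 3, 4]
```

Both forms select the same rows; only the range form leaves mail_date bare so an index on it can be used.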

SELECT a.cust
     , a.job_num
     , a.fund_num
     , a.fund_type
     , a.process_type
     , a.code
     , a.mail_type
     , a.rec_date
     , a.mail_date
     , a.acc_num
     , a.add1
     , a.add2
     , a.add3
     , a.add4
     , a.add5
     , a.add5
     , a.add7
     , a.tax_num
     , a.data_source
     , a.sec_dec
     , a.serv_flag
     , a.add_ind
     , a.cons_id
     , a.cust_num
     , a.cust_name
     , c.roll_num
     , a.pref
     , a.cons_id
     , a.cntry 
  FROM audit_customer a
  JOIN db_annual_report c
    ON c.acc_num = a.acc_num
   AND c.cust_num = a.cust_num
   AND c.cons_id = a.cons_id
   AND c.WRITTEN_TO = 'OUTPUT'
 WHERE a.process_type = 'Annual'
   AND a.fund_type IN (2)
   AND a.mail_type IN ('F')
   AND a.mail_date >= '2020-10-05'
   AND a.mail_date < '2020-10-06'
 ORDER 
    BY a.created_ts DESC
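One behavioral difference is worth checking before switching from the CASE/WHEN subquery to the JOIN above: a scalar subquery keeps every audit_customer row and returns NULL when nothing matches, while an inner JOIN drops audit rows that have no matching OUTPUT row (and would repeat a row if several matched). The SQLite sketch below, with made-up rows, compares the three shapes; a LEFT JOIN is one option if unmatched rows must be kept.

```python
import sqlite3

# Made-up rows; only table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit_customer (acc_num TEXT, cust_num TEXT, cons_id TEXT);
    CREATE TABLE db_annual_report (
        acc_num TEXT, cust_num TEXT, cons_id TEXT,
        written_to TEXT, roll_num TEXT
    );
    INSERT INTO audit_customer VALUES ('A1', 'C1', 'K1'), ('A2', 'C2', 'K2');
    -- Only A1 has a matching OUTPUT row in the annual report.
    INSERT INTO db_annual_report VALUES ('A1', 'C1', 'K1', 'OUTPUT', 'R100');
""")

JOIN_ON = """
       ON c.acc_num = a.acc_num
      AND c.cust_num = a.cust_num
      AND c.cons_id = a.cons_id
      AND c.written_to = 'OUTPUT'
"""

# Original shape: correlated scalar subquery, NULL when nothing matches.
subquery_rows = conn.execute("""
    SELECT a.acc_num,
           (SELECT c.roll_num FROM db_annual_report c
             WHERE c.acc_num = a.acc_num AND c.cust_num = a.cust_num
               AND c.cons_id = a.cons_id AND c.written_to = 'OUTPUT') AS roll_num
      FROM audit_customer a
     ORDER BY a.acc_num
""").fetchall()

# Suggested shape: the inner JOIN drops audit rows without a report row.
inner_rows = conn.execute(
    "SELECT a.acc_num, c.roll_num FROM audit_customer a "
    "JOIN db_annual_report c" + JOIN_ON + "ORDER BY a.acc_num"
).fetchall()

# A LEFT JOIN keeps them, with NULL roll_num, like the subquery version.
left_rows = conn.execute(
    "SELECT a.acc_num, c.roll_num FROM audit_customer a "
    "LEFT JOIN db_annual_report c" + JOIN_ON + "ORDER BY a.acc_num"
).fetchall()

print(subquery_rows)  # [('A1', 'R100'), ('A2', None)]
print(inner_rows)     # [('A1', 'R100')]
print(left_rows)      # [('A1', 'R100'), ('A2', None)]
```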

Feedback #2

To clarify >= and < vs. BETWEEN: your original query used the DATE() function to get only the date portion of a date/time column. Wrapping the column in a function call in the WHERE clause is not optimizable; the optimizer cannot apply an index on that column to it.
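The effect of wrapping the column in a function shows up directly in the query plan. This SQLite sketch (a stand-in for MySQL's EXPLAIN; the index name idx_audit_customer_mail_date is hypothetical) compares two forms of the same one-day filter on an indexed mail_date column.

```python
import sqlite3

# SQLite stand-in for MySQL; the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit_customer (
        id INTEGER PRIMARY KEY, mail_date TEXT, acc_num TEXT
    );
    CREATE INDEX idx_audit_customer_mail_date ON audit_customer (mail_date);
""")


def plan_for(where_clause: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a one-table query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT acc_num FROM audit_customer WHERE "
        + where_clause
    ).fetchall()
    return " | ".join(row[3] for row in rows)


# Same one-day filter written two ways.
wrapped = plan_for("date(mail_date) = '2020-10-05'")
ranged = plan_for("mail_date >= '2020-10-05' AND mail_date < '2020-10-06'")

print(wrapped)  # a full-table SCAN: the index on mail_date is not used
print(ranged)   # a SEARCH using idx_audit_customer_mail_date
```

The exact wording of the plan text varies between SQLite versions, but the SCAN vs. SEARCH distinction is the point, and it mirrors what MySQL's EXPLAIN reports as a full scan vs. a range access.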

So when you had date(mail_date) between '2020-10-05' and '2020-10-05', you were effectively asking only for activity on the single date 2020-10-05, regardless of morning, afternoon, or late at night.

To take advantage of an index on a datetime field whose value could be any time within the day, I don't use the DATE() function; instead, the full date/time field is part of the WHERE clause, with an explicit >= and <.

So my start date/time is 2020-10-05 at 12:00:00 AM (midnight, when 2020-10-04 changes to 2020-10-05). Any activity from that morning, even as early as 2020-10-05 12:00:01 AM (should activity be that early), is included via >= '2020-10-05'. Because no time is stated explicitly, midnight (12:00 AM) is the default.

Now, for the ENDING date range, I explicitly use LESS THAN 2020-10-06. This allows everything up to 2020-10-05 11:59:59 PM (just before the midnight that starts 2020-10-06).

If I did date(mail_date) between '2020-10-05' and '2020-10-06', I would get activity for BOTH dates. Since you only cared about one day, BETWEEN was not required; it could have been written as date(mail_date) = '2020-10-05', but neither form can be optimized with an index.
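The inclusive ends of BETWEEN are easy to confirm on a few rows. This SQLite sketch with made-up timestamps shows that DATE() BETWEEN two different dates covers both whole days, while BETWEEN the same date twice behaves exactly like =.

```python
import sqlite3

# Made-up timestamps spanning two days; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit_customer (id INTEGER, mail_date TEXT);
    INSERT INTO audit_customer VALUES
        (1, '2020-10-05 09:00:00'),
        (2, '2020-10-06 00:00:00'),
        (3, '2020-10-06 15:00:00');
""")


def ids(where_clause: str) -> list[int]:
    """Return the ids matching a WHERE clause, in id order."""
    sql = "SELECT id FROM audit_customer WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# BETWEEN includes both endpoints, so this covers two whole days.
two_days = ids("date(mail_date) BETWEEN '2020-10-05' AND '2020-10-06'")
# The original form: BETWEEN the same date twice ...
same_day_between = ids("date(mail_date) BETWEEN '2020-10-05' AND '2020-10-05'")
# ... is just an equality test on that date.
same_day_equal = ids("date(mail_date) = '2020-10-05'")

print(two_days)          # [1, 2, 3]
print(same_day_between)  # [1]
print(same_day_equal)    # [1]
```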

Does that help? I don't think I can explain it any further.

huangapple
  • Posted on 2020-10-16 22:59:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64391642.html