为什么在查询整数列中的浮点数据时会出现慢查询和资源争用?

huangapple go评论76阅读模式
英文:

Why do slow queries and resource contention occur when querying float data in an int column?

问题

I'm seeing some performance issues with my database when querying for data that matches a float data type in an int column. This seems to be causing slow queries and resource contention, as multiple sessions are trying to do the same thing at the same time.

我在数据库中查询匹配整数列中的浮点数据类型时遇到了性能问题。这似乎导致了查询变慢和资源争用,因为多个会话同时尝试执行相同的操作。

I'm curious to understand the underlying reason for this issue. Why does querying for float data in an int column cause slow queries and resource contention? Is there a way to avoid this issue without modifying the schema or optimizing the queries? Any explanation or advice would be greatly appreciated.

我想了解这个问题的根本原因。为什么在整数列中查询浮点数据会导致查询变慢和资源争用?是否有办法在不修改模式或优化查询的情况下避免这个问题?非常感谢任何解释或建议。

my query is as below:

我的查询如下:

sql_command = f"SELECT a, b, c, d, " \ f"timestamp FROM {schema}.table " \ f"where timestamp = {ref_ts * 1000}"

where "timestamp" is the int column.

其中"timestamp"是整数列。

ENV: python3.10, postgresql, connecting via SQLAlchemy version 2.0.3, using pd.to_sql(sql_command, connection) method. I've tried in the Dbeaver client, but the same issue occurred.

环境:python3.10,postgresql,通过SQLAlchemy版本2.0.3连接,使用pd.to_sql(sql_command, connection)方法。我在Dbeaver客户端中尝试过,但仍然遇到了相同的问题。

I've changed the timestamp in the command as int type and the query performance issue resolved.

我已将命令中的"timestamp"更改为整数类型,查询性能问题得以解决。

英文:

I'm seeing some performance issues with my database when querying for data that matches a float data type in an int column. This seems to be causing slow queries and resource contention, as multiple sessions are trying to do the same thing at the same time.

I'm curious to understand the underlying reason for this issue. Why does querying for float data in an int column cause slow queries and resource contention?Is there a way to avoid this issue without modifying the schema or optimizing the queries? Any explanation or advice would be greatly appreciated.

my query is as below:

>  `sql_command = f"SELECT a, b, c, d, " \
>                 f"timestamp FROM {schema}.table " \
>                 f"where timestamp = {ref_ts * 1000}"`

where "timestamp" is the int column.

ENV: python3.10, postgresql,
connecting via SQLAlchemy version 2.0.3,
using, pd.to_sql(sql_command, connection) method.
I've tried in the Dbeaver client, but the same issue occred.

I've changed the timestamp in the command as int type and the query performance issue resolved.

答案1

得分: 0

如果列被定义为int并且上面有一个索引,您必须使用具有int数据类型的谓词才能使用该索引。

在使用数值值时,对索引列进行数据类型转换,会禁用索引使用,并且会回退到Seq扫描。

演示

正确的用法

explain (analyze, verbose, buffers)
select * from tab where ID = 1;

导致Index Scan(为简洁起见,输出已截断)

Index Scan using tab_idx on jdbn.tab  (cost=0.14..8.16 rows=1 width=1008) (actual time=0.005..0.009 rows=2 loops=1)
  Output: id, pad
  Index Cond: (tab.id = 1)

而不正确的用法

explain (analyze, verbose, buffers)
select * from tab where ID = 1.0;

触发Seq扫描

Seq Scan on jdbn.tab  (cost=0.00..32.00 rows=1 width=1008) (actual time=0.012..0.094 rows=2 loops=1)
  Output: id, pad
  Filter: ((tab.id)::numeric = 1.0)

请注意,在过滤器中的转换会阻止索引使用(tab.id)::numeric

但当然,您应该使用绑定变量(而不是串联的文字值),如评论中所指出的。

正确的谓词将参数转换为int以确保索引使用:

where ID = :1::int;

设置

CREATE TABLE tab (
     id int NOT null,
     pad text
);
    
insert into tab select id, repeat('x',1000) pad from generate_series(1,100) t(id);

CREATE INDEX tab_idx ON tab USING btree (id);

analyze tab;
英文:

If the column is defined as int and has an index on it, you have to use a predicate with an int datatype to be able to use the index.

While using a numeric value a data type conversion of the indexed column is performed that disables the index use and you fall back to the Seq Scan.

Demonstration

Correct usage

explain (analyze, verbose, buffers)
select * from tab where ID = 1;

leads to Index Scan (output cut for brevity)

Index Scan using tab_idx on jdbn.tab  (cost=0.14..8.16 rows=1 width=1008) (actual time=0.005..0.009 rows=2 loops=1)
  Output: id, pad
  Index Cond: (tab.id = 1)

While the incorrect usage

explain (analyze, verbose, buffers)
select * from tab where ID = 1.0;

triggers a Seq Scan

Seq Scan on jdbn.tab  (cost=0.00..32.00 rows=1 width=1008) (actual time=0.012..0.094 rows=2 loops=1)
  Output: id, pad
  Filter: ((tab.id)::numeric = 1.0)

Note the cast in the filter that inhibits the index usage (tab.id)::numeric

But of course you should use bind variable (instead of a concatenated literal value) as pointed out in a comment.

The correct predicate would cast the parameter to int to assure the index usage:

where ID = :1::int;

Setup

CREATE TABLE tab (
     id int NOT null,
     pad text
);
    
insert into tab select id, repeat('x',1000) pad from generate_series(1,100) t(id);

CREATE INDEX tab_idx ON tab USING btree (id);

analyze tab;

huangapple
  • 本文由 发表于 2023年4月13日 16:45:44
  • 转载请务必保留本文链接:https://go.coder-hub.com/76003436.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定