2023年4月13日 16:45:44go评论112阅读模式

英文:

Why do slow queries and resource contention occur when querying float data in an int column?

问题

I'm seeing some performance issues with my database when querying for data that matches a float data type in an int column. This seems to be causing slow queries and resource contention, as multiple sessions are trying to do the same thing at the same time.

我在数据库中查询匹配整数列中的浮点数据类型时遇到了性能问题。这似乎导致了查询变慢和资源争用，因为多个会话同时尝试执行相同的操作。

I'm curious to understand the underlying reason for this issue. Why does querying for float data in an int column cause slow queries and resource contention? Is there a way to avoid this issue without modifying the schema or optimizing the queries? Any explanation or advice would be greatly appreciated.

我想了解这个问题的根本原因。为什么在整数列中查询浮点数据会导致查询变慢和资源争用？是否有办法在不修改模式或优化查询的情况下避免这个问题？非常感谢任何解释或建议。

my query is as below:

我的查询如下：

sql_command = f"SELECT a, b, c, d, " \ f"timestamp FROM {schema}.table " \ f"where timestamp = {ref_ts * 1000}"

where "timestamp" is the int column.

其中"timestamp"是整数列。

ENV: python3.10, postgresql, connecting via SQLAlchemy version 2.0.3, using pd.to_sql(sql_command, connection) method. I've tried in the Dbeaver client, but the same issue occurred.

环境：python3.10，postgresql，通过SQLAlchemy版本2.0.3连接，使用pd.to_sql(sql_command, connection)方法。我在Dbeaver客户端中尝试过，但仍然遇到了相同的问题。

I've changed the timestamp in the command as int type and the query performance issue resolved.

我已将命令中的"timestamp"更改为整数类型，查询性能问题得以解决。

英文:

I'm curious to understand the underlying reason for this issue. Why does querying for float data in an int column cause slow queries and resource contention?Is there a way to avoid this issue without modifying the schema or optimizing the queries? Any explanation or advice would be greatly appreciated.

my query is as below:

&gt;  `sql_command = f&quot;SELECT a, b, c, d, &quot; \
&gt;                 f&quot;timestamp FROM {schema}.table &quot; \
&gt;                 f&quot;where timestamp = {ref_ts * 1000}&quot;`

where "timestamp" is the int column.

ENV: python3.10, postgresql,
connecting via SQLAlchemy version 2.0.3,
using, pd.to_sql(sql_command, connection) method.
I've tried in the Dbeaver client, but the same issue occred.

I've changed the timestamp in the command as int type and the query performance issue resolved.

答案1

得分: 0

如果列被定义为int并且上面有一个索引，您必须使用具有int数据类型的谓词才能使用该索引。

在使用数值值时，对索引列进行数据类型转换，会禁用索引使用，并且会回退到Seq扫描。

演示

正确的用法

explain (analyze, verbose, buffers)
select * from tab where ID = 1;

导致Index Scan（为简洁起见，输出已截断）

Index Scan using tab_idx on jdbn.tab  (cost=0.14..8.16 rows=1 width=1008) (actual time=0.005..0.009 rows=2 loops=1)
  Output: id, pad
  Index Cond: (tab.id = 1)

而不正确的用法

explain (analyze, verbose, buffers)
select * from tab where ID = 1.0;

触发Seq扫描

Seq Scan on jdbn.tab  (cost=0.00..32.00 rows=1 width=1008) (actual time=0.012..0.094 rows=2 loops=1)
  Output: id, pad
  Filter: ((tab.id)::numeric = 1.0)

请注意，在过滤器中的转换会阻止索引使用(tab.id)::numeric

但当然，您应该使用绑定变量（而不是串联的文字值），如评论中所指出的。

正确的谓词将参数转换为int以确保索引使用：

where ID = :1::int;

设置

CREATE TABLE tab (
     id int NOT null,
     pad text
);
    
insert into tab select id, repeat('x',1000) pad from generate_series(1,100) t(id);
CREATE INDEX tab_idx ON tab USING btree (id);
analyze tab;

英文:

If the column is defined as int and has an index on it, you have to use a predicate with an int datatype to be able to use the index.

While using a numeric value a data type conversion of the indexed column is performed that disables the index use and you fall back to the Seq Scan.

Demonstration

Correct usage

explain (analyze, verbose, buffers)
select * from tab where ID = 1;

leads to Index Scan (output cut for brevity)

Index Scan using tab_idx on jdbn.tab  (cost=0.14..8.16 rows=1 width=1008) (actual time=0.005..0.009 rows=2 loops=1)
  Output: id, pad
  Index Cond: (tab.id = 1)

While the incorrect usage

explain (analyze, verbose, buffers)
select * from tab where ID = 1.0;

triggers a Seq Scan

Seq Scan on jdbn.tab  (cost=0.00..32.00 rows=1 width=1008) (actual time=0.012..0.094 rows=2 loops=1)
  Output: id, pad
  Filter: ((tab.id)::numeric = 1.0)

Note the cast in the filter that inhibits the index usage (tab.id)::numeric

But of course you should use bind variable (instead of a concatenated literal value) as pointed out in a comment.

The correct predicate would cast the parameter to int to assure the index usage:

where ID = :1::int;

Setup

CREATE TABLE tab (
     id int NOT null,
     pad text
);
    
insert into tab select id, repeat(&#39;x&#39;,1000) pad from generate_series(1,100) t(id);
CREATE INDEX tab_idx ON tab USING btree (id);
analyze tab;

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么在查询整数列中的浮点数据时会出现慢查询和资源争用？

问题

答案1

使用Django表单进行文件上传

当您尝试在PHP中运行命令时，为什么会在命令中添加’1’呢？

Postgres的`\d`显示表，但`\d table`不存在？

MongoDB – 按内部元素分组

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。