How to optimize data aggregation based on value range conditions applied to two columns
Question
I have a `particles` table in PostgreSQL 10.19 that looks like this:
```sql
CREATE TABLE particles (
    particle_diameter REAL,
    particle_speed    REAL
);

INSERT INTO particles VALUES
    (0.35, 0.74),
    (0.57, 2.63),
    (0.27, 1.05),
    (0.65, 2.33);
```
What I want is to aggregate my data by diameter range (i.e. all particles with a diameter between, for example, 0 and 0.2 mm, 0.2 and 0.4 mm, etc.) and by speed range. I want to generate a series of diameter ranges and a series of speed ranges, and finally count the number of particles for each (diameter range, speed range) pair.
So far I have managed to get the desired result with this query:
```sql
WITH speed_series AS (
    SELECT generate_series(-1, 19.8, 0.2) AS speed_from
), speed_range AS (
    SELECT speed_from, (speed_from + 0.2) AS speed_to FROM speed_series
), diameter_series AS (
    SELECT generate_series(0, 9.8, 0.2) AS diameter_from
), diameter_range AS (
    SELECT
        diameter_from, (diameter_from + 0.2) AS diameter_to, speed_from, speed_to
    FROM diameter_series, speed_range
)
SELECT
    diameter_from,
    diameter_to,
    speed_from,
    speed_to,
    (SELECT
         COUNT(particle_diameter)
     FROM particles
     WHERE particle_diameter BETWEEN diameter_from AND diameter_to
       AND particle_speed BETWEEN speed_from AND speed_to
    )
FROM diameter_range;
```
On a relatively small dataset (~30k records) this query took more than a minute to execute. So my question is:
Is there a way to rewrite this query to be more efficient and less time-consuming?
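Most of the time goes into the shape of the query: the correlated subquery in the select list is evaluated once for each of the 50 × 105 = 5,250 (diameter range, speed range) pairs, and with no index on `particles` each evaluation reads the whole table. The Python sketch below is only an offline emulation of that nested loop (it does not talk to PostgreSQL; `naive_counts` and the constant names are hypothetical) that counts how many row checks it performs.

```python
from decimal import Decimal

STEP = Decimal("0.2")
# Lower bounds of the ranges, same values as the two generate_series() calls
DIAMETER_FROMS = [k * STEP for k in range(50)]     # 0, 0.2, ..., 9.8
SPEED_FROMS = [-1 + k * STEP for k in range(105)]  # -1, -0.8, ..., 19.8


def naive_counts(particles):
    """Emulate the original query: one full scan of `particles` per range pair.

    Returns the per-pair counts and the number of row checks performed.
    """
    counts, checks = {}, 0
    for d in DIAMETER_FROMS:
        for s in SPEED_FROMS:
            n = 0
            for diameter, speed in particles:
                checks += 1
                # BETWEEN is inclusive at both ends, as in the original query
                if d <= diameter <= d + STEP and s <= speed <= s + STEP:
                    n += 1
            counts[(d, s)] = n
    return counts, checks


particles = [(Decimal("0.35"), Decimal("0.74")), (Decimal("0.57"), Decimal("2.63")),
             (Decimal("0.27"), Decimal("1.05")), (Decimal("0.65"), Decimal("2.33"))]
counts, checks = naive_counts(particles)
print(checks)  # 5,250 pairs x 4 rows = 21,000 checks
print(len(DIAMETER_FROMS) * len(SPEED_FROMS) * 30_000)  # at ~30k rows: 157,500,000 checks
```

`Decimal` stands in for the exact `numeric` values that `generate_series(-1, 19.8, 0.2)` produces, so the range bounds come out exactly.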
Answer 1
Score: 1
I would try `width_bucket()` or `floor()` here:
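```sql
with b as (
  -- bucket numbers: 50 buckets of 0.2 over [0, 10), 105 buckets of 0.2 over [-1, 20)
  select width_bucket(particle_diameter, 0, 10, 50) pd,
         width_bucket(particle_speed, -1, 20, 105) ps
  from particles)
-- turn bucket numbers back into range bounds and count rows per bucket pair
select pd * .2 - .2 diam_from, pd * .2 diam_to,
       ps * .2 - 1.2 speed_from, ps * .2 - 1 speed_to,
       count(1) cnt
from b group by pd, ps;
```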
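The query above reads the table once and turns each row into a pair of bucket numbers, so the work grows with the number of rows rather than with rows × range pairs. As an offline illustration, here is a Python sketch that assumes a simplified model of `width_bucket()` for an ascending range (0 below `low`, `count + 1` at or above `high`, otherwise equal-width buckets numbered from 1, as the PostgreSQL documentation describes). The helper names are hypothetical, and float rounding exactly on a bucket edge may not match PostgreSQL. It also shows the `floor()` alternative, which gives the same bucket number for in-range values but does not collapse out-of-range values into 0 and `count + 1`.

```python
import math
from collections import Counter
from decimal import Decimal


def width_bucket(operand: float, low: float, high: float, count: int) -> int:
    """Simplified model of PostgreSQL's width_bucket() for low < high."""
    if operand < low:
        return 0
    if operand >= high:
        return count + 1
    # Clamp in case float rounding yields `count` for a value just below `high`
    return min(math.floor((operand - low) / (high - low) * count), count - 1) + 1


def floor_bucket(operand: float, low: float, width: float) -> int:
    """The floor() alternative: same 1-based number for in-range values, no clamping."""
    return math.floor((operand - low) / width) + 1


def bucketed_counts(particles):
    """Single pass over the rows, like the CTE followed by GROUP BY pd, ps."""
    return Counter(
        (width_bucket(d, 0, 10, 50), width_bucket(s, -1, 20, 105))
        for d, s in particles
    )


def bucket_ranges(pd: int, ps: int) -> tuple:
    """Map bucket numbers back to range bounds with the answer's formulas (exact decimals)."""
    step = Decimal("0.2")
    return (pd * step - step, pd * step, ps * step - Decimal("1.2"), ps * step - 1)


particles = [(0.35, 0.74), (0.57, 2.63), (0.27, 1.05), (0.65, 2.33)]
counts = bucketed_counts(particles)
for (pd, ps), cnt in sorted(counts.items()):
    # diam_from, diam_to, speed_from, speed_to, cnt
    print(*bucket_ranges(pd, ps), cnt)
```

Unlike the `BETWEEN` version, each row lands in exactly one bucket, so a value lying exactly on a boundary is not counted in two adjacent ranges.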
Of course, if you need the zero-count rows, you can join with the generated series:
```sql
with b as (
  select width_bucket(particle_diameter, 0, 10, 50) pd,
         width_bucket(particle_speed, -1, 20, 105) ps
  from particles)
select d diam_from, d + .2 diam_to, s speed_from, s + .2 speed_to,
       coalesce(t.cnt, 0) cnt
-- every (diameter, speed) lower bound: 50 x 105 rows
from (select generate_series(0, 9.8, 0.2)) as dm(d)
cross join (select generate_series(-1, 19.8, 0.2)) as sp(s)
-- attach the per-bucket counts; combinations with no particles get 0
left join (
  select pd * .2 - .2 diam_from, ps * .2 - 1.2 speed_from, count(1) cnt
  from b group by pd, ps) t
  on dm.d = t.diam_from and sp.s = t.speed_from;
```
[dbfiddle demo](https://dbfiddle.uk/k1zekvTd)
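The zero-filled variant is a cross join of the two series, left-joined to the grouped counts on the lower bounds, with `coalesce(t.cnt, 0)` for empty cells. The equality join works because both sides are PostgreSQL `numeric` values, which are exact; the same keys computed in binary floating point could fail to match. A Python sketch of that step, again only an offline emulation with a hypothetical function name, uses `Decimal` keys and `dict.get(..., 0)` in place of `coalesce`.

```python
from collections import Counter
from decimal import Decimal

STEP = Decimal("0.2")


def zero_filled_grid(bucket_counts: Counter) -> dict:
    """Emulate the cross join of both series left-joined to the grouped counts."""
    # Grouped counts keyed by (diam_from, speed_from), as in subquery t
    by_lower = {
        (pd * STEP - STEP, ps * STEP - Decimal("1.2")): cnt
        for (pd, ps), cnt in bucket_counts.items()
    }
    diam_froms = [k * STEP for k in range(50)]         # generate_series(0, 9.8, 0.2)
    speed_froms = [-1 + k * STEP for k in range(105)]  # generate_series(-1, 19.8, 0.2)
    # dict.get(..., 0) plays the role of coalesce(t.cnt, 0) for empty cells
    return {
        (d, s): by_lower.get((d, s), 0)
        for d in diam_froms
        for s in speed_froms
    }


# Bucket pairs that the four sample rows fall into with width_bucket()
counts = Counter({(2, 9): 1, (2, 11): 1, (3, 19): 1, (4, 17): 1})
grid = zero_filled_grid(counts)
print(len(grid), sum(grid.values()))  # 5250 4
```

Bucket pairs outside the grid (bucket 0 or `count + 1`) have no matching lower bound and drop out, just as they do in the SQL left join.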