英文:
UPDATE ... FROM ... inside trigger function not working as expected?
问题
I am writing a PostgreSQL trigger function that will allow me to do an ETL transformation on a data.
I have a table with fields (id
, price
, EFF_FROM
, EFF_TO
), and if I only insert (id
, price
, EFF_FROM
), EFF_TO
needs to be calculated automatically as the next EFF_FROM
of this id
(-1 day) if it exists, and "5999-12-31" if it does not.
To update the table, I am using UPDATE ... FROM
inside of a trigger function, but it seems to update all EFF_TO
values in the table with the last given EFF_FROM
value.
The trigger function that I use looks like this:
CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
UPDATE prices_schema.prices
SET EFF_TO = subquery.next_eff_from
FROM (
SELECT COALESCE(
LEAD(EFF_FROM, 1)
OVER (
ORDER BY EFF_FROM ASC
),
TO_DATE('6000-00-00', 'YYYY-MM-DD')
) - 1 AS next_eff_from FROM prices_schema.prices
) AS subquery;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();
Load some test data:
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 100, '2017-01-12');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 150, '2017-02-09');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 125, '2017-01-27');
Execute:
SELECT * FROM prices_schema.prices
ORDER BY EFF_FROM ASC;
Result:
id | price | EFF_FROM | EFF_TO |
---|---|---|---|
1 | 100 | 2017-01-12 | 2017-01-26 |
1 | 125 | 2017-01-27 | 2017-01-26 |
1 | 150 | 2017-02-09 | 2017-01-26 |
EFF_TO
obviously doesn't look as expected to. What amazes me more is that calling this query:
SELECT eff_from, COALESCE(
LEAD(EFF_FROM, 1)
OVER (
ORDER BY EFF_FROM ASC
),
TO_DATE('6000-00-00', 'YYYY-MM-DD')
) - 1 AS next_eff_from FROM prices_schema.prices;
Gives me the correct result:
EFF_FROM | next_eff_from |
---|---|
2017-01-12 | 2017-01-26 |
2017-01-27 | 2017-02-08 |
2017-02-09 | 5999-12-31 |
I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.
英文:
I am writing a PostgreSQL trigger function that will allow me to do an ETL transformation on a data.
I have a table with fields (id
, price
, EFF_FROM
, EFF_TO
), and if I only insert (id
, price
, EFF_FROM
), EFF_TO
needs to be calculated automatically as the next EFF_FROM
of this id
(-1 day) if it exists, and "5999-12-31" if it does not.
To update the table, I am using UPDATE ... FROM
inside of a trigger function, but it seems to update all EFF_TO
values in the table with the last given EFF_FROM
value.
The trigger function that I use looks like this:
CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
UPDATE prices_schema.prices
SET EFF_TO = subquery.next_eff_from
FROM (
SELECT COALESCE(
LEAD(EFF_FROM, 1)
OVER (
ORDER BY EFF_FROM ASC
),
TO_DATE('6000-00-00', 'YYYY-MM-DD')
) - 1 AS next_eff_from FROM prices_schema.prices
) AS subquery;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();
Load some test data:
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 100, '2017-01-12');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 150, '2017-02-09');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 125, '2017-01-27');
Execute:
SELECT * FROM prices_schema.prices
ORDER BY EFF_FROM ASC;
Result:
id | price | EFF_FROM | EFF_TO |
---|---|---|---|
1 | 100 | 2017-01-12 | 2017-01-26 |
1 | 125 | 2017-01-27 | 2017-01-26 |
1 | 150 | 2017-02-09 | 2017-01-26 |
EFF_TO
obviously doesn't look as expected to. What amazes me more is that calling this query:
SELECT eff_from, COALESCE(
LEAD(EFF_FROM, 1)
OVER (
ORDER BY EFF_FROM ASC
),
TO_DATE('6000-00-00', 'YYYY-MM-DD')
) - 1 AS next_eff_from FROM prices_schema.prices;
Gives me the correct result:
EFF_FROM | next_eff_from |
---|---|
2017-01-12 | 2017-01-26 |
2017-01-27 | 2017-02-08 |
2017-02-09 | 5999-12-31 |
I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.
答案1
得分: 2
以下是您要翻译的内容:
问题
您的触发器执行的操作与您描述的完全不同。
首先,在您的UPDATE
中,subquery
计算了表中的每一行的新值。这既费时又不必要。我们只需要一个值。
其次,没有WHERE
子句来限制UPDATE
只更新当前行。因此,它会更新表中的每一行 - 多次。这是非常昂贵的无意义操作。
多行INSERT
使用过渡表的语句级AFTER
触发器可以实现您的目标:在同一多行INSERT
中的所有行都会被适应,同时还能看到相同批量插入的其他行。
CREATE OR REPLACE FUNCTION prices_schema.prices_set_all_eff_to()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
UPDATE prices_schema.prices p
SET eff_to = COALESCE(p1.next_eff_to, 'infinity')
FROM news n
LEFT JOIN LATERAL (
SELECT p1.eff_from - 1 AS next_eff_to
FROM prices_schema.prices p1
WHERE p1.id = n.id
AND p1.eff_from > n.eff_from
ORDER BY p1.eff_from
LIMIT 1
) p1 ON true
WHERE n.eff_to IS NULL
AND p.prices_id = n.prices_id;
RETURN NULL; -- 用于语句级触发器
END
$func$;
CREATE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
REFERENCING NEW TABLE AS news -- 过渡表
FOR EACH STATEMENT
EXECUTE FUNCTION prices_set_all_eff_to();
使用EXECUTE FUNCTION
而不是EXECUTE PROCEDURE
需要Postgres 11或更高版本。请参阅:
请注意,附加到相同INSERT
的RETURNING
不会显示触发器的效果,尚未显示出来!
Postgres日期/时间类型允许使用特殊值'infinity'
。我建议使用它,而不是您的任意回退日期“5999-12-31”(这与TO_DATE('6000-00-00', 'YYYY-MM-DD')
不同)。
**注意:**虽然这实现了您所要求的内容,但现有行不会被更新以适应新插入的行。
仅单行INSERT
CREATE OR REPLACE FUNCTION prices_schema.prices_etl()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
SELECT INTO NEW.eff_to
p.eff_from - 1
FROM prices_schema.prices P
WHERE p.id = NEW.id
AND p.eff_from > NEW.eff_from
ORDER BY p.eff_from
LIMIT 1;
IF NOT FOUND THEN
NEW.eff_to = 'infinity'; -- 比任意年份更好
END IF;
RETURN NEW; -- !
END
$func$;
CREATE OR REPLACE TRIGGER before_insert_prices
BEFORE INSERT ON prices_schema.prices
FOR EACH ROW
WHEN (NEW.eff_to IS NULL)
EXECUTE FUNCTION prices_schema.prices_etl();
这个更简单的BEFORE
触发器非常便宜!
它按行工作 - 如FOR EACH ROW
所示 - 仅看到先前已插入的行(包括在同一语句中插入的行)。不像第一个解决方案,它不会看到批量INSERT
中的后续行。
触发器的添加WHEN
子句确保只在eff_to IS NULL
时触发。相关链接:
附加到相同INSERT
的RETURNING
已经包括触发器的所有效果 - 与上面的AFTER
触发器不同!
我这里没有使用COALESCE
。如果查询没有找到行,则不会执行COALESCE
。这将需要额外的开销。请参阅:
-
https://stackoverflow.com/questions/17839212/return-zero-if-no-record-is-found/17839278#17839278
-
https://stackoverflow.com/questions/8098795/return-a-value-if-no-record-is-found/8098816#8098816
而是我利用了特殊变量FOUND
。请参阅:
英文:
Problems
Your trigger does something else entirely from what you describe.
Firstly, subquery
in your UPDATE
computes a new value for every single row in the table. Expensive, and unnecessary. We just need the one.
Next, there is no WHERE
clause to constrain the UPDATE
to the row at hand. So it updates every row in the table - multiple times. That's hugely expensive nonsense.
Multi-row INSERT
A statement-level AFTER
trigger using a transition table achieves what you were aiming for: all rows within the same multi-row INSERT
are adapted while seeing all other rows of the same bulk insert.
CREATE OR REPLACE FUNCTION prices_schema.prices_set_all_eff_to()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
UPDATE prices_schema.prices p
SET eff_to = COALESCE(p1.next_eff_to, 'infinity')
FROM news n
LEFT JOIN LATERAL (
SELECT p1.eff_from - 1 AS next_eff_to
FROM prices_schema.prices p1
WHERE p1.id = n.id
AND p1.eff_from > n.eff_from
ORDER BY p1.eff_from
LIMIT 1
) p1 ON true
WHERE n.eff_to IS NULL
AND p.prices_id = n.prices_id;
RETURN NULL; -- for statement-level trigger
END
$func$;
CREATE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
REFERENCING NEW TABLE AS news -- transition table
FOR EACH STATEMENT
EXECUTE FUNCTION prices_set_all_eff_to();
EXECUTE FUNCTION
instead of EXECUTE PROCEDURE
requires Postgres 11 or later. See:
Note that RETURNING
appended to the same INSERT
does not show effects of the trigger, yet!
Postgres date/time types allow the special value 'infinity'
. I suggest to use that instead of your arbitrary fallback "5999-12-31" (which is not the same as TO_DATE('6000-00-00', 'YYYY-MM-DD')
.
Note: While this achieves what you ask for, pre-existing rows are not updated to fit in with newly inserted rows.
Only sinlge-row INSERT
CREATE OR REPLACE FUNCTION prices_schema.prices_etl()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
SELECT INTO NEW.eff_to
p.eff_from - 1
FROM prices_schema.prices P
WHERE p.id = NEW.id
AND p.eff_from > NEW.eff_from
ORDER BY p.eff_from
LIMIT 1;
IF NOT FOUND THEN
NEW.eff_to = 'infinity'; -- better than arbitrary year
END IF;
RETURN NEW; -- !
END
$func$;
CREATE OR REPLACE TRIGGER before_insert_prices
BEFORE INSERT ON prices_schema.prices
FOR EACH ROW
WHEN (NEW.eff_to IS NULL)
EXECUTE FUNCTION prices_schema.prices_etl();
This simpler BEFORE
trigger is very cheap!
It works per row - as indicated by FOR EACH ROW
- and only sees rows already inserted earlier (including rows inserted in the same statement). Does not see later rows in a bulk INSERT
like the first solution.
The added WHEN
clause to the trigger makes sure it only even fires when eff_to IS NULL
. Related:
RETURNING
appended to the same INSERT
already includes all effects of the trigger - as opposed to the above AFTER
trigger!
I do not use COALESCE
here. If the query finds no row, COALESCE
is never executed. That would require additional overhead. See:
-
https://stackoverflow.com/questions/17839212/return-zero-if-no-record-is-found/17839278#17839278
-
https://stackoverflow.com/questions/8098795/return-a-value-if-no-record-is-found/8098816#8098816
Instead, I make use of the special variable FOUND
. See:
答案2
得分: 0
以下是翻译好的部分:
Turns out, I was only 2 lines away from one:
这证明,我离这个目标只差两行代码:
This now works as intended, even if multiple different id values are inserted. Execute:
现在它按预期工作,即使插入了多个不同的id值。执行:
Result:
结果:
| id | price | eff_from | eff_to |
| ----- | ----- | ----- | ----- |
| 1 | 100 | 2017-01-12 | 2017-01-26 |
| 1 | 150 | 2017-02-09 | 5999-12-31 |
| 1 | 125 | 2017-01-27 | 2017-02-08 |
| 2 | 900 | 2016-05-05 | 5999-12-31 |
| 1 | 100 | 2017-01-12 | 2017-01-26 |
| 1 | 150 | 2017-02-09 | 5999-12-31 |
| 1 | 125 | 2017-01-27 | 2017-02-08 |
| 2 | 900 | 2016-05-05 | 5999-12-31 |
Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.
是的,我知道它可能仍然效率很低,但它能正常工作。它似乎也适用于批量插入,但这不是我的主要关注点,因为我是通过psycopg2插入数据,解析.json文件。
What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price = 125 is inserted, eff_from fields need to be updated both for this row and for row with price = 100, since we're inserting in between the time intervals.
更重要的是,如果插入影响两行,它会正确地更新这两行。例如:当插入价格=125的新行时,需要同时更新这一行和价格=100的行的eff_from字段,因为我们是在时间间隔之间插入的。
英文:
Turns out, I was only 2 lines away from one:
CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
UPDATE prices_schema.prices p0
SET eff_to = subquery.next_eff_from
FROM (
SELECT id, price, COALESCE(
LEAD(eff_from, 1)
OVER (
PARTITION BY id
ORDER BY eff_from ASC
),
TO_DATE('6000-00-00', 'YYYY-MM-DD')
) - 1 AS next_eff_from FROM prices_schema.prices
) AS subquery
WHERE subquery.id = p0.id AND subquery.price = p0.price;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();
This now works as intended, even if multiple different id values are inserted. Execute:
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 100, '2017-01-12');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 150, '2017-02-09');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES(1, 125, '2017-01-27');
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
VALUES (2, 900, '2016-05-05');
SELECT * FROM prices_schema.prices;
Result:
id | price | eff_from | eff_to |
---|---|---|---|
1 | 100 | 2017-01-12 | 2017-01-26 |
1 | 150 | 2017-02-09 | 5999-12-31 |
1 | 125 | 2017-01-27 | 2017-02-08 |
2 | 900 | 2016-05-05 | 5999-12-31 |
Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.
What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price
= 125 is inserted, eff_from
fields need to be updated both for this row and for row with price
= 100, since we're inserting in between the time intervals.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论