UPDATE … FROM … 触发函数内部无法按预期工作?

huangapple go评论67阅读模式
英文:

UPDATE ... FROM ... inside trigger function not working as expected?

问题

I am writing a PostgreSQL trigger function that will allow me to do an ETL transformation on a data.
I have a table with fields (id, price, EFF_FROM, EFF_TO), and if I only insert (id, price, EFF_FROM), EFF_TO needs to be calculated automatically as the next EFF_FROM of this id (-1 day) if it exists, and "5999-12-31" if it does not.

To update the table, I am using UPDATE ... FROM inside of a trigger function, but it seems to update all EFF_TO values in the table with the last given EFF_FROM value.

The trigger function that I use looks like this:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
	UPDATE prices_schema.prices
	SET EFF_TO = subquery.next_eff_from
	FROM (
		SELECT COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE('6000-00-00', 'YYYY-MM-DD')
		) - 1 AS next_eff_from FROM prices_schema.prices
	) AS subquery;
	
	RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

Load some test data:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 100, '2017-01-12');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 150, '2017-02-09');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 125, '2017-01-27');

Execute:

SELECT * FROM prices_schema.prices
	ORDER BY EFF_FROM ASC;

Result:

id price EFF_FROM EFF_TO
1 100 2017-01-12 2017-01-26
1 125 2017-01-27 2017-01-26
1 150 2017-02-09 2017-01-26

EFF_TO obviously doesn't look as expected to. What amazes me more is that calling this query:

SELECT eff_from, COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE('6000-00-00', 'YYYY-MM-DD')
		) - 1 AS next_eff_from FROM prices_schema.prices;

Gives me the correct result:

EFF_FROM next_eff_from
2017-01-12 2017-01-26
2017-01-27 2017-02-08
2017-02-09 5999-12-31

I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.

英文:

I am writing a PostgreSQL trigger function that will allow me to do an ETL transformation on a data.
I have a table with fields (id, price, EFF_FROM, EFF_TO), and if I only insert (id, price, EFF_FROM), EFF_TO needs to be calculated automatically as the next EFF_FROM of this id (-1 day) if it exists, and "5999-12-31" if it does not.

To update the table, I am using UPDATE ... FROM inside of a trigger function, but it seems to update all EFF_TO values in the table with the last given EFF_FROM value.

The trigger function that I use looks like this:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
	UPDATE prices_schema.prices
	SET EFF_TO = subquery.next_eff_from
	FROM (
		SELECT COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE('6000-00-00', 'YYYY-MM-DD')
		) - 1 AS next_eff_from FROM prices_schema.prices
	) AS subquery;
	
	RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

Load some test data:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 100, '2017-01-12');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 150, '2017-02-09');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 125, '2017-01-27');

Execute:

SELECT * FROM prices_schema.prices
	ORDER BY EFF_FROM ASC;

Result:

id price EFF_FROM EFF_TO
1 100 2017-01-12 2017-01-26
1 125 2017-01-27 2017-01-26
1 150 2017-02-09 2017-01-26

EFF_TO obviously doesn't look as expected to. What amazes me more is that calling this query:

SELECT eff_from, COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE('6000-00-00', 'YYYY-MM-DD')
		) - 1 AS next_eff_from FROM prices_schema.prices;

Gives me the correct result:

EFF_FROM next_eff_from
2017-01-12 2017-01-26
2017-01-27 2017-02-08
2017-02-09 5999-12-31

I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.

答案1

得分: 2

以下是您要翻译的内容:

问题

您的触发器执行的操作与您描述的完全不同。

首先,在您的UPDATE中,subquery计算了表中的每一行的新值。这既费时又不必要。我们只需要一个值。

其次,没有WHERE子句来限制UPDATE只更新当前行。因此,它会更新表中的每一行 - 多次。这是非常昂贵的无意义操作。

多行INSERT

使用过渡表的语句级AFTER触发器可以实现您的目标:在同一多行INSERT中的所有行都会被适应,同时还能看到相同批量插入的其他行。

CREATE OR REPLACE FUNCTION prices_schema.prices_set_all_eff_to()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE prices_schema.prices p
   SET    eff_to = COALESCE(p1.next_eff_to, 'infinity')
   FROM   news n
   LEFT   JOIN LATERAL (
      SELECT p1.eff_from - 1 AS next_eff_to
      FROM   prices_schema.prices p1
      WHERE  p1.id = n.id
      AND    p1.eff_from > n.eff_from
      ORDER  BY p1.eff_from
      LIMIT  1
      ) p1 ON true
   WHERE  n.eff_to IS NULL
   AND    p.prices_id = n.prices_id;

   RETURN NULL;  -- 用于语句级触发器
END
$func$;


CREATE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
REFERENCING NEW TABLE AS news  -- 过渡表
FOR EACH STATEMENT
EXECUTE FUNCTION prices_set_all_eff_to();

fiddle

使用EXECUTE FUNCTION而不是EXECUTE PROCEDURE需要Postgres 11或更高版本。请参阅:

请注意,附加到相同INSERTRETURNING不会显示触发器的效果,尚未显示出来!

Postgres日期/时间类型允许使用特殊值'infinity'。我建议使用它,而不是您的任意回退日期“5999-12-31”(这与TO_DATE('6000-00-00', 'YYYY-MM-DD')不同)。

**注意:**虽然这实现了您所要求的内容,但现有行不会被更新以适应新插入的行。

仅单行INSERT

CREATE OR REPLACE FUNCTION prices_schema.prices_etl()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT INTO NEW.eff_to
          p.eff_from - 1
   FROM   prices_schema.prices P
   WHERE  p.id = NEW.id
   AND    p.eff_from > NEW.eff_from
   ORDER  BY p.eff_from
   LIMIT  1;
   
   IF NOT FOUND THEN
      NEW.eff_to = 'infinity';  -- 比任意年份更好
   END IF;
   
   RETURN NEW;  -- !
END
$func$;


CREATE OR REPLACE TRIGGER before_insert_prices
BEFORE INSERT ON prices_schema.prices
FOR EACH ROW
WHEN (NEW.eff_to IS NULL)            
EXECUTE FUNCTION prices_schema.prices_etl();

fiddle

这个更简单的BEFORE触发器非常便宜!

它按行工作 - 如FOR EACH ROW所示 - 仅看到先前已插入的行(包括在同一语句中插入的行)。不像第一个解决方案,它不会看到批量INSERT中的后续行。

触发器的添加WHEN子句确保只在eff_to IS NULL时触发。相关链接:

附加到相同INSERTRETURNING已经包括触发器的所有效果 - 与上面的AFTER触发器不同!

我这里没有使用COALESCE。如果查询没有找到行,则不会执行COALESCE。这将需要额外的开销。请参阅:

而是我利用了特殊变量FOUND。请参阅:

英文:

Problems

Your trigger does something else entirely from what you describe.

Firstly, subquery in your UPDATE computes a new value for every single row in the table. Expensive, and unnecessary. We just need the one.

Next, there is no WHERE clause to constrain the UPDATE to the row at hand. So it updates every row in the table - multiple times. That's hugely expensive nonsense.

Multi-row INSERT

A statement-level AFTER trigger using a transition table achieves what you were aiming for: all rows within the same multi-row INSERT are adapted while seeing all other rows of the same bulk insert.

CREATE OR REPLACE FUNCTION prices_schema.prices_set_all_eff_to()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE prices_schema.prices p
   SET    eff_to = COALESCE(p1.next_eff_to, 'infinity')
   FROM   news n
   LEFT   JOIN LATERAL (
      SELECT p1.eff_from - 1 AS next_eff_to
      FROM   prices_schema.prices p1
      WHERE  p1.id = n.id
      AND    p1.eff_from > n.eff_from
      ORDER  BY p1.eff_from
      LIMIT  1
      ) p1 ON true
   WHERE  n.eff_to IS NULL
   AND    p.prices_id = n.prices_id;

   RETURN NULL;  -- for statement-level trigger
END
$func$;


CREATE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
REFERENCING NEW TABLE AS news  -- transition table
FOR EACH STATEMENT
EXECUTE FUNCTION prices_set_all_eff_to();

fiddle

EXECUTE FUNCTION instead of EXECUTE PROCEDURE requires Postgres 11 or later. See:

Note that RETURNING appended to the same INSERT does not show effects of the trigger, yet!

Postgres date/time types allow the special value 'infinity'. I suggest to use that instead of your arbitrary fallback "5999-12-31" (which is not the same as TO_DATE('6000-00-00', 'YYYY-MM-DD').

Note: While this achieves what you ask for, pre-existing rows are not updated to fit in with newly inserted rows.

Only sinlge-row INSERT

CREATE OR REPLACE FUNCTION prices_schema.prices_etl()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT INTO NEW.eff_to
          p.eff_from - 1
   FROM   prices_schema.prices P
   WHERE  p.id = NEW.id
   AND    p.eff_from > NEW.eff_from
   ORDER  BY p.eff_from
   LIMIT  1;
   
   IF NOT FOUND THEN
      NEW.eff_to = 'infinity';  -- better than arbitrary year 
   END IF;
   
   RETURN NEW;  -- !
END
$func$;


CREATE OR REPLACE TRIGGER before_insert_prices
BEFORE INSERT ON prices_schema.prices
FOR EACH ROW
WHEN (NEW.eff_to IS NULL)            
EXECUTE FUNCTION prices_schema.prices_etl();

fiddle

This simpler BEFORE trigger is very cheap!

It works per row - as indicated by FOR EACH ROW - and only sees rows already inserted earlier (including rows inserted in the same statement). Does not see later rows in a bulk INSERT like the first solution.

The added WHEN clause to the trigger makes sure it only even fires when eff_to IS NULL. Related:

RETURNING appended to the same INSERT already includes all effects of the trigger - as opposed to the above AFTER trigger!

I do not use COALESCE here. If the query finds no row, COALESCE is never executed. That would require additional overhead. See:

Instead, I make use of the special variable FOUND. See:

答案2

得分: 0

以下是翻译好的部分:

Turns out, I was only 2 lines away from one:
这证明,我离这个目标只差两行代码:

This now works as intended, even if multiple different id values are inserted. Execute:
现在它按预期工作,即使插入了多个不同的id值。执行:

Result:
结果:

| id | price | eff_from | eff_to |
| ----- | ----- | ----- | ----- |
| 1 | 100 | 2017-01-12 | 2017-01-26 |
| 1 | 150 | 2017-02-09 | 5999-12-31 |
| 1 | 125 | 2017-01-27 | 2017-02-08 |
| 2 | 900 | 2016-05-05 | 5999-12-31 |
| 1 | 100 | 2017-01-12 | 2017-01-26 |
| 1 | 150 | 2017-02-09 | 5999-12-31 |
| 1 | 125 | 2017-01-27 | 2017-02-08 |
| 2 | 900 | 2016-05-05 | 5999-12-31 |

Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.
是的,我知道它可能仍然效率很低,但它能正常工作。它似乎也适用于批量插入,但这不是我的主要关注点,因为我是通过psycopg2插入数据,解析.json文件。

What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price = 125 is inserted, eff_from fields need to be updated both for this row and for row with price = 100, since we're inserting in between the time intervals.
更重要的是,如果插入影响两行,它会正确地更新这两行。例如:当插入价格=125的新行时,需要同时更新这一行和价格=100的行的eff_from字段,因为我们是在时间间隔之间插入的。
英文:

Turns out, I was only 2 lines away from one:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
    UPDATE prices_schema.prices p0
    SET eff_to = subquery.next_eff_from
    FROM (
        SELECT id, price, COALESCE(
            LEAD(eff_from, 1)
            OVER (
				PARTITION BY id
                ORDER BY eff_from ASC
            ),
            TO_DATE('6000-00-00', 'YYYY-MM-DD')
        ) - 1 AS next_eff_from FROM prices_schema.prices
    ) AS subquery
	WHERE subquery.id = p0.id AND subquery.price = p0.price;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

This now works as intended, even if multiple different id values are inserted. Execute:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 100, '2017-01-12');
    
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 150, '2017-02-09');
    
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 125, '2017-01-27');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES (2, 900, '2016-05-05');
	
SELECT * FROM prices_schema.prices;

Result:

id price eff_from eff_to
1 100 2017-01-12 2017-01-26
1 150 2017-02-09 5999-12-31
1 125 2017-01-27 2017-02-08
2 900 2016-05-05 5999-12-31

Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.

What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price = 125 is inserted, eff_from fields need to be updated both for this row and for row with price = 100, since we're inserting in between the time intervals.

huangapple
  • 本文由 发表于 2023年5月22日 06:10:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/76302128.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定