2023年5月22日 06:10:49go评论93阅读模式

英文:

UPDATE ... FROM ... inside trigger function not working as expected?

问题

I am writing a PostgreSQL trigger function that will allow me to do an ETL transformation on a data.
I have a table with fields (id, price, EFF_FROM, EFF_TO), and if I only insert (id, price, EFF_FROM), EFF_TO needs to be calculated automatically as the next EFF_FROM of this id (-1 day) if it exists, and "5999-12-31" if it does not.

To update the table, I am using UPDATE ... FROM inside of a trigger function, but it seems to update all EFF_TO values in the table with the last given EFF_FROM value.

The trigger function that I use looks like this:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
	UPDATE prices_schema.prices
	SET EFF_TO = subquery.next_eff_from
	FROM (
		SELECT COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE('6000-00-00', 'YYYY-MM-DD')
		) - 1 AS next_eff_from FROM prices_schema.prices
	) AS subquery;
	
	RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

Load some test data:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 100, '2017-01-12');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 150, '2017-02-09');
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 125, '2017-01-27');

Execute:

SELECT * FROM prices_schema.prices
	ORDER BY EFF_FROM ASC;

Result:

id	price	EFF_FROM	EFF_TO
1	100	2017-01-12	2017-01-26
1	125	2017-01-27	2017-01-26
1	150	2017-02-09	2017-01-26

EFF_TO obviously doesn't look as expected to. What amazes me more is that calling this query:

SELECT eff_from, COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE('6000-00-00', 'YYYY-MM-DD')
		) - 1 AS next_eff_from FROM prices_schema.prices;

Gives me the correct result:

EFF_FROM	next_eff_from
2017-01-12	2017-01-26
2017-01-27	2017-02-08
2017-02-09	5999-12-31

I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.

英文:

To update the table, I am using UPDATE ... FROM inside of a trigger function, but it seems to update all EFF_TO values in the table with the last given EFF_FROM value.

The trigger function that I use looks like this:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
	UPDATE prices_schema.prices
	SET EFF_TO = subquery.next_eff_from
	FROM (
		SELECT COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE(&#39;6000-00-00&#39;, &#39;YYYY-MM-DD&#39;)
		) - 1 AS next_eff_from FROM prices_schema.prices
	) AS subquery;
	
	RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

Load some test data:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 100, &#39;2017-01-12&#39;);
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 150, &#39;2017-02-09&#39;);
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES(1, 125, &#39;2017-01-27&#39;);

Execute:

SELECT * FROM prices_schema.prices
	ORDER BY EFF_FROM ASC;

Result:

id	price	EFF_FROM	EFF_TO
1	100	2017-01-12	2017-01-26
1	125	2017-01-27	2017-01-26
1	150	2017-02-09	2017-01-26

EFF_TO obviously doesn't look as expected to. What amazes me more is that calling this query:

SELECT eff_from, COALESCE(
			LEAD(EFF_FROM, 1)
			OVER (
				ORDER BY EFF_FROM ASC
			),
			TO_DATE(&#39;6000-00-00&#39;, &#39;YYYY-MM-DD&#39;)
		) - 1 AS next_eff_from FROM prices_schema.prices;

Gives me the correct result:

EFF_FROM	next_eff_from
2017-01-12	2017-01-26
2017-01-27	2017-02-08
2017-02-09	5999-12-31

I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.

答案1

得分: 2

以下是您要翻译的内容：

问题

您的触发器执行的操作与您描述的完全不同。

首先，在您的UPDATE中，subquery计算了表中的每一行的新值。这既费时又不必要。我们只需要一个值。

其次，没有WHERE子句来限制UPDATE只更新当前行。因此，它会更新表中的每一行 - 多次。这是非常昂贵的无意义操作。

多行`INSERT`

使用过渡表的语句级AFTER触发器可以实现您的目标：在同一多行INSERT中的所有行都会被适应，同时还能看到相同批量插入的其他行。

CREATE OR REPLACE FUNCTION prices_schema.prices_set_all_eff_to()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE prices_schema.prices p
   SET    eff_to = COALESCE(p1.next_eff_to, 'infinity')
   FROM   news n
   LEFT   JOIN LATERAL (
      SELECT p1.eff_from - 1 AS next_eff_to
      FROM   prices_schema.prices p1
      WHERE  p1.id = n.id
      AND    p1.eff_from > n.eff_from
      ORDER  BY p1.eff_from
      LIMIT  1
      ) p1 ON true
   WHERE  n.eff_to IS NULL
   AND    p.prices_id = n.prices_id;
   RETURN NULL;  -- 用于语句级触发器
END
$func$;
CREATE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
REFERENCING NEW TABLE AS news  -- 过渡表
FOR EACH STATEMENT
EXECUTE FUNCTION prices_set_all_eff_to();

fiddle

使用EXECUTE FUNCTION而不是EXECUTE PROCEDURE需要Postgres 11或更高版本。请参阅：

触发器使用过程还是函数？

请注意，附加到相同INSERT的RETURNING不会显示触发器的效果，尚未显示出来！

Postgres日期/时间类型允许使用特殊值'infinity'。我建议使用它，而不是您的任意回退日期“5999-12-31”（这与TO_DATE('6000-00-00', 'YYYY-MM-DD')不同）。

**注意：**虽然这实现了您所要求的内容，但现有行不会被更新以适应新插入的行。

仅单行`INSERT`

CREATE OR REPLACE FUNCTION prices_schema.prices_etl()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT INTO NEW.eff_to
          p.eff_from - 1
   FROM   prices_schema.prices P
   WHERE  p.id = NEW.id
   AND    p.eff_from > NEW.eff_from
   ORDER  BY p.eff_from
   LIMIT  1;
   
   IF NOT FOUND THEN
      NEW.eff_to = 'infinity';  -- 比任意年份更好
   END IF;
   
   RETURN NEW;  -- !
END
$func$;
CREATE OR REPLACE TRIGGER before_insert_prices
BEFORE INSERT ON prices_schema.prices
FOR EACH ROW
WHEN (NEW.eff_to IS NULL)            
EXECUTE FUNCTION prices_schema.prices_etl();

fiddle

这个更简单的BEFORE触发器非常便宜！

它按行工作 - 如FOR EACH ROW所示 - 仅看到先前已插入的行（包括在同一语句中插入的行）。不像第一个解决方案，它不会看到批量INSERT中的后续行。

触发器的添加WHEN子句确保只在eff_to IS NULL时触发。相关链接：

https://stackoverflow.com/questions/56138945/trigger-how-to-delete-row-instead-of-update-based-on-cell-value/57015428#57015428

附加到相同INSERT的RETURNING已经包括触发器的所有效果 - 与上面的AFTER触发器不同！

我这里没有使用COALESCE。如果查询没有找到行，则不会执行COALESCE。这将需要额外的开销。请参阅：

而是我利用了特殊变量FOUND。请参阅：

https://stackoverflow.com/questions/27350232/why-is-is-not-null-false-when-checking-a-row-type/27353191#27353191

英文:

Problems

Your trigger does something else entirely from what you describe.

Firstly, subquery in your UPDATE computes a new value for every single row in the table. Expensive, and unnecessary. We just need the one.

Next, there is no WHERE clause to constrain the UPDATE to the row at hand. So it updates every row in the table - multiple times. That's hugely expensive nonsense.

Multi-row `INSERT`

A statement-level AFTER trigger using a transition table achieves what you were aiming for: all rows within the same multi-row INSERT are adapted while seeing all other rows of the same bulk insert.

CREATE OR REPLACE FUNCTION prices_schema.prices_set_all_eff_to()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE prices_schema.prices p
   SET    eff_to = COALESCE(p1.next_eff_to, &#39;infinity&#39;)
   FROM   news n
   LEFT   JOIN LATERAL (
      SELECT p1.eff_from - 1 AS next_eff_to
      FROM   prices_schema.prices p1
      WHERE  p1.id = n.id
      AND    p1.eff_from &gt; n.eff_from
      ORDER  BY p1.eff_from
      LIMIT  1
      ) p1 ON true
   WHERE  n.eff_to IS NULL
   AND    p.prices_id = n.prices_id;
   RETURN NULL;  -- for statement-level trigger
END
$func$;
CREATE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
REFERENCING NEW TABLE AS news  -- transition table
FOR EACH STATEMENT
EXECUTE FUNCTION prices_set_all_eff_to();

fiddle

EXECUTE FUNCTION instead of EXECUTE PROCEDURE requires Postgres 11 or later. See:

Trigger uses a procedure or a function?

Note that RETURNING appended to the same INSERT does not show effects of the trigger, yet!

Postgres date/time types allow the special value 'infinity'. I suggest to use that instead of your arbitrary fallback "5999-12-31" (which is not the same as TO_DATE('6000-00-00', 'YYYY-MM-DD').

Note: While this achieves what you ask for, pre-existing rows are not updated to fit in with newly inserted rows.

Only sinlge-row `INSERT`

CREATE OR REPLACE FUNCTION prices_schema.prices_etl()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT INTO NEW.eff_to
          p.eff_from - 1
   FROM   prices_schema.prices P
   WHERE  p.id = NEW.id
   AND    p.eff_from &gt; NEW.eff_from
   ORDER  BY p.eff_from
   LIMIT  1;
   
   IF NOT FOUND THEN
      NEW.eff_to = &#39;infinity&#39;;  -- better than arbitrary year 
   END IF;
   
   RETURN NEW;  -- !
END
$func$;
CREATE OR REPLACE TRIGGER before_insert_prices
BEFORE INSERT ON prices_schema.prices
FOR EACH ROW
WHEN (NEW.eff_to IS NULL)            
EXECUTE FUNCTION prices_schema.prices_etl();

fiddle

This simpler BEFORE trigger is very cheap!

It works per row - as indicated by FOR EACH ROW - and only sees rows already inserted earlier (including rows inserted in the same statement). Does not see later rows in a bulk INSERT like the first solution.

The added WHEN clause to the trigger makes sure it only even fires when eff_to IS NULL. Related:

https://stackoverflow.com/questions/56138945/trigger-how-to-delete-row-instead-of-update-based-on-cell-value/57015428#57015428

RETURNING appended to the same INSERT already includes all effects of the trigger - as opposed to the above AFTER trigger!

I do not use COALESCE here. If the query finds no row, COALESCE is never executed. That would require additional overhead. See:

Instead, I make use of the special variable FOUND. See:

https://stackoverflow.com/questions/27350232/why-is-is-not-null-false-when-checking-a-row-type/27353191#27353191

答案2

得分: 0

以下是翻译好的部分：

Turns out, I was only 2 lines away from one:
这证明，我离这个目标只差两行代码：
This now works as intended, even if multiple different id values are inserted. Execute:
现在它按预期工作，即使插入了多个不同的id值。执行：
Result:
结果：
| id | price | eff_from | eff_to |
| ----- | ----- | ----- | ----- |
| 1 | 100 | 2017-01-12 | 2017-01-26 |
| 1 | 150 | 2017-02-09 | 5999-12-31 |
| 1 | 125 | 2017-01-27 | 2017-02-08 |
| 2 | 900 | 2016-05-05 | 5999-12-31 |
| 1 | 100 | 2017-01-12 | 2017-01-26 |
| 1 | 150 | 2017-02-09 | 5999-12-31 |
| 1 | 125 | 2017-01-27 | 2017-02-08 |
| 2 | 900 | 2016-05-05 | 5999-12-31 |
Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.
是的，我知道它可能仍然效率很低，但它能正常工作。它似乎也适用于批量插入，但这不是我的主要关注点，因为我是通过psycopg2插入数据，解析.json文件。
What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price = 125 is inserted, eff_from fields need to be updated both for this row and for row with price = 100, since we're inserting in between the time intervals.
更重要的是，如果插入影响两行，它会正确地更新这两行。例如：当插入价格=125的新行时，需要同时更新这一行和价格=100的行的eff_from字段，因为我们是在时间间隔之间插入的。

英文:

Turns out, I was only 2 lines away from one:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
    UPDATE prices_schema.prices p0
    SET eff_to = subquery.next_eff_from
    FROM (
        SELECT id, price, COALESCE(
            LEAD(eff_from, 1)
            OVER (
				PARTITION BY id
                ORDER BY eff_from ASC
            ),
            TO_DATE(&#39;6000-00-00&#39;, &#39;YYYY-MM-DD&#39;)
        ) - 1 AS next_eff_from FROM prices_schema.prices
    ) AS subquery
	WHERE subquery.id = p0.id AND subquery.price = p0.price;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

This now works as intended, even if multiple different id values are inserted. Execute:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 100, &#39;2017-01-12&#39;);
    
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 150, &#39;2017-02-09&#39;);
    
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 125, &#39;2017-01-27&#39;);
	
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
	VALUES (2, 900, &#39;2016-05-05&#39;);
	
SELECT * FROM prices_schema.prices;

Result:

id	price	eff_from	eff_to
1	100	2017-01-12	2017-01-26
1	150	2017-02-09	5999-12-31
1	125	2017-01-27	2017-02-08
2	900	2016-05-05	5999-12-31

Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.

What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price = 125 is inserted, eff_from fields need to be updated both for this row and for row with price = 100, since we're inserting in between the time intervals.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

UPDATE … FROM … 触发函数内部无法按预期工作？

问题

答案1

问题

多行`INSERT`

仅单行`INSERT`

Problems

Multi-row `INSERT`

Only sinlge-row `INSERT`

答案2

In back end (java spring boot) sorting, how to sort with Alias name using pageable? Without parent table as prefix

如何修复Golang中的SQL字符串转换扫描错误

“`python regexp_replace(COALESCE(“今天 | 是 | 好 | 日”, ”), ‘\\|’, ‘>’) “`

为什么Spring JPA的双向一对多和多对一关系不能更新外键列呢？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论

问题

答案1

问题

多行INSERT

仅单行INSERT

Problems

Multi-row INSERT

Only sinlge-row INSERT

答案2

发表评论

多行`INSERT`

仅单行`INSERT`

Multi-row `INSERT`

Only sinlge-row `INSERT`