重分区现有表格在线

huangapple go评论116阅读模式
英文:

Repartitioning existing table online

问题

由于这个脚本是一个单独的事务,所以即使数据已成功迁移到新创建的分区,表'solutions'在迁移过程中对终端用户仍然不可用,即使是SELECT查询也会等待锁定。这可能是因为整个脚本是一个事务,因此在detach_partition('solutions_legacy')之后或者在这个过程的最后尝试提交更改时,大多数情况下会出现以下错误:

ERROR: invalid transaction termination

您已经尝试在一些地方添加COMMIT,以及尝试使用BEGIN/END语句,但没有成功。

解决这个问题的正确方法是使用事务控制。您可以将整个脚本封装在一个事务块内,并使用COMMIT和ROLLBACK语句来明确定义事务的开始和结束。以下是修改后的代码:

-- 开始一个事务块
BEGIN;

-- 在事务块内执行你的操作
do
$$
declare 
  -- ... (你的代码保持不变)
begin
  -- ... (你的代码保持不变)
end;
$$ LANGUAGE plpgsql;

-- 提交事务
COMMIT;

在这个修改后的版本中,整个脚本被包装在一个事务块内。BEGIN; 标记了事务的开始,而COMMIT; 标记了事务的结束。如果在脚本的执行过程中出现任何错误,您可以使用ROLLBACK语句回滚事务,以确保数据的一致性。

英文:

Table schema

I have a partitioned table 'solutions' partitioned by day as follows:

solutions:

  • solutions_20230115

  • solutions_20230116

  • solutions_20230117

    ...

  • solutions_20230314

  • solutions_20230315

  • solutions_legacy

New partition 'solutuons_yyyymmdd' is created every day for new data.
Partition 'solutions_legacy' stores old data for the period before table 'solutions' was partitioned, currently from MINVALUE to '2023-01-15'.

My goal

I have to create new partitions for the old data and migrate the data from partition 'solutions_legacy' to newly created partitions.

My difficulties

The problem is I have to do it online so that table 'solutions' (at least it's recent partitions) was available for the end users while the data are being migrated.

My solution

The idea is to detach 'solutions_legacy' from 'solutions' and work with it as a separate table while 'solutions' is available for end users. So here is my approach:

  • detach 'solutions_legacy'
  • create new table 'solutions_yyyymmdd'
  • migrate corresponding data from 'solutuons_legacy' to 'solutions_yyyymmdd'
  • attach both tables as partitions to 'solutions'

This way I suppose to have some downtime only on the first step while detaching 'solutions_legacy'.

My code

create or replace PROCEDURE attach_partition(p_name text, p_start date default '2020-01-01', p_end date default '2099-01-01') 
as $$
begin
  RAISE NOTICE '%: Attaching partition ''%''...', clock_timestamp(), p_name;
  execute format(
    'ALTER TABLE solutions ATTACH PARTITION %I
    FOR VALUES FROM (%L) TO (%L)', p_name, p_start, p_end);
  RAISE NOTICE '% ''%'' has been attached', clock_timestamp(), p_name;
end;
$$ LANGUAGE plpgsql;

create or replace PROCEDURE detach_partition(p_name text)
as $$
begin
  RAISE NOTICE '%: Detaching ''%''...', clock_timestamp(), p_name;
  execute format('ALTER TABLE solutions DETACH PARTITION %I', p_name);
  RAISE NOTICE '%: ''%'' has been detached', clock_timestamp(), p_name;
end;
$$ LANGUAGE plpgsql;

create or replace PROCEDURE create_table(p_name text, p_start date, p_end date)
as $$
begin
  RAISE NOTICE '%: Creating new table ''%'' for values from % to %...', clock_timestamp(), p_name, p_start, p_end;
  execute format('CREATE TABLE IF NOT EXISTS %I
    (LIKE solutions INCLUDING DEFAULTS)', p_name);
  RAISE NOTICE '%: Table ''%'' has been created', clock_timestamp(), p_name;
end;
$$ LANGUAGE plpgsql;

create or replace PROCEDURE migrate_data(p_name text, p_start date, p_end date)
as $$
begin
  RAISE NOTICE '%: Migrating data to ''%''...', clock_timestamp(), p_name;
  execute format('
    WITH moved_rows AS (
      DELETE FROM public.solutions_legacy sl
      WHERE created_at >= %L
	    and created_at < %L
      RETURNING sl.*
    )
    INSERT INTO %I
    SELECT * FROM moved_rows', p_start, p_end, p_name);
  RAISE NOTICE '%: Data has been migrated', clock_timestamp();
end;
$$ LANGUAGE plpgsql;


do
$$
declare 
  new_partitions_count int := 3;
  rec record;
  partition_start date;
  partition_end   date;
  partition_name  text;
begin
  call detach_partition('solutions_legacy');
  -- !!! HERE IS WHERE I TRY TO COMMIT
  RAISE NOTICE '%: Querying ''solutions_legacy'' to determine % new partitions...', clock_timestamp(), new_partitions_count;
  for rec in 
    select distinct date_trunc('day', created_at)::date d 
    from public.solutions_legacy 
    order by d desc limit new_partitions_count
  loop
    partition_start := rec.d;
    partition_end   := rec.d + 1;
    partition_name  := 'solutions_' || TO_CHAR(partition_start, 'yyyymmdd') ;
    call create_table(partition_name, partition_start, partition_end);
    call migrate_data(partition_name, partition_start, partition_end);
    call attach_partition(partition_name, partition_start, partition_end);
  end loop;
  call attach_partition('solutions_legacy', p_end => partition_start);
end;
$$ LANGUAGE plpgsql;

Here is an example log it producers for better understanding:

NOTICE:  2023-03-15 10:28:02.306654+00: Detaching 'solutions_legacy'...
NOTICE:  2023-03-15 10:31:07.60235+00: 'solutions_legacy' has been detached
NOTICE:  2023-03-15 10:31:07.602576+00: Querying 'solutions_legacy' to determine 3 new partitions...
NOTICE:  2023-03-15 10:33:29.410183+00: Creating new table 'solutions_20230114' for values from 2023-01-14 to 2023-01-15...
NOTICE:  2023-03-15 10:33:29.416459+00: Table 'solutions_20230114' has been created
NOTICE:  2023-03-15 10:33:29.417196+00: Migrating data to 'solutions_20230114'...
NOTICE:  2023-03-15 10:36:14.731551+00: Data has been migrated
NOTICE:  2023-03-15 10:36:14.732235+00: Attaching partition 'solutions_20230114'...
NOTICE:  2023-03-15 10:37:51.100635+00 'solutions_20230114' has been attached
NOTICE:  2023-03-15 10:37:51.100761+00: Creating new table 'solutions_20230113' for values from 2023-01-13 to 2023-01-14...
NOTICE:  2023-03-15 10:37:51.102272+00: Table 'solutions_20230113' has been created
NOTICE:  2023-03-15 10:37:51.102315+00: Migrating data to 'solutions_20230113'...
NOTICE:  2023-03-15 10:40:10.749074+00: Data has been migrated
NOTICE:  2023-03-15 10:40:10.749182+00: Attaching partition 'solutions_20230113'...
NOTICE:  2023-03-15 10:41:47.559556+00 'solutions_20230113' has been attached
NOTICE:  2023-03-15 10:41:47.559661+00: Creating new table 'solutions_20230112' for values from 2023-01-12 to 2023-01-13...
NOTICE:  2023-03-15 10:41:47.561249+00: Table 'solutions_20230112' has been created
NOTICE:  2023-03-15 10:41:47.561291+00: Migrating data to 'solutions_20230112'...
NOTICE:  2023-03-15 10:44:07.43721+00: Data has been migrated
NOTICE:  2023-03-15 10:44:07.43729+00: Attaching partition 'solutions_20230112'...
NOTICE:  2023-03-15 10:45:41.37163+00 'solutions_20230112' has been attached
NOTICE:  2023-03-15 10:45:41.371922+00: Attaching partition 'solutions_legacy'...
NOTICE:  2023-03-15 10:47:56.991827+00 'solutions_legacy' has been attached
DO

Query returned successfully in 20 min.

My problem

Even though the data is successfully migrated to newly created partitions, table 'solutions' is not available for the end users during the migration process, even SELECT queries are waiting for the Lock. I suppose that's because the whole script is a single transaction. So, I try to commit the changes right after the call to detach_partition('solutions_legacy') or at the very end of this procedure, yet in most cases I get the error

> ERROR: invalid transaction termination

I have tried adding COMMIT in some places as well as playing with BEGIN/END statements.

What is the right approach to this problem?

> PostgreSQL 15.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r4) 12.2.1 20220924, 64-bit

答案1

得分: 0

DO块仅在未包裹在事务中时才允许进行事务控制。

我建议在事务之外执行DO块,然后你的commit语句应该按计划工作。

编辑:请参阅https://www.postgresql.org/docs/current/sql-do.html#id-1.9.3.102.8 获取文档。

英文:

DO blocks allow for transaction control if and only if you didn't wrap it inside a transaction.

I'd advise to execute the do block outside of a transaction, and your commit statement should work as planned.

EDIT: see https://www.postgresql.org/docs/current/sql-do.html#id-1.9.3.102.8 for documentation.

huangapple
  • 本文由 发表于 2023年3月15日 18:53:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/75743721.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定