Delete a few records from multiple partitions in Oracle
Question
- I have a table, t_orders, containing 40 million records.
- The table is partitioned daily on trans_dt.
- I want to delete around 25 million records based on p_code = 'OTC'.
- I want to go with the approach below, which seems to be faster. Is there any way to delete from multiple partitions in one statement, for example from two partitions (SYS_P20209, SYS_P20345)? With a single partition it works fine.

    ALTER SESSION ENABLE PARALLEL DML;
    DELETE t_orders PARTITION (SYS_P20209, SYS_P20345) WHERE p_code = 'OTC';
    COMMIT;

Thanks in advance,
Kashi
Answer 1
Score: 0
You want to delete 25 million out of 40 million rows, roughly 62% of the entire table? You do not want to use DELETE for that: the amount of undo and redo will be enormous, and if it fails or you abort it, you could end up with a very long rollback. The correct way to handle this is to recreate the segments. You can do this in several ways:

1. Use a CREATE TABLE AS SELECT (CTAS) to create a new table with only the data you want to keep. CTAS accepts partitioning syntax, so you can partition the new table as it is created rather than pre-creating it or altering it later. Then rename the old table, rename the new table to the old name, apply any grants, indexes, etc. that the old table had, and you're done. This is the most efficient method that needs no scripting.

2. Use a CREATE TABLE AS SELECT to create a temporary table with only the data you want to keep. Truncate the main table, enable parallel DML, and do an insert-append from the temp table back into the main table. The downside is that the data moves twice, but you avoid having to redefine dependent objects, grants, etc.

3. Since the table is partitioned, you can write a PL/SQL loop over the partitions (query all_tab_partitions to get the partition names) and, one by one, CTAS each partition into a temp table using extended partition naming syntax, moving only the data you want to keep. You can only work on one partition at a time when using extended partition syntax. Then use an ALTER TABLE EXCHANGE PARTITION to swap the old partition with the temp table. Rebuild indexes when you're done. This is as efficient as #1 and also avoids having to reapply grants, redefine indexes, etc., but of course it only works on partitioned tables.

4. Lastly, if you absolutely have to do a real DELETE (again, not recommended), here are some options:

   4a. Disable indexes and triggers. Enable parallel DML and add a hint (without the hint, enabling parallel DML may not do anything): DELETE /*+ parallel(16) */ FROM t_orders... When done, rebuild the indexes and re-enable the triggers.

   4b. Write a PL/SQL loop over the partition names and, using extended partition naming syntax, run the hinted parallel DML delete from #4a against only one partition at a time. Commit on each loop iteration. This ensures that the parallel execution servers split the work evenly by block range, rather than Oracle possibly opting for partition-wise distribution, which can skew and give less effective parallelization. It also keeps rollback time and undo space requirements at a more reasonable level, since you are only working on one partition at a time. But if it fails, you will need to know where you left off, because the operation runs in committed chunks rather than in one transaction.

Example:

    BEGIN
      -- Parallel DML must be enabled at session level before the hinted DELETE.
      EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';

      -- Visit every partition of T_ORDERS in positional order.
      FOR rec_part IN (SELECT partition_name
                         FROM user_tab_partitions
                        WHERE table_name = 'T_ORDERS'
                        ORDER BY partition_position)
      LOOP
        dbms_output.put_line('Working on ' || rec_part.partition_name);

        -- Extended partition naming accepts a single partition per statement.
        EXECUTE IMMEDIATE 'DELETE /*+ parallel(16) */ FROM t_orders PARTITION ('
                          || rec_part.partition_name
                          || ') WHERE p_code = ''OTC''';

        -- Commit per partition to keep undo usage and rollback time small.
        COMMIT;
      END LOOP;
    END;
    /
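
For comparison, one iteration of method #3 (CTAS plus partition exchange) could look roughly like the sketch below. It is only an illustration under assumptions: t_orders_keep is a scratch table name invented here, SYS_P20209 is one of the partition names from the question, and the sketch assumes that a plain CTAS produces a table whose columns match t_orders closely enough for the exchange to be accepted. Tables with primary key or unique constraints, hidden or virtual columns, or indexes you want to carry across will need extra steps that are not shown.

```sql
-- Sketch only: t_orders_keep is a hypothetical scratch table name.

-- 1. Copy only the rows to keep out of a single partition.
--    Rows with a NULL p_code would not match p_code = 'OTC' in the DELETE,
--    so they are kept here as well.
CREATE TABLE t_orders_keep AS
SELECT *
  FROM t_orders PARTITION (SYS_P20209)
 WHERE p_code <> 'OTC'
    OR p_code IS NULL;

-- 2. Swap the scratch table's segment with the partition's segment.
--    WITHOUT VALIDATION skips the check that every row belongs in the
--    partition; the rows were read from that same partition.
ALTER TABLE t_orders
  EXCHANGE PARTITION SYS_P20209 WITH TABLE t_orders_keep
  WITHOUT VALIDATION;

-- 3. The scratch table now holds the partition's old contents; drop it.
DROP TABLE t_orders_keep PURGE;
```

In a real run, these three statements would sit inside a loop over the partition names, issued through EXECUTE IMMEDIATE just like the DELETE in the example above, and followed by the index rebuilds the answer mentions.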