英文:
Compare two tables and update the mismatch column in Oracle SQL
问题
我有一个需求,需要比较两个表,并在出现不匹配的情况下更新另一个表。表格 LEGACY_CUST_INFO 包含正确的 CUST_PRIVILEGE_NUMBER。因此,我需要找出第二个表格 CUST_INFO 中错误的 CUST_PRIVILEGE_NUMBER,并使用第一个表格 LEGACY_CUST_INFO 中正确的 CUST_PRIVILEGE_NUMBER 进行更新。两个表格中的 ROW_ID 匹配是精确匹配。两个表格都包含大约 2000 万条记录。
以下是两个表格的结构:
表格 1 名称:LEGACY_CUST_INFO
ROW_ID CUST_PRIVILEGE_NUMBER
表格 2 名称:CUST_INFO
ROW_ID CUST_PRIVILEGE_NUMBER
当我使用下面的语句进行检查时,我遇到了性能问题。是否有更好的方法来创建一个包含不匹配记录并更新 CUST_INFO 表中的 CUST_PRIVILEGE_NUMBER 列的表格?
以下是我的代码和逻辑。
- 我创建了一个名为 EXACT_RECORDS 的表格,其中包含精确匹配的记录。
- 从 LEGACY_CUST_INFO 表格中删除了精确匹配的记录。
- 根据 LEGACY_CUST_INFO 表格更新了 CUST_PRIVILEGE_NUMBER。
CREATE TABLE EXACT_RECORDS
AS
(SELECT A.* FROM LEGACY_CUST_INFO A, CUST_INFO B WHERE A.ROW_ID = B.ROW_ID AND A.CUST_PRIVILEGE_NUMBER = B.CUST_PRIVILEGE_NUMBER);
DELETE FROM LEGACY_CUST_INFO WHERE ROW_ID IN (SELECT ROW_ID FROM EXACT_RECORDS);
UPDATE CUST_INFO B SET B.CUST_PRIVILEGE_NUMBER = (SELECT A.CUST_PRIVILEGE_NUMBER FROM LEGACY_CUST_INFO A WHERE A.ROW_ID = B.ROW_ID);
我知道这个逻辑非常基本。你能否帮助找到更好的方法,以便在不到 30 分钟内更新 2000 万条记录?
英文:
I have a requirement to compare with one table and update another table if there are mismatches. Table LEGACY_CUST_INFO has correct CUST_PRIVILEGE_NUMBER. So, I need to findout the wrong CUST_PRIVILEGE_NUMBER in second table CUST_INFO and update with the right CUST_PRIVILEGE_NUMBER from first table LEGACY_CUST_INFO. ROW_ID in both the tables match are an exact match. Both tables have approximately 20 million records.
Below are the two table structures
Table 1 Name : LEGACY_CUST_INFO
ROW_ID CUST_PRIVILEGE_NUMBER
Table 2 Name : CUST_INFO
ROW_ID CUST_PRIVILEGE_NUMBER
I am getting performance issues when I check with the below statements. Is there a better way to create a table with mismatch records and update the column CUST_PRIVILEGE_NUMBER in CUST_INFO table ?
Below is my code and logic.
- I am creating a table - EXACT_RECORDS with exact match records.
- Deleting the exact match records from LEGACY_CUST_INFO table.
- Updating the CUST_PRIVILEGE_NUMBER based on LEGACY_CUST_INFO table.
CREATE TABLE EXACT_RECORDS
AS
(SELECT A.* FROM LEGACY_CUST_INFO A, CUST_INFO B WHERE A.ROW_ID = B.ROW_ID AND A.CUST_PRIVILEGE_NUMBER = B.CUST_PRIVILEGE_NUMBER);
DELETE FROM LEGACY_CUST_INFO WHERE ROW_ID IN (SELECT ROW_ID FROM EXACT_RECORDS);
UPDATE CUST_INFO B SET B.CUST_PRIVILEGE_NUMBER = (SELECT A.CUST_PRIVILEGE_NUMBER FROM LEGACY_CUST_INFO A WHERE A.ROW_ID = B.ROW_ID);
I know this logic is very basic. Could you help in better way so that 20 millions records are updated in less than 30 minutes ?
答案1
得分: 1
我建议使用包含查询的合并语句,返回准备好的未匹配记录以进行更新。
merge into cust_info dst
using (
select l.row_id, l.cust_privilege_number
from legacy_cust_info lci
inner join cust_info ci on ci.row_id = lci.row_id
where ci.cust_privilege_number != lci.cust_privilege_number
) src
on (dst.row_id = src.row_id)
when matched then
update set dst.cust_privilege_number = src.cust_privilege_number;
还要考虑为您的表创建索引:
CREATE INDEX legacy_cust_info#rid#cpn#idx ON LEGACY_CUST_INFO (row_id, cust_privilege_number);
CREATE INDEX cust_info#row_id#idx ON CUST_INFO (row_id);
更新:
您可以通过运行嵌套查询来估算时间,而不将其内容合并到目标表中:
select count(1) from (
select l.row_id, l.cust_privilege_number
from legacy_cust_info lci
inner join cust_info ci on ci.row_id = lci.row_id
where ci.cust_privilege_number != lci.cust_privilege_number
)
更新2:要考虑在 ci.cust_privilege_number
中处理 NULL 值,只需将备用条件添加到查询中:
where (
ci.cust_privilege_number != lci.cust_privilege_number
or (
ci.cust_privilege_number is null
and lci.cust_privilege_number is not null
)
)
英文:
I suggest to using a merge statement containing a query, returning prepared not matched records for update.
merge into cust_info dst
using (
select l.row_id, l.cust_privilege_number
from legacy_cust_info lci
inner join cust_info ci on ci.row_id = lci.row_id
where ci.cust_privilege_number != lci.cust_privilege_number
) src
on (dst.row_id = src.row_id)
when matched then
update set dst.cust_privilege_number = src.cust_privilege_number;
Also consider create indexes for your tables:
CREATE INDEX legacy_cust_info#rid#cpn#idx ON LEGACY_CUST_INFO (row_id, cust_privilege_number);
CREATE INDEX cust_info#row_id#idx ON CUST_INFO (row_id);
UPD.
You can estimate the time by running the nested select without merging its contents into the destination table:
select count(1) from (
select l.row_id, l.cust_privilege_number
from legacy_cust_info lci
inner join cust_info ci on ci.row_id = lci.row_id
where ci.cust_privilege_number != lci.cust_privilege_number
)
UPD2. To considering NULLs in ci.cust_privilege_number
simply add alternative condition into the query:
where (
ci.cust_privilege_number != lci.cust_privilege_number
or (
ci.cust_privilege_number is null
and lci.cust_privilege_number is not null
)
)
答案2
得分: 0
以下是翻译好的部分:
Indexing helps to retrieve data faster. Check the below query: -
对于 row_ID
创建索引 idx_legacy_cust_info_row_id 于 LEGACY_CUST_INFO (ROW_ID);
创建索引 idx_cust_info_row_id 于 CUST_INFO (ROW_ID);
对于使用两个表之间的连接来更新不匹配的 CUST_PRIVILEGE_NUMBER 的单个更新语句,请检查以下查询: -
更新 CUST_INFO CI
设置 CI.CUST_PRIVILEGE_NUMBER = (
选择 LCI.CUST_PRIVILEGE_NUMBER
从 LEGACY_CUST_INFO LCI
WHERE LCI.ROW_ID = CI.ROW_ID
)
WHERE CI.CUST_PRIVILEGE_NUMBER <> (
选择 LCI.CUST_PRIVILEGE_NUMBER
从 LEGACY_CUST_INFO LCI
WHERE LCI.ROW_ID = CI.ROW_ID
);
英文:
Indexing helps to retrieve data faster. Check the below query: -
For row_ID
CREATE INDEX idx_legacy_cust_info_row_id ON LEGACY_CUST_INFO (ROW_ID);
CREATE INDEX idx_cust_info_row_id ON CUST_INFO (ROW_ID);
For single update statement using a join between the two tables to update the mismatched CUST_PRIVILEGE_NUMBER. check the following query: -
UPDATE CUST_INFO CI
SET CI.CUST_PRIVILEGE_NUMBER = (
SELECT LCI.CUST_PRIVILEGE_NUMBER
FROM LEGACY_CUST_INFO LCI
WHERE LCI.ROW_ID = CI.ROW_ID
)
WHERE CI.CUST_PRIVILEGE_NUMBER <> (
SELECT LCI.CUST_PRIVILEGE_NUMBER
FROM LEGACY_CUST_INFO LCI
WHERE LCI.ROW_ID = CI.ROW_ID
);
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论