Identifying changes to data over time
Question
<details>
<summary>English:</summary>
Using Oracle Database 11.2.
Problem: Compare data from two sources and show only the differences.
I'm looking for some really slick solution to automate this comparison for hundreds of tables, each with hundreds of columns, that will work within the context of a query in a report developed in Crystal Reports. And, yes, I have considered that I took a wrong turn somewhere (Not the Crystal Reports part, though. I'm stuck with that.) and everything in this description after that point is meaningless.
Set aside thoughts about query or report performance. I intend to force filters to limit the amount of data that could be processed in a single request. What I'm asking about here is how to make this generic. In other words, I don't want to list any specific columns in my query code except, maybe, to distinguish between known grouping or lookup columns -- updated_by, updated_date, etc. I want to have queries that automatically gather those names for me.
For the sake of simplicity, let's say I want to compare data, based on filter criteria, from adjacent rows within a grouping in a table. Here is simplified example input data:
```sql
with source_data as (
  select 'a' grp
       , 'b' b
       , 'c' c
       , date '2022-12-01' record_date
       , 'joe' updated_by
  from dual
  union all
  select 'a'
       , 'b'
       , 'd'
       , date '2022-12-02'
       , 'sally' updated_by
  from dual
  union all
  select 'a'
       , 'a'
       , 'd'
       , date '2022-12-04'
       , 'joe' updated_by
  from dual
  union all
  select 'z' a
       , 'b' b
       , 'c' c
       , date '2022-12-01'
       , 'joe' updated_by
  from dual
  union all
  select 'z'
       , 'e'
       , 'c'
       , date '2022-12-08'
       , 'joe' updated_by
  from dual
  union all
  select 'z'
       , 'f'
       , 'c'
       , date '2022-12-09'
       , 'sally' updated_by
  from dual
)
```
|GRP|B|C|RECORD_DATE|UPDATED_BY|
|---|---|---|---|---|
|a|b|c|2022-12-01 00:00:00|joe|
|a|b|d|2022-12-02 00:00:00|sally|
|a|a|d|2022-12-04 00:00:00|joe|
|z|b|c|2022-12-01 00:00:00|joe|
|z|e|c|2022-12-08 00:00:00|joe|
|z|f|c|2022-12-09 00:00:00|sally|
The need is to see what changes were made by people in certain categories. For this example, let's say Sally is a member of that group and Joe is not. So, the only changes I care about are on rows 2 and 6. But I need to compare each to the previous row, so...
```sql
,
changed as (
  select sd.*
  from source_data sd
  where updated_by = 'sally'
),
changes as (
  select 'current' as status
       , c.*
  from changed c
  union all
  select 'previous'
       , sd.grp
       , sd.b
       , sd.c
       , c.record_date
       , c.updated_by
  from source_data sd
  inner join changed c on c.grp = sd.grp
    and sd.record_date = (select max(record_date) from source_data where grp = c.grp and record_date < c.record_date)
)
```
Output from this trivial example seems simple enough. But when I have hundreds of rows by hundreds of columns to compare, it's not so easy to identify the change.
I have many tables to compare that have the same issue. Many of the tables have hundreds of columns. Usually, the difference is in only one or a few of the columns.
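As a side note, the pairing of each "current" row with its predecessor could also be sketched with the `LAG` analytic function, which is available in 11.2. This is only an illustration of the row-pairing step on the sample data: it still names `b` and `c` explicitly, so it does not address the generic-column requirement, and the `paired` CTE and the `prev_b` / `prev_c` aliases are made up for the example.

```sql
with source_data as (
  select 'a' grp, 'b' b, 'c' c, date '2022-12-01' record_date, 'joe' updated_by from dual union all
  select 'a', 'b', 'd', date '2022-12-02', 'sally' from dual union all
  select 'a', 'a', 'd', date '2022-12-04', 'joe' from dual union all
  select 'z', 'b', 'c', date '2022-12-01', 'joe' from dual union all
  select 'z', 'e', 'c', date '2022-12-08', 'joe' from dual union all
  select 'z', 'f', 'c', date '2022-12-09', 'sally' from dual
),
paired as (
  -- LAG must see every row in the group, so the filter on updated_by
  -- is applied only in the outer query
  select sd.*
       , lag(b) over (partition by grp order by record_date) as prev_b
       , lag(c) over (partition by grp order by record_date) as prev_c
  from source_data sd
)
select grp, record_date, updated_by, prev_b, b, prev_c, c
from paired
where updated_by = 'sally'
order by grp, record_date;
```

For the sample data this puts the previous and current values side by side on Sally's two rows: in group `a` column C goes from `c` to `d`, and in group `z` column B goes from `e` to `f`.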
This will be done in a report. I don't have access to create functions or stored procedures, so I doubt I can use dynamic SQL in any way. This likely has constraints similar to developing a view.
I am NOT using PL/SQL. (Kinda tired of nearly every Oracle question related to my searches on SO having some relationship to PL/SQL, but no way to filter those out.)
I was thinking that in order to compare the data I'll first want to unpivot it to get a column/value pair on a row...
(Building on the answer to this question: https://stackoverflow.com/questions/19124891/oracle-unpivot-columns-to-rows)
```sql
, unpivot as (
  Select *
  From (
    Select grp
         , status
         , record_date
         , updated_by
         , Case When C.lvl = 1 Then 'B'
                When C.lvl = 2 Then 'C'
           End col
         , Case When C.lvl = 1 Then coalesce(B, '<null>')
                When C.lvl = 2 Then coalesce(C, '<null>')
           End val
    From changes
    cross join (
      select level lvl
      from dual
      connect by level <= 2
    ) c
  )
  where val is not null
  order by 1, 3, 2 desc
)
```
(Yes, for non-trivial data I'll need to cast the data going into `val` to something more generic, like a string.)
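A minimal sketch of that casting step, using the built-in `UNPIVOT` operator instead of the `CASE` / cross-join approach. The table `t` and its columns `id`, `txt`, `num` and `dt` are invented for the illustration. `UNPIVOT` needs the unpivoted columns to share a data type, so each one is converted with `to_char` first, and `include nulls` keeps rows whose value is null.

```sql
with t as (
  -- hypothetical row mixing character, number and date columns
  select 1 id, 'x' txt, 42 num, date '2022-12-01' dt from dual
)
select id, col, val
from (
  select id
       , to_char(txt) as txt
       , to_char(num) as num
       , to_char(dt, 'YYYY-MM-DD HH24:MI:SS') as dt
  from t
)
unpivot include nulls (
  val for col in (txt, num, dt)
);
```

An explicit format mask on the date keeps the string independent of the session's `NLS_DATE_FORMAT`. The column list in the `in (...)` clause is still hard-coded here, which is exactly the part the question asks how to generate.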
But **how do I programmatically determine the number of columns, the column order, and generate the column names** for both the value of `col` and for the reference for the `CASE` statement in `val`?
I suppose I could use something like this as part of the solution:
```sql
SELECT COLUMN_NAME
     , COLUMN_ID
FROM ALL_tab_columns
WHERE OWNER = 'MY_OWNER_NAME'
  AND TABLE_NAME = 'SOURCE_TABLE'
ORDER BY COLUMN_ID
```
But I'm not sure how to dovetail that into the solution in a meaningful way without involving dynamic SQL, which I'm pretty sure I can't do. And it would probably require referencing columns based on ordinal position, which doesn't appear to be possible in SQL. Of course, if that would work I could use a similar query to figure out how to handle data types for the `val` column.
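One way the dictionary query could at least take the typing out of the job is to have it emit the column lists as text, to be pasted into the report query by hand. The sketch below assumes the same placeholder owner and table names as above and treats `GRP`, `RECORD_DATE` and `UPDATED_BY` as the columns to keep. It generates SQL text only; it does not make the report query itself dynamic.

```sql
select listagg(column_name, ', ')
         within group (order by column_id) as unpivot_list
     , listagg('to_char(' || column_name || ') as ' || column_name, ', ')
         within group (order by column_id) as select_list
from all_tab_columns
where owner = 'MY_OWNER_NAME'
  and table_name = 'SOURCE_TABLE'
  -- columns that should stay as columns rather than become rows
  and column_name not in ('GRP', 'RECORD_DATE', 'UPDATED_BY');
```

`LISTAGG` exists in 11.2, but its result is a `VARCHAR2` limited to 4000 bytes, so a table with hundreds of columns may raise ORA-01489. In that case the lists would have to be produced in several chunks, for example by filtering on ranges of `column_id`.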
Then I need to pivot that to show the before and after values in different columns. Then I can filter that to only what changed.
```sql
,
pivot as (
  select grp
       , record_date
       , col
       , updated_by
       , max("'previous'") val_prev
       , max("'current'") val_curr
  from unpivot
  pivot (
    max(val)
    for status
    in (
      'previous',
      'current'
    )
  )
  group by grp
         , record_date
         , col
         , updated_by
)
select grp
     , record_date
     , col
     , updated_by
     , val_prev
     , val_curr
from pivot
where val_curr <> val_prev
order by grp
       , record_date
```
|GRP|RECORD_DATE|COL|UPDATED_BY|VAL_PREV|VAL_CURR|
|---|---|---|---|---|---|
|a|2022-12-02 00:00:00|C|sally|c|d|
|z|2022-12-09 00:00:00|B|sally|e|f|
</details>
# Answer 1
**Score**: 1
You can't do this with pure SQL alone. But you can achieve what you want in a single statement using SQL macros - provided you're on an up-to-date version of Oracle Database.
This is an example of a dynamic unpivot macro that converts all the unlisted columns to rows:
```sql
create or replace function unpivot_macro (
  tab       dbms_tf.table_t,
  keep_cols dbms_tf.columns_t
) return clob sql_macro is
  sql_stmt     clob;
  unpivot_list clob;
  select_list  clob;
begin
  for col in tab.column.first .. tab.column.last loop
    -- columns not listed in keep_cols are the ones turned into rows
    if tab.column ( col ).description.name
         not member of keep_cols then
      unpivot_list := unpivot_list ||
        ',' || tab.column ( col ).description.name;
    end if;
    -- every column is cast to a string so the unpivoted values share one data type
    select_list := select_list ||
      ', to_char (' || tab.column ( col ).description.name || ') as ' ||
      tab.column ( col ).description.name;
  end loop;
  -- trim removes the leading comma from each generated list
  sql_stmt :=
    'select * from (
  select ' || trim ( both ',' from select_list ) || ' from tab
)
unpivot (
  val for col
  in ( ' || trim ( both ',' from unpivot_list ) || ' )
)';
  return sql_stmt;
end unpivot_macro;
/

select * from unpivot_macro (
  source_data, columns ( grp, updated_by, record_date )
);
```
```
GRP RECORD_DATE       UPDATED_BY COL VAL
a   01-DEC-2022 00:00 joe        B   b
a   01-DEC-2022 00:00 joe        C   c
a   02-DEC-2022 00:00 sally      B   z
a   02-DEC-2022 00:00 sally      C   d
a   04-DEC-2022 00:00 joe        B   a
a   04-DEC-2022 00:00 joe        C   d
...
```
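To make the macro less of a black box: for a `source_data` with the columns used here, the statement it builds should come out roughly like the hand-written query below. This is a sketch of the expansion, not the literal generated text (the macro works with the quoted column names from `dbms_tf`), and the two-row `source_data` CTE is just a cut-down copy of the sample data so the query runs on its own.

```sql
with source_data as (
  select 'a' grp, 'b' b, 'c' c, date '2022-12-01' record_date, 'joe' updated_by from dual union all
  select 'a', 'z', 'd', date '2022-12-02', 'sally' from dual
)
select * from (
  -- select_list: every column wrapped in to_char
  select to_char(grp) as grp
       , to_char(b) as b
       , to_char(c) as c
       , to_char(record_date) as record_date
       , to_char(updated_by) as updated_by
  from source_data
)
unpivot (
  -- unpivot_list: every column not named in keep_cols
  val for col in (b, c)
);
```

Because `to_char(record_date)` has no format mask, the date string follows the session's `NLS_DATE_FORMAT`, which is why the date format can differ between the outputs shown in this answer.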
If the reason for avoiding PL/SQL is you don't have permission to create functions, you can place the macro in the `with` clause.
Here's an example running on 21.3:
```sql
with function unpivot_macro (
  tab       dbms_tf.table_t,
  keep_cols dbms_tf.columns_t
) return clob sql_macro is
  sql_stmt     clob;
  unpivot_list clob;
  select_list  clob;
begin
  for col in tab.column.first .. tab.column.last loop
    if tab.column ( col ).description.name
         not member of keep_cols then
      unpivot_list := unpivot_list ||
        ',' || tab.column ( col ).description.name;
    end if;
    select_list := select_list ||
      ', to_char (' || tab.column ( col ).description.name || ') as ' ||
      tab.column ( col ).description.name;
  end loop;
  sql_stmt :=
    'select * from (
  select ' || trim ( both ',' from select_list ) || ' from tab
)
unpivot (
  val for col
  in ( ' || trim ( both ',' from unpivot_list ) || ' )
)
where status is not null';
  return sql_stmt;
end unpivot_macro;
source_data as (
  select 'a' grp, 'b' b, 'c' c, date '2022-12-01' record_date, 'joe' updated_by
  from dual union all
  select 'a', 'z', 'd', date '2022-12-02', 'sally' updated_by
  from dual union all
  select 'a', 'a', 'd', date '2022-12-04', 'joe' updated_by
  from dual union all
  select 'z' a, 'b' b, 'c' c, date '2022-12-01', 'joe' updated_by
  from dual union all
  select 'z', 'e', 'c', date '2022-12-08', 'joe' updated_by
  from dual union all
  select 'z', 'f', 'c', date '2022-12-09', 'sally' updated_by
  from dual
), changes as (
  select s.grp, b, c,
         'sally' updated_by,
         case
           when updated_by = 'sally' then record_date
           else lead ( record_date ) over ( partition by grp order by record_date )
         end record_date,
         case
           when updated_by = 'sally' then 'current'
           when lead ( updated_by ) over ( partition by grp order by record_date ) = 'sally'
           then 'previous'
         end status
  from source_data s
)
select * from unpivot_macro (
  changes, columns ( grp, record_date, updated_by, status )
)
pivot (
  max ( val ) for status
  in ( 'previous' prev_val, 'current' curr_val )
)
where prev_val <> curr_val;
```
```
G UPDAT RECORD_DATE        C P C
- ----- ------------------ - - -
a sally 02-DEC-22          B b z
a sally 02-DEC-22          C c d
z sally 09-DEC-22          B e f
```
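One caveat worth sketching: `prev_val <> curr_val` is never true when either side is null, so a column that changes from null to a value (or the other way round) would not be reported. A null-safe comparison can be written with `DECODE`, which treats two nulls as equal. The `pairs` rows below are invented purely to show the behaviour.

```sql
with pairs as (
  -- hypothetical before/after values for three columns
  select 'B' col, 'e' prev_val, 'f' curr_val from dual union all
  select 'C', 'c', 'c' from dual union all
  select 'D', null, 'x' from dual
)
select col, prev_val, curr_val
from pairs
-- decode returns 0 when both values match (including null = null), else 1
where decode ( prev_val, curr_val, 0, 1 ) = 1;
```

This returns the rows for `B` and `D` and skips `C`. In the macro-based query the nulls would also have to survive the unpivot step, since `UNPIVOT` drops null values unless `include nulls` is specified.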