Identifying changes to data over time

Question

<details>
<summary>English:</summary>

Using Oracle Database 11.2.

Problem: Compare data from two sources and show only the differences.

I'm looking for some really slick solution to automate this comparison for hundreds of tables, each with hundreds of columns, that will work within the context of a query in a report developed in Crystal Reports. And yes, I have considered that I took a wrong turn somewhere and that everything in this description after that point is meaningless. (Not the Crystal Reports part, though. I'm stuck with that.)

Set aside thoughts about query or report performance. I intend to force filters to limit the amount of data that could be processed in a single request. What I'm asking about here is how to make this generic. In other words, I don't want to list any specific columns in my query code except, maybe, to distinguish between known grouping or lookup columns -- updated_by, updated_date, etc. I want to have queries that automatically gather those names for me.

For the sake of simplicity, let's say I want to compare data, based on filter criteria, from adjacent rows within a grouping in a table. Here is simplified example input data:

    with source_data as (
      select 'a' grp
      , 'b' b
      , 'c' c
      , date '2022-12-01' record_date
      , 'joe' updated_by
      from dual
      union all
      select 'a'
      , 'b'
      , 'd'
      , date '2022-12-02'
      , 'sally' updated_by
      from dual
      union all
      select 'a'
      , 'a'
      , 'd'
      , date '2022-12-04'
      , 'joe' updated_by
      from dual
      union all
      select 'z' a
      , 'b' b
      , 'c' c
      , date '2022-12-01'
      , 'joe' updated_by
      from dual
      union all
      select 'z'
      , 'e'
      , 'c'
      , date '2022-12-08'
      , 'joe' updated_by
      from dual
      union all
      select 'z'
      , 'f'
      , 'c'
      , date '2022-12-09'
      , 'sally' updated_by
      from dual
    )


|GRP|B|C|RECORD_DATE|UPDATED_BY|
|---|---|---|---|---|
|a|b|c|2022-12-01 00:00:00|joe|
|a|b|d|2022-12-02 00:00:00|sally|
|a|a|d|2022-12-04 00:00:00|joe|
|z|b|c|2022-12-01 00:00:00|joe|
|z|e|c|2022-12-08 00:00:00|joe|
|z|f|c|2022-12-09 00:00:00|sally|


The need is to see what changes were made by people in certain categories. For this example, let's say Sally is a member of that group and Joe is not. So, the only changes I care about are on rows 2 and 6. But I need to compare each to the previous row, so...

    ,
    changed as (
      select sd.*
      from source_data sd
      where updated_by = 'sally'
    ),
    changes as (
      select 'current' as status
      , c.*
      from changed c
      union all
      select 'previous'
      , sd.grp
      , sd.b
      , sd.c
      , c.record_date
      , c.updated_by
      from source_data sd
        inner join changed c on c.grp = sd.grp
                            and sd.record_date = (select max(record_date) from source_data where grp = c.grp and record_date < c.record_date)
    )
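
As a side note, the pairing of each row with its predecessor can also be expressed with the `LAG` analytic function instead of a correlated `MAX(record_date)` subquery. The following is only a sketch against the sample data above; the `with_prev` name and the `prev_b` / `prev_c` aliases are made up for illustration, and it still lists columns by name, so it does not address the "generic" requirement.

```sql
-- Sketch: fetch the previous row's values within each grp using LAG.
-- LAG is evaluated before the outer filter, so Sally's rows still see
-- the preceding row even when that row was written by someone else.
with source_data as (
  select 'a' grp, 'b' b, 'c' c, date '2022-12-01' record_date, 'joe' updated_by from dual union all
  select 'a', 'b', 'd', date '2022-12-02', 'sally' from dual union all
  select 'a', 'a', 'd', date '2022-12-04', 'joe'   from dual union all
  select 'z', 'b', 'c', date '2022-12-01', 'joe'   from dual union all
  select 'z', 'e', 'c', date '2022-12-08', 'joe'   from dual union all
  select 'z', 'f', 'c', date '2022-12-09', 'sally' from dual
),
with_prev as (  -- hypothetical CTE name
  select sd.*
  , lag(b) over (partition by grp order by record_date) as prev_b
  , lag(c) over (partition by grp order by record_date) as prev_c
  from source_data sd
)
select grp, record_date, updated_by, prev_b, b, prev_c, c
from with_prev
where updated_by = 'sally'
order by grp, record_date;
```

The filter on `updated_by` has to stay in the outer query; applying it inside `with_prev` would remove the neighbouring rows before `LAG` could read them.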


Output from this trivial example seems simple enough. But when I have hundreds of rows by hundreds of columns to compare, it's not so easy to identify the change.

I have many tables to compare that have the same issue. Many of the tables have hundreds of columns. Usually, the difference is in only one or a few of the columns.

This will be done in a report. I don't have access to create functions or stored procedures, so I doubt I can use dynamic SQL in any way. This likely has constraints similar to developing a view.

I am NOT using PL/SQL. (Kinda tired of nearly every Oracle question related to my searches on SO having some relationship to PL/SQL, but no way to filter those out.)

I was thinking that in order to compare the data I'll first want to unpivot it to get a column/value pair on a row...

(Building on the answer to this question:  https://stackoverflow.com/questions/19124891/oracle-unpivot-columns-to-rows)

    , unpivot as (
      Select *
      From (
        Select grp
        , status
        , record_date
        , updated_by
        , Case When C.lvl = 1 Then 'B'
            When C.lvl = 2 Then 'C'
          End col
        , Case When C.lvl = 1 Then coalesce(B, '<null>')
            When C.lvl = 2 Then coalesce(C, '<null>')
          End val
        From changes
        cross join (
          select level lvl
          from dual
          connect by level <= 2
        ) c
      )
      where val is not null
      order by 1, 3, 2 desc
    )


(Yes, for non-trivial data I'll need to cast the data going into `val` to something more generic, like a string.)
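
One thing to watch when casting to strings: `TO_CHAR` without a format model depends on session NLS settings, so the same value can render differently between sessions. A minimal sketch with explicit formats follows; the literal values and the `date_val` / `num_val` aliases are made up for illustration.

```sql
-- Sketch: convert differently typed values to text with explicit formats
-- so the string form does not depend on the session's NLS settings.
select to_char(date '2022-12-01', 'YYYY-MM-DD HH24:MI:SS') as date_val
, to_char(1234.5, 'TM9', 'NLS_NUMERIC_CHARACTERS=''.,''') as num_val
from dual;
```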

But **how do I programmatically determine the number of columns, the column order, and generate the column names** for both the value of `col` and for the reference for the `CASE` statement in `val`?

I suppose I could use something like this as part of the solution:

    SELECT COLUMN_NAME
    , COLUMN_ID
    FROM ALL_tab_columns
    WHERE OWNER = 'MY_OWNER_NAME'
    AND TABLE_NAME = 'SOURCE_TABLE'
    ORDER BY COLUMN_ID


But I'm not sure how to dovetail that into the solution in a meaningful way without involving dynamic SQL, which I'm pretty sure I can't do. And it would probably require referencing columns based on ordinal position, which doesn't appear to be possible in SQL. Of course, if that would work I could use a similar query to figure out how to handle data types for the `val` column.
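
Where dynamic SQL really is off the table, one workaround is to let the data dictionary write the repetitive `CASE` branches as text, then paste that text into the report query by hand. This is a code-generation sketch, not something that runs inside the report itself. It assumes the placeholder owner and table names from the query above and treats `GRP`, `RECORD_DATE` and `UPDATED_BY` as the key columns to skip; the `col_branch` and `val_branch` aliases are made up.

```sql
-- Sketch: emit one CASE branch per non-key column as text.
-- Run once per table and paste the output into the unpivot query;
-- the row count is the upper bound for "connect by level <= N".
select column_id
, '  when c.lvl = ' || row_number() over (order by column_id)
    || ' then ''' || column_name || '''' as col_branch
, '  when c.lvl = ' || row_number() over (order by column_id)
    || ' then coalesce(to_char(' || column_name || '), ''<null>'')' as val_branch
from all_tab_columns
where owner = 'MY_OWNER_NAME'
and table_name = 'SOURCE_TABLE'
and column_name not in ('GRP', 'RECORD_DATE', 'UPDATED_BY')
order by column_id;
```

Returning one row per branch, rather than aggregating everything into a single string, also keeps each generated piece short when a table has hundreds of columns. The generated text has to be regenerated whenever the table's columns change.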

Then I need to pivot that to show the before and after values in different columns.  Then I can filter that to only what changed.

    ,
    pivot as (
      select grp
      , record_date
      , col
      , updated_by
      , max("'previous'") val_prev
      , max("'current'") val_curr
      from unpivot
      pivot (
        max(val)
        for status
        in (
          'previous',
          'current'
        )
      )
      group by grp
      , record_date
      , col
      , updated_by
    )
    select grp
    , record_date
    , col
    , updated_by
    , val_prev
    , val_curr
    from pivot
    where val_curr <> val_prev
    order by grp
    , record_date

|GRP|RECORD_DATE|COL|UPDATED_BY|VAL_PREV|VAL_CURR|
|---|---|---|---|---|---|
|a|2022-12-02 00:00:00|C|sally|c|d|
|z|2022-12-09 00:00:00|B|sally|e|f|


</details>


# Answer 1
**Score**: 1


You can't do this with pure SQL alone. But you can achieve what you want in a single statement using SQL macros, provided you're on an up-to-date version of Oracle Database (SQL macros are not available in 11.2).

This is an example of a dynamic unpivot macro that converts all the unlisted columns to rows:

    create or replace function unpivot_macro (
      tab       dbms_tf.table_t,
      keep_cols dbms_tf.columns_t
    ) return clob sql_macro is
      sql_stmt     clob;
      unpivot_list clob;
      select_list  clob;
    begin

      -- Walk every column of the input table
      for col in tab.column.first .. tab.column.last loop
        -- Columns not listed in keep_cols are the ones to turn into rows
        if tab.column ( col ).description.name 
             not member of keep_cols then
          unpivot_list := unpivot_list || 
            ',' || tab.column ( col ).description.name;
        end if;
        
        -- Convert every column to a string so the unpivoted values share one type
        select_list := select_list || 
          ', to_char (' || tab.column ( col ).description.name || ') as ' || 
          tab.column ( col ).description.name;
      end loop;
      
      -- Build the query text; trim removes the leading comma from each list
      sql_stmt := 
        'select * from ( 
           select ' || trim ( both ',' from select_list ) || ' from tab 
         )
         unpivot (
           val for col 
           in ( ' || trim ( both ',' from unpivot_list ) || ' )
         )';

      return sql_stmt;
      
    end unpivot_macro;
    /

    select * from unpivot_macro ( 
      source_data, columns ( grp, updated_by, record_date )
    );

    GRP    RECORD_DATE          UPDATED_BY    COL    VAL    
    a      01-DEC-2022 00:00    joe           B      b      
    a      01-DEC-2022 00:00    joe           C      c      
    a      02-DEC-2022 00:00    sally         B      z      
    a      02-DEC-2022 00:00    sally         C      d      
    a      04-DEC-2022 00:00    joe           B      a      
    a      04-DEC-2022 00:00    joe           C      d
    ...

If the reason for avoiding PL/SQL is you don't have permission to create functions, you can place the macro in the with clause.

Here's an example running on 21.3:

    with function unpivot_macro (
      tab       dbms_tf.table_t,
      keep_cols dbms_tf.columns_t
    ) return clob sql_macro is
      sql_stmt     clob;
      unpivot_list clob;
      select_list  clob;
    begin

      for col in tab.column.first .. tab.column.last loop
        if tab.column ( col ).description.name 
             not member of keep_cols then
          unpivot_list := unpivot_list || 
            ',' || tab.column ( col ).description.name;
        end if;
        
        select_list := select_list || 
          ', to_char (' || tab.column ( col ).description.name || ') as ' || 
          tab.column ( col ).description.name;
      end loop;
      
      -- Note: the "status is not null" filter below ties this version of the
      -- macro to inputs that have a STATUS column, such as the changes query
      sql_stmt := 
        'select * from ( 
           select ' || trim ( both ',' from select_list ) || ' from tab 
         )
         unpivot (
           val for col 
           in ( ' || trim ( both ',' from unpivot_list ) || ' )
         )
         where status is not null';

      return sql_stmt;
      
    end unpivot_macro;
      source_data as (
      select 'a' grp, 'b' b, 'c' c, date '2022-12-01' record_date, 'joe' updated_by
      from dual union all
      select 'a', 'z', 'd', date '2022-12-02', 'sally' updated_by  
      from dual union all
      select 'a', 'a', 'd', date '2022-12-04', 'joe' updated_by  
      from dual union all
      select 'z' a, 'b' b, 'c' c, date '2022-12-01', 'joe' updated_by
      from dual union all
      select 'z', 'e', 'c', date '2022-12-08', 'joe' updated_by
      from dual union all
      select 'z', 'f', 'c', date '2022-12-09', 'sally' updated_by
      from dual
    ), changes as (
      -- Tag Sally's rows as 'current' and the row just before each as 'previous',
      -- giving both rows of a pair the same record_date and updated_by
      select s.grp, b, c,
             'sally' updated_by, 
             case 
               when updated_by = 'sally' then record_date
               else lead ( record_date ) over ( partition by grp order by record_date ) 
             end record_date,
             case 
               when updated_by = 'sally' then 'current' 
               when lead ( updated_by ) over ( partition by grp order by record_date ) = 'sally'
               then 'previous' 
             end status
      from   source_data s
    )
      select * from unpivot_macro (
        changes, columns ( grp, record_date, updated_by, status )
      )
      pivot (
        max ( val ) for status 
        in ( 'previous' prev_val, 'current' curr_val )
      )
      where  prev_val <> curr_val;

    GRP UPDATED_BY RECORD_DATE COL PREV_VAL CURR_VAL
    --- ---------- ----------- --- -------- --------
    a   sally      02-DEC-22   B   b        z
    a   sally      02-DEC-22   C   c        d
    z   sally      09-DEC-22   B   e        f
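
One caveat worth checking before relying on the final filter: `UNPIVOT` excludes rows with null values by default, and `prev_val <> curr_val` is never true when either side is null, so a change from or to NULL would not be reported. A possible way around this is to use `unpivot include nulls` together with a null-safe comparison. The sketch below shows only the comparison part using `DECODE`, which treats two nulls as equal; the `pairs` CTE and its values are made up for illustration.

```sql
-- Sketch: null-safe "has this value changed?" test.
-- DECODE(a, b, 0, 1) returns 0 when a and b match (two NULLs count
-- as a match) and 1 otherwise.
with pairs as (  -- hypothetical before/after pairs
  select 'B' col, 'e' prev_val, 'f' curr_val from dual union all
  select 'C', 'c', 'c' from dual union all
  select 'D', null, 'x' from dual union all
  select 'E', null, null from dual
)
select col, prev_val, curr_val
from pairs
where decode ( prev_val, curr_val, 0, 1 ) = 1;
```

With these sample rows the filter keeps `B` (value changed) and `D` (NULL to a value), and drops `C` and `E`. The question's own version avoids the problem differently, by replacing nulls with the `'<null>'` sentinel before comparing.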

huangapple
  • Posted on February 14, 2023
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75442291.html