外部表格列

huangapple go评论50阅读模式
英文:

External Table Columns

问题

我有以下代码。现在的问题是,我们期望文件中有三列,但有时其他团队会发送给我们四列。因此,不是在加载失败时,它会加载前三列。当文件少于3列时,它会失败,这是预期的。当文件中存在额外列时,我需要在哪里放置逻辑以使其失败?

CREATE TABLE TESTING_DUMP (
    "FIELD_1"            NUMBER,
    "FIELD_2"            VARCHAR2(5),
    "FIELD_3"            VARCHAR2(5)
)
ORGANIZATION external
(
    TYPE oracle_loader
    DEFAULT DIRECTORY MY_DIR
    ACCESS PARAMETERS
    (
      RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
      BADFILE "MY_DIR":"TEST.bad"
      LOGFILE "MY_DIR":"TEST.log"
      READSIZE 1048576
      FIELDS TERMINATED BY "|" LDRTRIM
      MISSING FIELD VALUES ARE NULL
      REJECT ROWS WITH ALL NULL FIELDS
      (
      "LOAD"               CHAR(1),
      "FIELD_1"            CHAR(5),
      "FIELD_2"            INTEGER EXTERNAL(5),
      "FIELD_3"            CHAR(5)
      )
    )
    location
    (
    'Test.xls'
    )
)REJECT LIMIT 0;

文件Test.xls的示例内容如下。第二行是正确的。第一行应该失败,但它没有失败。

|11111|22222|33333|AAAAA
|22222|33333|44444|
英文:

I have the following code. Now the issue is that we expect three columns in the file but sometime the other team sends us 4 columns. So instead of failing the load, it will load first three columns. When the file has less than 3 columns, then it fails which is expected. What logic do I need to place where it fails when an extra column is present in the file?

CREATE TABLE TESTING_DUMP (
"FIELD_1"            NUMBER,
"FIELD_2"            VARCHAR2(5),
"FIELD_3"            VARCHAR2(5)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY MY_DIR
ACCESS PARAMETERS
(
  RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
  BADFILE "MY_DIR":"TEST.bad"
  LOGFILE "MY_DIR":"TEST.log"
  READSIZE 1048576
  FIELDS TERMINATED BY "|" LDRTRIM
  MISSING FIELD VALUES ARE NULL
  REJECT ROWS WITH ALL NULL FIELDS
  (
  "LOAD"               CHAR(1),
  "FIELD_1"            CHAR(5),
  "FIELD_2"            INTEGER EXTERNAL(5),
  "FIELD_3"            CHAR(5)
  )
  )
  location
  (
  'Test.xls'
  )
)REJECT LIMIT 0;

File Test.xls has sample content below. Second line is correct. It should fail for first line but it does not.

|11111|22222|33333|AAAAA
|22222|33333|44444|

答案1

得分: 1

我不知道如何一步做到这一点,所以我会提出一个解决方法 - 看看它是否有帮助。

这是目标表,最终应该只包含有效的行:

SQL> create table ext_target
  2    (col1 number,
  3     col2 varchar2(5),
  4     col3 varchar2(5));
  
  表已创建。

外部表只包含一个列,该列将包含整行(即没有单独的列):

SQL> create table ext_dump
  2    (col  varchar2(100))
  3  organization external (
  4    type oracle_loader
  5    default directory ext_dir
  6    access parameters (
  7      records delimited by newline
  8      fields terminated by ','
  9      missing field values are null
  10      (
  11        col      char(100)    )
  12    )
  13    location ('test.txt')
  14  )
  15  reject limit unlimited;
  
  表已创建。

这是整个文件的内容:

|11111|22222|33333|AAAAA
|22222|33333|44444|
|55555|66666|

外部表包含整个文件(没有被拒绝的内容):

SQL> select * from ext_dump;

COL
--------------------------------------------------------------------------------
|11111|22222|33333|AAAAA
|22222|33333|44444|
|55555|66666|

将只有有效行插入目标表(到目前为止,有两个条件:行中不能有4个“列”,并且只能有4个|“分隔符”:

SQL> insert into ext_target (col1, col2, col3)
  2  select regexp_substr(col, '\w+', 1, 1),
  3         regexp_substr(col, '\w+', 1, 2),
  4         regexp_substr(col, '\w+', 1, 3)
  5  from ext_dump
  6  where regexp_substr(col, '\w+', 1, 4) is null
  7    and regexp_count(col, '\|') = 4;
  
  已创建1行。

唯一有效的行:

SQL> select * from ext_target;

      COL1 COL2  COL3
---------- ----- -----
     22222 33333 44444

SQL>;

现在,您可以按您想要的方式调整where子句;我发布的只是一个示例。

英文:

I wouldn't know how to do that in single step, so I'll suggest a workaround - see if it helps.

This is target table, which is - at the end - supposed to contain valid rows only:

SQL> create table ext_target
  2    (col1 number,
  3     col2 varchar2(5),
  4     col3 varchar2(5));

Table created.

External table contains only one column which will contain the whole row (i.e. no separate columns):

SQL> create table ext_dump
  2    (col  varchar2(100))
  3  organization external (
  4    type oracle_loader
  5    default directory ext_dir
  6    access parameters (
  7      records delimited by newline
  8      fields terminated by ','
  9      missing field values are null
 10      (
 11        col      char(100)    )
 12    )
 13    location ('test.txt')
 14  )
 15  reject limit unlimited;

Table created.

This is the whole file contents:

|11111|22222|33333|AAAAA
|22222|33333|44444|
|55555|66666|

External table contains the whole file (nothing is rejected):

SQL> select * from ext_dump;

COL
--------------------------------------------------------------------------------
|11111|22222|33333|AAAAA
|22222|33333|44444|
|55555|66666|

Insert only valid rows into the target table (so far, there are two conditions: there shouldn't be 4 "columns", and there can be only 4 | separators:

SQL> insert into ext_target (col1, col2, col3)
  2  select regexp_substr(col, '\w+', 1, 1),
  3         regexp_substr(col, '\w+', 1, 2),
  4         regexp_substr(col, '\w+', 1, 3)
  5  from ext_dump
  6  where regexp_substr(col, '\w+', 1, 4) is null
  7    and regexp_count(col, '\|') = 4;

1 row created.

The only valid row:

SQL> select * from ext_target;

      COL1 COL2  COL3
---------- ----- -----
     22222 33333 44444

SQL>

Now, you can adjust the where clause any way you want; what I posted is just an example.

答案2

得分: 0

"可以添加第四个字段定义,将其映射到一种肯定会失败的数据类型,比如一个采用奇怪格式的日期列,确保不会被识别。当字段未出现时,“缺失字段值为空”应将其呈现为空值,而当字段出现时,数据类型转换应出现错误。"

英文:

How about adding a fourth field definition that maps to a datatype sure to fail, like a date column with a funky format sure not to be seen? The "MISSING FIELD VALUES ARE NULL" should render it NULL when not present, and the datatype conversion should error when it is present.

huangapple
  • 本文由 发表于 2023年2月18日 14:09:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/75491544.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定