LOAD a CSV file with specified column values

Question

I have two CSV files that each contain four columns (c1, c2, c3, c4), and I created a table that contains five columns (a1, a2, a3, a4, a5).
Now I want to load the two files into the table separately, so that each load puts a constant value of my choice into column a1 of the table.

Values in CSV file 1:

c1,c2,c3,c4
............
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

Values in CSV file 2:

c1,c2,c3,c4
............
5 6 7 8 
5 6 7 8 
5 6 7 8 
5 6 7 8 

The resulting table should be:

a1, a2, a3, a4, a5
..................
my_value1 1 2 3 4
my_value1 1 2 3 4
my_value1 1 2 3 4
my_value1 1 2 3 4
my_value2 5 6 7 8 
my_value2 5 6 7 8 
my_value2 5 6 7 8 
my_value2 5 6 7 8 

I tried the following, but it clearly doesn't work. I also read the LOAD documentation on the IBM site but wasn't able to find anything about this.

load from path\file1 of del insert into table_name(my_value1, c1,c2,c3,c4)
load from path\file2 of del insert into table_name(my_value2, c1,c2,c3,c4)

Answer 1

Score: 0

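You have got some good suggestions in the comments, so this is just another way to do it: give a1 a default value before each LOAD and leave a1 out of the LOAD column list, so that every loaded row picks up the current default: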

db2 "create table target(a1 varchar(20) not null, a2 int, a3 int, a4 int, a5 int)"
db2 "alter table target alter column a1 set default 'my_value1'"
db2 "load from ./f1.csv of del insert into target (a2,a3,a4,a5)"
db2 "alter table target alter column a1 set default 'my_value2'"
db2 "load from ./f2.csv of del insert into target (a2,a3,a4,a5)"
db2 "alter table target alter column a1 drop default"
db2 "select * from target"

A1                   A2          A3          A4          A5         
-------------------- ----------- ----------- ----------- -----------
my_value1                      1           2           3           4
my_value1                      1           2           3           4
my_value1                      1           2           3           4
my_value1                      1           2           3           4
my_value2                      5           6           7           8
my_value2                      5           6           7           8
my_value2                      5           6           7           8
my_value2                      5           6           7           8

  8 record(s) selected.
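With more than two files, the ALTER/LOAD pairs above can be driven from a loop. The sketch below is a hypothetical wrapper (load_with_constant and load_all are made-up names, not Db2 commands). It assumes you are already connected with db2 connect to ... in the same shell, that the values contain no single quotes, and that every file has the same four DEL columns. Setting DB2=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper around the "set default, then LOAD without a1" technique.
# Assumes an existing CLP connection (db2 connect to <db>) in this shell session.
# Override DB2 (e.g. DB2=echo) to print the commands instead of executing them.
DB2=${DB2:-db2}

# Load one DEL file into TARGET, giving a1 the constant VALUE on every row.
load_with_constant() {
    file=$1
    value=$2
    # Make VALUE the current default for a1 (VALUE must not contain a quote)
    "$DB2" "alter table target alter column a1 set default '$value'"
    # CLP return codes of 4 or more are errors: don't load with a stale default
    if [ "$?" -ge 4 ]; then return 1; fi
    # a1 is left out of the column list, so LOAD fills it with the default
    "$DB2" "load from $file of del insert into target (a2,a3,a4,a5)"
}

# Arguments are FILE VALUE pairs; the default is dropped after the last load.
load_all() {
    while [ "$#" -ge 2 ]; do
        load_with_constant "$1" "$2" || return 1
        shift 2
    done
    "$DB2" "alter table target alter column a1 drop default"
}
```

For the two files in the question this would be called as load_all ./f1.csv my_value1 ./f2.csv my_value2. Only ALTER errors stop the loop; LOAD warnings (return code 2) still need to be checked in the CLP output.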

You may also want to have a look at INGEST:

-- create restart table
CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', NULL, NULL);

INGEST FROM FILE f1.csv
    FORMAT DELIMITED (
        $a2 INT EXTERNAL,
        $a3 INT EXTERNAL,
        $a4 INT EXTERNAL,
        $a5 INT EXTERNAL
    )
    INSERT INTO target (a1, a2, a3, a4, a5)
    VALUES ('my_value1', $a2, $a3, $a4, $a5);

From my understanding, it is almost as fast as LOAD but much more flexible. There is a comparison at:

https://www.oreilly.com/library/view/ibm-db2-111/9781788626910/15f1d83a-dc08-432c-b91b-f48fba48756d.xhtml

Finally, there is a new kid in town: EXTERNAL TABLE. You first have to allow a path via EXTBL_LOCATION in your database configuration, e.g.:

db2 update db cfg using EXTBL_LOCATION /tmp

Then you can declare a cursor:

db2 "declare c1 cursor for select 'myval1', a2, a3, a4, a5 from external '/tmp/f1.csv' (a2 int, a3 int, a4 int, a5 int) using (delimiter '|')"

For reasons unknown to me, I could not get it to work with ',' as the delimiter, so I changed the delimiter in the file to '|'.
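Changing the delimiter by hand can also be scripted. A minimal sketch, assuming plain unquoted fields: a blind replace would also rewrite any comma inside a quoted value. The helper name to_pipe_delimited is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a comma-delimited file to a '|'-delimited one.
# Assumes no field contains a literal ',' or '|' (no quoted fields).
to_pipe_delimited() {
    in_file=$1
    out_file=$2
    # Replace every comma on every line with '|'
    sed 's/,/|/g' "$in_file" > "$out_file"
}
```

For example, to_pipe_delimited /tmp/f1.csv /tmp/f1_pipe.csv writes a copy that the external clause can then reference, keeping using (delimiter '|').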

Now, you can load from that cursor:

db2 "load from c1 of cursor insert into target"  

Redefine the cursor and load another file (I used the same one):

db2 "declare c1 cursor for select 'myval2', a2, a3, a4, a5 from external '/tmp/f1.csv' (a2 int, a3 int, a4 int, a5 int) using (delimiter '|')"
db2 "load from c1 of cursor insert into target"

db2 "select * from target"

A1                                       A2          A3          A4          A5         
---------------------------------------- ----------- ----------- ----------- -----------
myval1                                             1           2           3           4
myval1                                             1           2           3           4
myval1                                             1           2           3           4
myval1                                             1           2           3           4
myval2                                             1           2           3           4
myval2                                             1           2           3           4
myval2                                             1           2           3           4
myval2                                             1           2           3           4

  8 record(s) selected.
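A further workaround, which is not part of the answer above, is to add the constant column to each file before loading, so that a plain LOAD into all five columns works without any ALTER. A minimal sketch, assuming comma-delimited files without a header row and a value that contains no ',', '"' or backslash (awk -v interprets backslash escapes). The helper name add_constant_column is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of IN_FILE with VALUE prepended as a new
# first column, so the rows match the five-column TARGET table directly.
# Assumes ',' delimiters, no header row, and a VALUE without ',', '"' or '\'.
add_constant_column() {
    value=$1
    in_file=$2
    out_file=$3
    # Print VALUE, a comma, then the original line unchanged
    awk -v v="$value" '{ print v "," $0 }' "$in_file" > "$out_file"
}
```

For example, add_constant_column my_value1 ./f1.csv ./f1_a1.csv produces rows such as my_value1,1,2,3,4, which can then be loaded with load from ./f1_a1.csv of del insert into target. The trade-off is an extra copy of each file on disk.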

huangapple
  • Posted on 18 February 2023 at 09:35:05
  • When reposting, please keep this link: https://go.coder-hub.com/75490644.html