insert new data from csv to already existing postgresql database
Question
This part of my code loops through a folder and imports multiple CSV files into a PostgreSQL database:

import psycopg2

file_names = [
    'C:/john/database_files'
]

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

for file_name in file_names:
    with open(file_name, 'r') as file_in:
        next(file_in)  # skip the header row
        with con.cursor() as cur:
            cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
con.commit()
con.close()
If new CSV files with the same column headers are created later on, I want to import the data from the new files only, so that the new data is appended to the same table in the database (not overwritten). For example, if a new CSV file of 10 rows is created and my current table has 100 rows, the updated table will have 110 rows.
Many thanks!
Answer 1
Score: 1
I have updated the code to create a text file containing a list of all the indexed files. This allows the code to read only the unseen CSV files.

import os
import psycopg2

dir_name = 'C:/john/database_files'

processed_files = []
# Load processed_files from a file, if it exists
try:
    with open('processed_files.txt', 'r') as f:
        processed_files = f.read().splitlines()
except FileNotFoundError:
    pass

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

# Loop over all CSV files in the directory
for file_name in os.listdir(dir_name):
    if file_name.endswith('.csv') and file_name not in processed_files:
        with open(os.path.join(dir_name, file_name), 'r') as file_in:
            next(file_in)  # skip the header row
            with con.cursor() as cur:
                cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
            con.commit()
        # Mark this file as processed
        processed_files.append(file_name)
con.close()

# Save the list of processed files
with open('processed_files.txt', 'w') as f:
    for file_name in processed_files:
        f.write("%s\n" % file_name)

Feel free to ask if you have any further questions.
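One caveat with the approach above: copy_from splits each line naively on sep and does not understand CSV quoting, so a quoted field containing a comma will break the import. If that can happen, cur.copy_expert with a CSV-aware COPY statement is safer. A minimal sketch of building such a statement (the helper name copy_sql is illustrative, not part of psycopg2):

```python
def copy_sql(table, columns):
    # Build a COPY ... FROM STDIN statement that uses PostgreSQL's
    # CSV parser and skips the header row on the server side.
    return "COPY {} ({}) FROM STDIN WITH (FORMAT csv, HEADER true)".format(
        table, ", ".join(columns))

print(copy_sql("table_name", ("col1", "col2", "col3")))
# COPY table_name (col1, col2, col3) FROM STDIN WITH (FORMAT csv, HEADER true)
```

The loop body would then become cur.copy_expert(copy_sql("table_name", ('col1', 'col2', 'col3', 'col4', 'col5')), file_in), and the next(file_in) call can be dropped because HEADER true already skips the first row.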
Comments