将新数据从CSV插入到已存在的PostgreSQL数据库。

huangapple go评论77阅读模式
英文:

insert new data from csv to already existing postgresql database

问题

这部分代码是用于循环遍历文件夹并将多个CSV文件导入到PostgreSQL数据库中的:

import psycopg2

file_names = [
    'C:/john/database_files'
]

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

for file_name in file_names:
    with open(file_name, 'r') as file_in:
        next(file_in)
        with con.cursor() as cur:
            cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
        con.commit()

con.close()

如果后来创建了具有相同列标题的新CSV文件,我想仅导入新CSV文件中的数据,以便将新数据添加到数据库中的相同数据表(不覆盖)。例如,如果创建了一个包含10行的新CSV文件,而我的当前数据库有100行,则更新后的数据表将有110行。

非常感谢!

英文:

this part of my code is for looping through a folder and import multiple csv file to postgresql database:

import psycopg2

file_names = [
    'C:/john/database_files'
]

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

for file_name in file_names:
    with open(file_name, 'r') as file_in:
        next(file_in)
        with con.cursor() as cur:
            cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
        con.commit()

con.close()

if there are new csv files of the same column headers are created later on, I want to import the data from the new csv files only, so that the new data is added to the same data table in the database (not overwrite). For example, if a new csv file of 10 rows is created, and my current database has 100 rows, the updated data table will have 110 rows.

Many thanks!

答案1

得分: 1

我已经更新了代码,以创建一个包含所有索引文件列表的文本文件。这将允许代码仅读取未见过的 CSV 文件。

import os
import psycopg2

dir_name = 'C:/john/database_files'
processed_files = []

# 从文件中加载 processed_files,如果存在的话
try:
    with open('processed_files.txt', 'r') as f:
        processed_files = f.read().splitlines()
except FileNotFoundError:
    pass

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

# 遍历目录中的所有 CSV 文件
for file_name in os.listdir(dir_name):
    if file_name.endswith('.csv') and file_name not in processed_files:
        with open(os.path.join(dir_name, file_name), 'r') as file_in:
            next(file_in)
            with con.cursor() as cur:
                cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
            con.commit()

        # 将此文件标记为已处理
        processed_files.append(file_name)

con.close()

# 保存已处理文件的列表
with open('processed_files.txt', 'w') as f:
    for file_name in processed_files:
        f.write("%s\n" % file_name)

如果您有任何其他问题,请随时提出。

英文:

I have updated the code to create a text file containing list of all the indexed files. This will allow the code to read only the unseen csv files.

import os
import psycopg2

dir_name = 'C:/john/database_files'
processed_files = []

# Load processed_files from a file, if it exists
try:
    with open('processed_files.txt', 'r') as f:
        processed_files = f.read().splitlines()
except FileNotFoundError:
    pass

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

# Loop over all CSV files in the directory
for file_name in os.listdir(dir_name):
    if file_name.endswith('.csv') and file_name not in processed_files:
        with open(os.path.join(dir_name, file_name), 'r') as file_in:
            next(file_in)
            with con.cursor() as cur:
                cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
            con.commit()

        # Mark this file as processed
        processed_files.append(file_name)

con.close()

# Save the list of processed files
with open('processed_files.txt', 'w') as f:
    for file_name in processed_files:
        f.write("%s\n" % file_name)

huangapple
  • 本文由 发表于 2023年7月11日 14:11:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/76659127.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定