insert new data from csv to already existing postgresql database
Question
This part of my code loops through a folder and imports multiple CSV files into a PostgreSQL database:

import psycopg2

file_names = [
    'C:/john/database_files'
]

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

for file_name in file_names:
    with open(file_name, 'r') as file_in:
        next(file_in)  # skip the header row
        with con.cursor() as cur:
            cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
con.commit()
con.close()
If new CSV files with the same column headers are created later on, I want to import the data from the new files only, so that the new data is appended to the same table in the database (not overwritten). For example, if a new CSV file of 10 rows is created and my current table has 100 rows, the updated table will have 110 rows.
Many thanks!
Answer 1
Score: 1
I have updated the code to create a text file containing a list of all the indexed files. This allows the code to read only the unseen CSV files.

import os
import psycopg2

dir_name = 'C:/john/database_files'

processed_files = []
# Load processed_files from a file, if it exists
try:
    with open('processed_files.txt', 'r') as f:
        processed_files = f.read().splitlines()
except FileNotFoundError:
    pass

con = psycopg2.connect(database="xxxx", user="xxxx", password="xxxx", host="xxxx")

# Loop over all CSV files in the directory
for file_name in os.listdir(dir_name):
    if file_name.endswith('.csv') and file_name not in processed_files:
        with open(os.path.join(dir_name, file_name), 'r') as file_in:
            next(file_in)  # skip the header row
            with con.cursor() as cur:
                cur.copy_from(file_in, "table_name", columns=('col1', 'col2', 'col3', 'col4', 'col5'), sep=",")
            con.commit()
        # Mark this file as processed
        processed_files.append(file_name)
con.close()

# Save the list of processed files
with open('processed_files.txt', 'w') as f:
    for file_name in processed_files:
        f.write("%s\n" % file_name)

Feel free to ask if you have any further questions.
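One caveat with the approach above: copy_from splits each line naively on sep and does not understand CSV quoting, so a quoted field containing a comma will break the import. If that can happen, cur.copy_expert with a CSV-aware COPY statement is safer. A minimal sketch of building such a statement (the helper name copy_sql is illustrative, not part of psycopg2):

```python
def copy_sql(table, columns):
    # Build a COPY ... FROM STDIN statement that uses PostgreSQL's
    # CSV parser and skips the header row on the server side.
    return "COPY {} ({}) FROM STDIN WITH (FORMAT csv, HEADER true)".format(
        table, ", ".join(columns))

print(copy_sql("table_name", ("col1", "col2", "col3")))
# COPY table_name (col1, col2, col3) FROM STDIN WITH (FORMAT csv, HEADER true)
```

The loop body would then become cur.copy_expert(copy_sql("table_name", ('col1', 'col2', 'col3', 'col4', 'col5')), file_in), and the next(file_in) call can be dropped because HEADER true already skips the first row.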
Comments