Upsert/Append to SQL database using SQL Alchemy/Pandas
Question
I'm using Python 3.11 and SSMS 19.
Once again, I'm running into a ton of fun with my project. I've gotten it working reasonably well, but I realize now that it isn't updating, just adding the same lines over and over into my database. That means it rejects all of the entries if a primary key matches, and if I remove the PK constraint, it doesn't update; it just adds a new row.
Edit:
Just for clarification, I want to do this by column, not by row or entry. If I have to add each update individually, I might as well not even write the program, as I would be writing update lines for ~300 rows every 3 days. I need the code to go line by line by itself, and either update, merge, or ignore depending on the column.
Here's what's working right now; I have the engine/connection created through SQLAlchemy.
from sqlalchemy.types import BigInteger, String, Text, Date

fulldf.to_sql('Shortage', engine, if_exists='append', index=False, chunksize=200,
              dtype={'Priority': BigInteger,
                     'PN': String(255),
                     'Part_Type': Text,
                     'MPN': Text,
                     'Proposed_MPN': Text,
                     'Distributor': String(100),
                     'Reason_for_Add': Text,
                     'Line_Down': Date,
                     'Engineering-Samples_Available': Text,
                     'Build_Samples-Available': Text,
                     'Comments': Text,
                     'Supplier_Comments': Text,
                     'Circuit_Location': Text,
                     'Last_Updated': Date})
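(For reference, the engine above is a standard SQLAlchemy mssql+pyodbc setup; the DSN and credentials in this sketch are placeholders rather than the real connection details.)

import sqlalchemy as sa

# Placeholder connection details -- substitute the real DSN or a
# hostname-style URL with a ?driver=... query parameter.
engine = sa.create_engine("mssql+pyodbc://USER:PASSWORD@MY_DSN")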
This works great for the initial table, and it would be fine if I didn't have any primary key repeats. Unfortunately, I do and I will, so it's not great for that.
My primary key (constrained in SQL, not here, for ease of use) is 'PN', and all of its values are unique.
Some columns, like 'Part_Type', 'MPN', 'Distributor', etc., I just want to be overwritten/ignored. Those values shouldn't change, and if they do, that will be a case-by-case basis.
For the Date columns, I want the old date removed and the new one added in, which I believe can be accomplished with if_exists='update' or perhaps if_exists='replace'? Either way, that won't be too much of a hurdle.
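Summed up as a per-column policy (this is just a hypothetical summary of the rules above, not code that exists anywhere yet):

# Hypothetical summary of the intended per-column merge behavior.
merge_policy = {
    'Part_Type': 'keep existing',           # shouldn't change; exceptions handled case by case
    'MPN': 'keep existing',
    'Distributor': 'keep existing',
    'Line_Down': 'overwrite with new',      # old date removed, new date added
    'Last_Updated': 'overwrite with new',
    'Comments': 'concatenate new + old',    # see the example further down
    'Supplier_Comments': 'concatenate new + old',
}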
As an afterthought: I really don't care about SQL injection here. This program is going to be used by roughly 5 trusted people on the local intranet, and if someone really wants to blow up a simple (not needed for production) database, they're more than welcome to. I'd rather not complicate things when I don't need to.
I've tried a few things, namely this upsert method:
https://blog.alexparunov.com/upserting-update-and-insert-with-pandas
It doesn't work for me because my laptop is locked down harder than Fort Knox, so PostgreSQL and psycopg2 can't initialize. And yes, I've tried installing psycopg2-binary.
I just need some guidance and haven't found much on here.
Edit:
Here's an example for clarity of what I'm looking to do:
I have this entry in the dataframe:
P/N:        Supplier_Comments:              Comments:
1-100116    "6-27-23 Review with Holly"     "24000 reserved"
and then I have the part that's already in the database:
P/N:        Supplier_Comments:              Comments:
1-100116    "6-29-23: Shipping 24k Oct"     "Next Dock Date 7/31"
I need it to become:
P/N:        Supplier_Comments:              Comments:
1-100116    "6-27-23 Review with Holly"     "24000 reserved"
            "6-29-23: Shipping 24k Oct"     "Next Dock Date 7/31"
NOT
1-100116 "6-27-23 Review with Holly "24000 reserved
1-100116 "6-29-23: Shipping 24k Oct" "Next Dock Date 7/31"
I don't want to combine different primary keys, either. I just want rows whose primary keys already exist to be merged with the existing SQL entry.
I want this to be done automatically, without me having to go through the rows to find which primary keys already exist in the database.
Answer 1
Score: 1
You can use .to_sql() to upload the updates to a temporary table and then use MERGE to update the main table.
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")

# set up environment for this example
with engine.begin() as conn:
    conn.exec_driver_sql("DROP TABLE IF EXISTS Shortage")
    conn.exec_driver_sql(
        """\
        CREATE TABLE Shortage (
            PN varchar(255) primary key,
            Part_Type varchar(50),
            Comments varchar(max)
        )
        """
    )
    conn.exec_driver_sql(
        """\
        INSERT INTO Shortage (PN, Part_Type, Comments)
        VALUES ('Part_A', 'original type', 'A comment.')
        """
    )

# update the table
fulldf = pd.DataFrame(
    [
        ("Part_A", "updated type", "Another comment."),
        ("Part_B", "some other type", "This is a new part."),
    ],
    columns=["PN", "Part_Type", "Comments"],
)
with engine.begin() as conn:
    fulldf.to_sql("#temp_table", conn, if_exists="replace", index=False)
    conn.exec_driver_sql(
        """\
        MERGE Shortage WITH (HOLDLOCK) AS main
        USING (SELECT PN, Part_Type, Comments FROM #temp_table) AS temp
        ON (main.PN = temp.PN)
        WHEN MATCHED THEN
            UPDATE SET
                Part_Type = temp.Part_Type,
                Comments = main.Comments + ' ' + temp.Comments
        WHEN NOT MATCHED THEN
            INSERT (PN, Part_Type, Comments)
            VALUES (temp.PN, temp.Part_Type, temp.Comments);
        """
    )

# check results
print(pd.read_sql_table("Shortage", engine))
"""
       PN        Part_Type                     Comments
0  Part_A     updated type  A comment. Another comment.
1  Part_B  some other type          This is a new part.
"""