June 19, 2023 16:10:16

How to update a csv from another csv considering a combination of columns as the primary key?

Question


I'm stuck on an assignment to build a Python project that performs the following task, and I'd appreciate help completing it.
I have two CSVs with the same headers:

[ID, Date, time, type, status, member, account, property, credit quantity, debit quantity, Net quantity, credit value, debit value, Net value, currency]

One CSV is input.csv, which contains the input values for a run. The other is system.csv, which acts like a database holding the summed-up values after each run. system.csv must be updated based on input.csv.

When updating system.csv, the following combination of fields is treated as the primary key: 'Date', 'member', 'account', 'property'

If a primary key from input.csv is found in system.csv, each value under credit quantity, debit quantity, Net quantity, credit value, debit value, and Net value must be added to the existing value, as follows:

input.csv

ID,Date,time,type,status,member,account,property,credit quantity, debit quantity, Net quantity, credit value, debit value, Net value, currency
id01,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,100, 0, 1000, 100, 0, 1000, usd
id02,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,200, 2000, 0, 200, 2000, usd

system.csv

id101,2023.03.16,08:00:00,visa,active,xyz,acc001,cc,500, 0, 5000, 400, 0, 4000, usd
id102,2023.03.16,09:00:00,visa,active,abc,acc002,cc,0,600, 6000, 0, 200, 2000, usd

system.csv after run

id101,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,600, 0, 6000, 500, 0, 5000, usd
id102,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,800, 8000, 0, 400, 4000, usd

So far I've loaded the two CSVs as dataframes and am trying to work out the update from there, but I don't know enough to finish it. Thanks in advance!

Answer 1

Score: 1

IIUC, you can use:

import pandas as pd

df_in = pd.read_csv("input.csv")
# If system.csv has no header row, pass `header=None, names=df_in.columns`
df_sys = pd.read_csv("system.csv")
pkeys = ["Date", "member", "account", "property"]
# Non-key text columns (ID, time, type, status, currency)
scols = df_in.select_dtypes("object").columns.difference(pkeys)
# Everything else is numeric and gets summed
ncols = df_in.columns.difference(scols.union(pkeys))
df_run = (
    pd.concat([df_in, df_sys])
    .groupby(pkeys, as_index=False, sort=False)
    .agg(
        {**{col: "sum" for col in ncols},
         **{col: "first" for col in scols},  # input rows come first, so take their time etc.
         "ID": "last"})  # keep the ID from system.csv when the key already exists
    [df_in.columns]
)
# df_run.to_csv("system.csv", index=False)  # uncomment to overwrite the old `.csv`

Output (system.csv in tabular format):

ID	Date	time	type	status	member	account	property	credit quantity	debit quantity	Net quantity	credit value	debit value	Net value	currency
id101	2023.03.16	21:00:00	visa	active	xyz	acc001	cc	600	0	6000	500	0	5000	usd
id102	2023.03.16	22:00:00	visa	active	abc	acc002	cc	0	800	8000	0	400	4000	usd
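
For anyone who wants to try this without touching files, here is a minimal self-contained sketch of the same concat-and-groupby idea, run on the sample rows from the question. It rests on a few assumptions that are not in the original answer: the helper name `update_system` is hypothetical, the numeric columns are listed explicitly rather than inferred from dtypes, `skipinitialspace=True` is used to strip the spaces after the commas, and system.csv is read as header-less because the sample shows no header row.

```python
import io
import pandas as pd

KEYS = ["Date", "member", "account", "property"]
SUM_COLS = ["credit quantity", "debit quantity", "Net quantity",
            "credit value", "debit value", "Net value"]

def update_system(df_sys, df_in):
    """Hypothetical helper: sum SUM_COLS of df_in into df_sys rows sharing KEYS."""
    combined = pd.concat([df_sys, df_in], ignore_index=True)
    other = [c for c in df_sys.columns if c not in KEYS + SUM_COLS]
    how = {c: "sum" for c in SUM_COLS}
    how.update({c: "last" for c in other})  # newest time/type/status/currency
    how["ID"] = "first"                     # keep the existing system ID
    out = combined.groupby(KEYS, as_index=False, sort=False).agg(how)
    return out[list(df_sys.columns)]        # restore the original column order

input_csv = """ID,Date,time,type,status,member,account,property,credit quantity, debit quantity, Net quantity, credit value, debit value, Net value, currency
id01,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,100, 0, 1000, 100, 0, 1000, usd
id02,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,200, 2000, 0, 200, 2000, usd
"""
system_csv = """id101,2023.03.16,08:00:00,visa,active,xyz,acc001,cc,500, 0, 5000, 400, 0, 4000, usd
id102,2023.03.16,09:00:00,visa,active,abc,acc002,cc,0,600, 6000, 0, 200, 2000, usd
"""

df_in = pd.read_csv(io.StringIO(input_csv), skipinitialspace=True)
# Assumption: system.csv has no header row, so reuse the input's column names
df_sys = pd.read_csv(io.StringIO(system_csv), header=None,
                     names=df_in.columns, skipinitialspace=True)

result = update_system(df_sys, df_in)
print(result.to_string(index=False))
```

A key that appears only in the input is simply carried over as a new row with its own ID, since its group contains a single record.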
