英文:
How to update a csv from another csv considering a combination of columns as the primary key?
问题
我在我的任务中遇到了一个难题,需要构建一个Python项目来执行以下任务。我很诚恳地请求您的帮助,以使其完成!我有两个具有相同标题的CSV文件,它们是:
[ID,日期,时间,类型,状态,成员,账户,属性,信用数量,借记数量,净数量,信用价值,借记价值,净值,货币]
一个CSV是input.csv,其中包含了一个运行的输入值,另一个CSV是system.csv,它充当一个包含每次运行后所有汇总值的数据库。必须根据input.csv更新system.csv。
在更新systems.csv
时,将考虑以下字段的组合作为主键:'日期','成员','账户','属性'
如果在systems.csv中找到主键的值,那么在信用数量、借记数量、净数量、信用价值、借记价值、净值下的每个值都必须被添加到现有值中,如下所示:
input.csv
ID,日期,时间,类型,状态,成员,账户,属性,信用数量,借记数量,净数量,信用价值,借记价值,净值,货币
id01,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,100, 0, 1000, 100, 0, 1000, usd
id02,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,200, 2000, 0, 200, 2000, usd
system.csv
id101,2023.03.16,08:00:00,visa,active,xyz,acc001,cc,500, 0, 5000, 400, 0, 4000, usd
id102,2023.03.16,09:00:00,visa,active,abc,acc002,cc,0,600, 6000, 0, 200, 2000, usd
运行后的system.csv
id101,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,600, 0, 6000, 500, 0, 5000, usd
id102,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,800, 8000, 0, 400, 4000, usd
目前我已经将两个csv文件作为数据帧,并尝试进行处理。但由于我知识的不足,请帮助我完成这个任务。提前感谢您!
英文:
I am having a block in my assignment to build a python project to do the following task. I'm kindly seeking help from you to make it complete!
I have 2 CSVs with same headers, which are:
[ID,Date,time,type,status,member,account,property,credit quantity, debitquantity, Net quantity, credit value, debit value, Net value, currency]
One csv is input.csv which contains input values for a run and other csv is system.csv which act like a database containing all the summed up values after each run. system.csv must be updated based on the input.csv.
When updating the systems.csv
the following combination of fields is considered as the primary key: 'Date', 'member', 'account', 'property'
if the values of the primary key is found in the systems.csv each values under credit quantity, debit quantity, Net quantity, credit value, debit value, Net value must be added to the existing values as follows:
input.csv
ID,Date,time,type,status,member,account,property,credit quantity, debit quantity, Net quantity, credit value, debit value, Net value, currency
id01,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,100, 0, 1000, 100, 0, 1000, usd
id02,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,200, 2000, 0, 200, 2000, usd
system.csv
id101,2023.03.16,08:00:00,visa,active,xyz,acc001,cc,500, 0, 5000, 400, 0, 4000, usd
id102,2023.03.16,09:00:00,visa,active,abc,acc002,cc,0,600, 6000, 0, 200, 2000, usd
system.csv after run
id101,2023.03.16,21:00:00,visa,active,xyz,acc001,cc,600, 0, 6000, 500, 0, 5000, usd
id102,2023.03.16,22:00:00,visa,active,abc,acc002,cc,0,800, 8000, 0, 400, 4000, usd
Currently I've taken two csvs as dataframes and trying to do the process. but since my lack of knowledge please help me to complete this. Thanks in advance!!
答案1
得分: 1
IIUC,您可以使用:
df_in = pd.read_csv("input.csv")
df_sys = pd.read_csv("system.csv") # 可选的 `header=None, names=df_in.columns`
pkeys = ["Date", "member", "account", "property"]
scols = df_in.select_dtypes("object").columns.difference(pkeys)
ncols = df_in.columns.difference(scols.union(pkeys))
df_run = (
pd.concat([df_in, df_sys])
.groupby(pkeys, as_index=False, sort=False)
.agg(
{**{col: "sum" for col in ncols},
**{col: "first" for col in scols}})
[df_in.columns]
)
# df_run.to_csv("system.csv", index=False) # 取消注释以覆盖旧的 `.csv` 文件
输出(以表格格式显示的 system.csv
):
ID | Date | time | type | status | member | account | property | credit quantity | debit quantity | Net quantity | credit value | debit value | Net value | currency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id101 | 2023.03.16 | 21:00:00 | visa | active | xyz | acc001 | cc | 600 | 0 | 6000 | 500 | 0 | 5000 | usd |
id102 | 2023.03.16 | 22:00:00 | visa | active | abc | acc002 | cc | 0 | 800 | 8000 | 0 | 400 | 4000 | usd |
英文:
IIUC, you can use :
df_in = pd.read_csv("input.csv")
df_sys = pd.read_csv("system.csv") #with optional `header=None, names=df_in.columns`
pkeys = ["Date", "member", "account", "property"]
scols = df_in.select_dtypes("object").columns.difference(pkeys)
ncols = df_in.columns.difference(scols.union(pkeys))
df_run = (
pd.concat([df_in, df_sys])
.groupby(pkeys, as_index=False, sort=False).agg(
{**{col: "sum" for col in ncols},
**{col: "first" for col in scols}})
[df_in.columns]
)
# df_run.to_csv("system.csv", index=False) #uncomment to overwrite the old `.csv`
Output (system.csv
in a tabular format) :
ID | Date | time | type | status | member | account | property | credit quantity | debit quantity | Net quantity | credit value | debit value | Net value | currency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id101 | 2023.03.16 | 21:00:00 | visa | active | xyz | acc001 | cc | 600 | 0 | 6000 | 500 | 0 | 5000 | usd |
id102 | 2023.03.16 | 22:00:00 | visa | active | abc | acc002 | cc | 0 | 800 | 8000 | 0 | 400 | 4000 | usd |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论