英文:
Vlookup value from master data frame and summarize it
问题
I have a unique data frame having a list of seller names. Below is the sample.
DF_Master_seller
| Seller |
|---|
| Hari |
| Om |
| Cat |
| Mat |
| John |
| Messi |
| John |
| Ronaldo |
| Shannon |
| Catthy |
| Ray |
I have multiple data frames like -
DF1
| Seller | Value | Date | Rating |
|---|---|---|---|
| Hari | 2 | Jan-02 | 4.2 |
| Om | 12 | Jan-02 | 3.4 |
| Cat | 0 | Jan-02 | 4.3 |
| Mat | 14 | Jan-02 | 4.3 |
DF2
| Seller | Value | Date | Rating |
|---|---|---|---|
| Hari | 222 | Oct-02 | 4 |
| Om | 120 | Oct-02 | 3 |
| Cat | 10 | Oct-02 | 2.3 |
| Mat | 143 | Oct-02 | 2.5 |
| John | 30 | Oct-02 | 4.3 |
DF3
| Seller | Value | Date | Rating |
|---|---|---|---|
| Ray | 22 | Feb-02 | 2.4 |
| Jay | 12 | Feb-02 | 2.5 |
| Sohan | 0 | Feb-02 | 2.2 |
| Mat | 144 | Feb-02 | 5 |
| John | 10 | Feb-02 | 3.4 |
Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.
Final DF
| Seller | Value_Jan-02 | Value_Feb-02 | Value_Oct-02 |
|---|---|---|---|
| Hari | 2 | 222 | |
| Om | 12 | 120 | |
| Cat | 0 | 10 | |
| Mat | 14 | 144 | 143 |
| John | 30 | ||
| Messi | |||
| John | 10 | ||
| Ronaldo | |||
| Shannon | |||
| Catthy | |||
| Ray | 22 | ||
| Total | 28 | 176 | 515 |
Note - Column should be ordered by date
英文:
I have a unique data frame having a list of seller names. Below is the sample.
DF_Master_seller
| Seller |
|---|
| Hari |
| Om |
| Cat |
| Mat |
| John |
| Messi |
| John |
| Ronaldo |
| Shannon |
| Catthy |
| Ray |
I have multiple data frames like -
DF1
| Seller | Vaule | Date | Rating |
|---|---|---|---|
| Hari | 2 | Jan-02 | 4.2 |
| Om | 12 | Jan-02 | 3.4 |
| Cat | 0 | Jan-02 | 4.3 |
| Mat | 14 | Jan-02 | 4.3 |
DF2
| Seller | Vaule | Date | Rating |
|---|---|---|---|
| Hari | 222 | Oct-02 | 4 |
| Om | 120 | Oct-02 | 3 |
| Cat | 10 | Oct-02 | 2.3 |
| Mat | 143 | Oct-02 | 2.5 |
| John | 30 | Oct-02 | 4.3 |
DF3
| Seller | Vaule | Date | Rating |
|---|---|---|---|
| Ray | 22 | Feb-02 | 2.4 |
| Jay | 12 | Feb-02 | 2.5 |
| Sohan | 0 | Feb-02 | 2.2 |
| Mat | 144 | Feb-02 | 5 |
| John | 10 | Feb-02 | 3.4 |
Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.
Final DF
| Seller | Value_Jan-02 | Value_Feb-02 | Value_Oct-02 |
|---|---|---|---|
| Hari | 2 | 222 | |
| Om | 12 | 120 | |
| Cat | 0 | 10 | |
| Mat | 14 | 144 | 143 |
| John | 30 | ||
| Messi | |||
| John | 10 | ||
| Ronaldo | |||
| Shannon | |||
| Catthy | |||
| Ray | 22 | ||
| Total | 28 | 176 | 515 |
Note - Column should be ordered by date
答案1
得分: 2
以下是您提供的代码的中文翻译:
与 @Corralien 的方法相同,但使用了 margins 和排序列:
out = (pd.concat([df1, df2, df3])
.pivot_table(index="Seller", values="Value", columns="Date",
aggfunc="sum", margins=True, margins_name="Total", sort=False)
.drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
.sort_index(key=lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
.add_prefix("Value").reset_index().rename_axis(columns=None)
)
输出:
print(out)
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2.00 NaN 222.00
1 Om 12.00 NaN 120.00
2 Cat 0.00 NaN 10.00
3 Mat 14.00 144.00 143.00
4 John NaN 10.00 30.00
5 Messi NaN NaN NaN
6 John NaN 10.00 30.00
7 Ronaldo NaN NaN NaN
8 Shannon NaN NaN NaN
9 Catthy NaN NaN NaN
10 Ray NaN 22.00 NaN
11 Total 28.00 188.00 525.00
英文:
Same approach as @Corralien's but with margins and sorted columns :
out = (pd.concat([df1, df2, df3])
.pivot_table(index="Seller", values="Vaule", columns="Date",
aggfunc="sum", margins=True, margins_name="Total", sort=False)
.drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
.sort_index(key= lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
.add_prefix("Value").reset_index().rename_axis(columns=None)
)
Output :
print(out)
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2.00 NaN 222.00
1 Om 12.00 NaN 120.00
2 Cat 0.00 NaN 10.00
3 Mat 14.00 144.00 143.00
4 John NaN 10.00 30.00
5 Messi NaN NaN NaN
6 John NaN 10.00 30.00
7 Ronaldo NaN NaN NaN
8 Shannon NaN NaN NaN
9 Catthy NaN NaN NaN
10 Ray NaN 22.00 NaN
11 Total 28.00 188.00 525.00
答案2
得分: 1
你可以使用 pd.concat 来合并 df1、df2 和 df3,然后按主数据框 (dfM) 的 Seller 列重新索引:
# 第一步:合并所有数据框,不论日期如何
out = (pd.concat([df1, df2, df3])
.pivot_table(index='Seller', columns='Date', values='Value',
sort=False, fill_value=0)
.reindex(dfM['Seller'], fill_value=0))
# 第二步:重新组织输出
out = (pd.concat([out, out.sum().to_frame('Total').T])
.sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
.add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())
输出:
>>> out
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2 0 222
1 Om 12 0 120
2 Cat 0 0 10
3 Mat 14 144 143
4 John 0 10 30
5 Messi 0 0 0
6 John 0 10 30
7 Ronaldo 0 0 0
8 Shannon 0 0 0
9 Catthy 0 0 0
10 Ray 0 22 0
11 Total 28 186 555
英文:
You can use pd.concat to merge df1, df2 and df3 then reindex by the Seller column of master dataframe (dfM):
# 1st pass: aggregate all dataframes whatever the dates
out = (pd.concat([df1, df2, df3])
.pivot_table(index='Seller', columns='Date', values='Vaule',
sort=False, fill_value=0)
.reindex(dfM['Seller'], fill_value=0))
# 2nd pass: reorganize the output
out = (pd.concat([out, out.sum().to_frame('Total').T])
.sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
.add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())
Output:
>>> out
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2 0 222
1 Om 12 0 120
2 Cat 0 0 10
3 Mat 14 144 143
4 John 0 10 30
5 Messi 0 0 0
6 John 0 10 30
7 Ronaldo 0 0 0
8 Shannon 0 0 0
9 Catthy 0 0 0
10 Ray 0 22 0
11 Total 28 186 555
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论