英文:
Vlookup value from master data frame and summarize it
问题
I have a unique data frame having a list of seller names. Below is the sample.
DF_Master_seller
Seller |
---|
Hari |
Om |
Cat |
Mat |
John |
Messi |
John |
Ronaldo |
Shannon |
Catthy |
Ray |
I have multiple data frames like -
DF1
Seller | Value | Date | Rating |
---|---|---|---|
Hari | 2 | Jan-02 | 4.2 |
Om | 12 | Jan-02 | 3.4 |
Cat | 0 | Jan-02 | 4.3 |
Mat | 14 | Jan-02 | 4.3 |
DF2
Seller | Value | Date | Rating |
---|---|---|---|
Hari | 222 | Oct-02 | 4 |
Om | 120 | Oct-02 | 3 |
Cat | 10 | Oct-02 | 2.3 |
Mat | 143 | Oct-02 | 2.5 |
John | 30 | Oct-02 | 4.3 |
DF3
Seller | Value | Date | Rating |
---|---|---|---|
Ray | 22 | Feb-02 | 2.4 |
Jay | 12 | Feb-02 | 2.5 |
Sohan | 0 | Feb-02 | 2.2 |
Mat | 144 | Feb-02 | 5 |
John | 10 | Feb-02 | 3.4 |
Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.
Final DF
Seller | Value_Jan-02 | Value_Feb-02 | Value_Oct-02 |
---|---|---|---|
Hari | 2 | 222 | |
Om | 12 | 120 | |
Cat | 0 | 10 | |
Mat | 14 | 144 | 143 |
John | 30 | ||
Messi | |||
John | 10 | ||
Ronaldo | |||
Shannon | |||
Catthy | |||
Ray | 22 | ||
Total | 28 | 176 | 515 |
Note - Column should be ordered by date
英文:
I have a unique data frame having a list of seller names. Below is the sample.
DF_Master_seller
Seller |
---|
Hari |
Om |
Cat |
Mat |
John |
Messi |
John |
Ronaldo |
Shannon |
Catthy |
Ray |
I have multiple data frames like -
DF1
Seller | Vaule | Date | Rating |
---|---|---|---|
Hari | 2 | Jan-02 | 4.2 |
Om | 12 | Jan-02 | 3.4 |
Cat | 0 | Jan-02 | 4.3 |
Mat | 14 | Jan-02 | 4.3 |
DF2
Seller | Vaule | Date | Rating |
---|---|---|---|
Hari | 222 | Oct-02 | 4 |
Om | 120 | Oct-02 | 3 |
Cat | 10 | Oct-02 | 2.3 |
Mat | 143 | Oct-02 | 2.5 |
John | 30 | Oct-02 | 4.3 |
DF3
Seller | Vaule | Date | Rating |
---|---|---|---|
Ray | 22 | Feb-02 | 2.4 |
Jay | 12 | Feb-02 | 2.5 |
Sohan | 0 | Feb-02 | 2.2 |
Mat | 144 | Feb-02 | 5 |
John | 10 | Feb-02 | 3.4 |
Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.
Final DF
Seller | Value_Jan-02 | Value_Feb-02 | Value_Oct-02 |
---|---|---|---|
Hari | 2 | 222 | |
Om | 12 | 120 | |
Cat | 0 | 10 | |
Mat | 14 | 144 | 143 |
John | 30 | ||
Messi | |||
John | 10 | ||
Ronaldo | |||
Shannon | |||
Catthy | |||
Ray | 22 | ||
Total | 28 | 176 | 515 |
Note - Column should be ordered by date
答案1
得分: 2
以下是您提供的代码的中文翻译:
与 @Corralien 的方法相同,但使用了 margins
和排序列:
out = (pd.concat([df1, df2, df3])
.pivot_table(index="Seller", values="Value", columns="Date",
aggfunc="sum", margins=True, margins_name="Total", sort=False)
.drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
.sort_index(key=lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
.add_prefix("Value").reset_index().rename_axis(columns=None)
)
输出:
print(out)
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2.00 NaN 222.00
1 Om 12.00 NaN 120.00
2 Cat 0.00 NaN 10.00
3 Mat 14.00 144.00 143.00
4 John NaN 10.00 30.00
5 Messi NaN NaN NaN
6 John NaN 10.00 30.00
7 Ronaldo NaN NaN NaN
8 Shannon NaN NaN NaN
9 Catthy NaN NaN NaN
10 Ray NaN 22.00 NaN
11 Total 28.00 188.00 525.00
英文:
Same approach as @Corralien's but with margins
and sorted columns :
out = (pd.concat([df1, df2, df3])
.pivot_table(index="Seller", values="Vaule", columns="Date",
aggfunc="sum", margins=True, margins_name="Total", sort=False)
.drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
.sort_index(key= lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
.add_prefix("Value").reset_index().rename_axis(columns=None)
)
Output :
print(out)
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2.00 NaN 222.00
1 Om 12.00 NaN 120.00
2 Cat 0.00 NaN 10.00
3 Mat 14.00 144.00 143.00
4 John NaN 10.00 30.00
5 Messi NaN NaN NaN
6 John NaN 10.00 30.00
7 Ronaldo NaN NaN NaN
8 Shannon NaN NaN NaN
9 Catthy NaN NaN NaN
10 Ray NaN 22.00 NaN
11 Total 28.00 188.00 525.00
答案2
得分: 1
你可以使用 pd.concat
来合并 df1
、df2
和 df3
,然后按主数据框 (dfM
) 的 Seller
列重新索引:
# 第一步:合并所有数据框,不论日期如何
out = (pd.concat([df1, df2, df3])
.pivot_table(index='Seller', columns='Date', values='Value',
sort=False, fill_value=0)
.reindex(dfM['Seller'], fill_value=0))
# 第二步:重新组织输出
out = (pd.concat([out, out.sum().to_frame('Total').T])
.sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
.add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())
输出:
>>> out
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2 0 222
1 Om 12 0 120
2 Cat 0 0 10
3 Mat 14 144 143
4 John 0 10 30
5 Messi 0 0 0
6 John 0 10 30
7 Ronaldo 0 0 0
8 Shannon 0 0 0
9 Catthy 0 0 0
10 Ray 0 22 0
11 Total 28 186 555
英文:
You can use pd.concat
to merge df1
, df2
and df3
then reindex by the Seller
column of master dataframe (dfM
):
# 1st pass: aggregate all dataframes whatever the dates
out = (pd.concat([df1, df2, df3])
.pivot_table(index='Seller', columns='Date', values='Vaule',
sort=False, fill_value=0)
.reindex(dfM['Seller'], fill_value=0))
# 2nd pass: reorganize the output
out = (pd.concat([out, out.sum().to_frame('Total').T])
.sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
.add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())
Output:
>>> out
Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
0 Hari 2 0 222
1 Om 12 0 120
2 Cat 0 0 10
3 Mat 14 144 143
4 John 0 10 30
5 Messi 0 0 0
6 John 0 10 30
7 Ronaldo 0 0 0
8 Shannon 0 0 0
9 Catthy 0 0 0
10 Ray 0 22 0
11 Total 28 186 555
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论