从主数据框中查找值并总结。

huangapple go评论86阅读模式
英文:

Vlookup value from master data frame and summarize it

问题

I have a unique data frame having a list of seller names. Below is the sample.

DF_Master_seller

Seller
Hari
Om
Cat
Mat
John
Messi
John
Ronaldo
Shannon
Catthy
Ray

I have multiple data frames like -

DF1

Seller Value Date Rating
Hari 2 Jan-02 4.2
Om 12 Jan-02 3.4
Cat 0 Jan-02 4.3
Mat 14 Jan-02 4.3

DF2

Seller Value Date Rating
Hari 222 Oct-02 4
Om 120 Oct-02 3
Cat 10 Oct-02 2.3
Mat 143 Oct-02 2.5
John 30 Oct-02 4.3

DF3

Seller Value Date Rating
Ray 22 Feb-02 2.4
Jay 12 Feb-02 2.5
Sohan 0 Feb-02 2.2
Mat 144 Feb-02 5
John 10 Feb-02 3.4

Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.

Final DF

Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
Hari 2 222
Om 12 120
Cat 0 10
Mat 14 144 143
John 30
Messi
John 10
Ronaldo
Shannon
Catthy
Ray 22
Total 28 176 515

Note - Column should be ordered by date

英文:

I have a unique data frame having a list of seller names. Below is the sample.

DF_Master_seller

Seller
Hari
Om
Cat
Mat
John
Messi
John
Ronaldo
Shannon
Catthy
Ray

I have multiple data frames like -

DF1

Seller Vaule Date Rating
Hari 2 Jan-02 4.2
Om 12 Jan-02 3.4
Cat 0 Jan-02 4.3
Mat 14 Jan-02 4.3

DF2

Seller Vaule Date Rating
Hari 222 Oct-02 4
Om 120 Oct-02 3
Cat 10 Oct-02 2.3
Mat 143 Oct-02 2.5
John 30 Oct-02 4.3

DF3

Seller Vaule Date Rating
Ray 22 Feb-02 2.4
Jay 12 Feb-02 2.5
Sohan 0 Feb-02 2.2
Mat 144 Feb-02 5
John 10 Feb-02 3.4

Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.

Final DF

Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
Hari 2 222
Om 12 120
Cat 0 10
Mat 14 144 143
John 30
Messi
John 10
Ronaldo
Shannon
Catthy
Ray 22
Total 28 176 515

Note - Column should be ordered by date

答案1

得分: 2

以下是您提供的代码的中文翻译:

与 @Corralien 的方法相同,但使用了 margins 和排序列:

  1. out = (pd.concat([df1, df2, df3])
  2. .pivot_table(index="Seller", values="Value", columns="Date",
  3. aggfunc="sum", margins=True, margins_name="Total", sort=False)
  4. .drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
  5. .sort_index(key=lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
  6. .add_prefix("Value").reset_index().rename_axis(columns=None)
  7. )
  8. 输出
  9. print(out)
  10. Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
  11. 0 Hari 2.00 NaN 222.00
  12. 1 Om 12.00 NaN 120.00
  13. 2 Cat 0.00 NaN 10.00
  14. 3 Mat 14.00 144.00 143.00
  15. 4 John NaN 10.00 30.00
  16. 5 Messi NaN NaN NaN
  17. 6 John NaN 10.00 30.00
  18. 7 Ronaldo NaN NaN NaN
  19. 8 Shannon NaN NaN NaN
  20. 9 Catthy NaN NaN NaN
  21. 10 Ray NaN 22.00 NaN
  22. 11 Total 28.00 188.00 525.00
英文:

Same approach as @Corralien's but with margins and sorted columns :

  1. out = (pd.concat([df1, df2, df3])
  2. .pivot_table(index="Seller", values="Vaule", columns="Date",
  3. aggfunc="sum", margins=True, margins_name="Total", sort=False)
  4. .drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
  5. .sort_index(key= lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
  6. .add_prefix("Value").reset_index().rename_axis(columns=None)
  7. )

Output :

  1. print(out)
  2. Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
  3. 0 Hari 2.00 NaN 222.00
  4. 1 Om 12.00 NaN 120.00
  5. 2 Cat 0.00 NaN 10.00
  6. 3 Mat 14.00 144.00 143.00
  7. 4 John NaN 10.00 30.00
  8. 5 Messi NaN NaN NaN
  9. 6 John NaN 10.00 30.00
  10. 7 Ronaldo NaN NaN NaN
  11. 8 Shannon NaN NaN NaN
  12. 9 Catthy NaN NaN NaN
  13. 10 Ray NaN 22.00 NaN
  14. 11 Total 28.00 188.00 525.00

答案2

得分: 1

你可以使用 pd.concat 来合并 df1df2df3,然后按主数据框 (dfM) 的 Seller 列重新索引:

  1. # 第一步:合并所有数据框,不论日期如何
  2. out = (pd.concat([df1, df2, df3])
  3. .pivot_table(index='Seller', columns='Date', values='Value',
  4. sort=False, fill_value=0)
  5. .reindex(dfM['Seller'], fill_value=0))
  6. # 第二步:重新组织输出
  7. out = (pd.concat([out, out.sum().to_frame('Total').T])
  8. .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
  9. .add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())

输出:

  1. >>> out
  2. Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
  3. 0 Hari 2 0 222
  4. 1 Om 12 0 120
  5. 2 Cat 0 0 10
  6. 3 Mat 14 144 143
  7. 4 John 0 10 30
  8. 5 Messi 0 0 0
  9. 6 John 0 10 30
  10. 7 Ronaldo 0 0 0
  11. 8 Shannon 0 0 0
  12. 9 Catthy 0 0 0
  13. 10 Ray 0 22 0
  14. 11 Total 28 186 555
英文:

You can use pd.concat to merge df1, df2 and df3 then reindex by the Seller column of master dataframe (dfM):

  1. # 1st pass: aggregate all dataframes whatever the dates
  2. out = (pd.concat([df1, df2, df3])
  3. .pivot_table(index='Seller', columns='Date', values='Vaule',
  4. sort=False, fill_value=0)
  5. .reindex(dfM['Seller'], fill_value=0))
  6. # 2nd pass: reorganize the output
  7. out = (pd.concat([out, out.sum().to_frame('Total').T])
  8. .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
  9. .add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())

Output:

  1. >>> out
  2. Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
  3. 0 Hari 2 0 222
  4. 1 Om 12 0 120
  5. 2 Cat 0 0 10
  6. 3 Mat 14 144 143
  7. 4 John 0 10 30
  8. 5 Messi 0 0 0
  9. 6 John 0 10 30
  10. 7 Ronaldo 0 0 0
  11. 8 Shannon 0 0 0
  12. 9 Catthy 0 0 0
  13. 10 Ray 0 22 0
  14. 11 Total 28 186 555

huangapple
  • 本文由 发表于 2023年5月11日 17:00:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76225855.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定