从主数据框中查找值并总结。

huangapple go评论65阅读模式
英文:

Vlookup value from master data frame and summarize it

问题

I have a unique data frame having a list of seller names. Below is the sample.

DF_Master_seller

Seller
Hari
Om
Cat
Mat
John
Messi
John
Ronaldo
Shannon
Catthy
Ray

I have multiple data frames like -

DF1

Seller Value Date Rating
Hari 2 Jan-02 4.2
Om 12 Jan-02 3.4
Cat 0 Jan-02 4.3
Mat 14 Jan-02 4.3

DF2

Seller Value Date Rating
Hari 222 Oct-02 4
Om 120 Oct-02 3
Cat 10 Oct-02 2.3
Mat 143 Oct-02 2.5
John 30 Oct-02 4.3

DF3

Seller Value Date Rating
Ray 22 Feb-02 2.4
Jay 12 Feb-02 2.5
Sohan 0 Feb-02 2.2
Mat 144 Feb-02 5
John 10 Feb-02 3.4

Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.

Final DF

Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
Hari 2 222
Om 12 120
Cat 0 10
Mat 14 144 143
John 30
Messi
John 10
Ronaldo
Shannon
Catthy
Ray 22
Total 28 176 515

Note - Column should be ordered by date

英文:

I have a unique data frame having a list of seller names. Below is the sample.

DF_Master_seller

Seller
Hari
Om
Cat
Mat
John
Messi
John
Ronaldo
Shannon
Catthy
Ray

I have multiple data frames like -

DF1

Seller Vaule Date Rating
Hari 2 Jan-02 4.2
Om 12 Jan-02 3.4
Cat 0 Jan-02 4.3
Mat 14 Jan-02 4.3

DF2

Seller Vaule Date Rating
Hari 222 Oct-02 4
Om 120 Oct-02 3
Cat 10 Oct-02 2.3
Mat 143 Oct-02 2.5
John 30 Oct-02 4.3

DF3

Seller Vaule Date Rating
Ray 22 Feb-02 2.4
Jay 12 Feb-02 2.5
Sohan 0 Feb-02 2.2
Mat 144 Feb-02 5
John 10 Feb-02 3.4

Search each data frame in DF_Master_seller, for each match get the value and calculate the total at last.

Final DF

Seller Value_Jan-02 Value_Feb-02 Value_Oct-02
Hari 2 222
Om 12 120
Cat 0 10
Mat 14 144 143
John 30
Messi
John 10
Ronaldo
Shannon
Catthy
Ray 22
Total 28 176 515

Note - Column should be ordered by date

答案1

得分: 2

以下是您提供的代码的中文翻译:

与 @Corralien 的方法相同,但使用了 margins 和排序列:

out = (pd.concat([df1, df2, df3])
        .pivot_table(index="Seller", values="Value", columns="Date",
                     aggfunc="sum", margins=True, margins_name="Total", sort=False)
        .drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
        .sort_index(key=lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
        .add_prefix("Value").reset_index().rename_axis(columns=None)
)

输出

print(out)

     Seller  Value_Jan-02  Value_Feb-02  Value_Oct-02
0      Hari          2.00           NaN        222.00
1        Om         12.00           NaN        120.00
2       Cat          0.00           NaN         10.00
3       Mat         14.00        144.00        143.00
4      John           NaN         10.00         30.00
5     Messi           NaN           NaN           NaN
6      John           NaN         10.00         30.00
7   Ronaldo           NaN           NaN           NaN
8   Shannon           NaN           NaN           NaN
9    Catthy           NaN           NaN           NaN
10      Ray           NaN         22.00           NaN
11    Total         28.00        188.00        525.00
英文:

Same approach as @Corralien's but with margins and sorted columns :

out = (pd.concat([df1, df2, df3])
        .pivot_table(index="Seller", values="Vaule", columns="Date",
                     aggfunc="sum", margins=True, margins_name="Total", sort=False)
        .drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
        .sort_index(key= lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
        .add_prefix("Value").reset_index().rename_axis(columns=None)
)

Output :

print(out)

     Seller  Value_Jan-02  Value_Feb-02  Value_Oct-02
0      Hari          2.00           NaN        222.00
1        Om         12.00           NaN        120.00
2       Cat          0.00           NaN         10.00
3       Mat         14.00        144.00        143.00
4      John           NaN         10.00         30.00
5     Messi           NaN           NaN           NaN
6      John           NaN         10.00         30.00
7   Ronaldo           NaN           NaN           NaN
8   Shannon           NaN           NaN           NaN
9    Catthy           NaN           NaN           NaN
10      Ray           NaN         22.00           NaN
11    Total         28.00        188.00        525.00

答案2

得分: 1

你可以使用 pd.concat 来合并 df1df2df3,然后按主数据框 (dfM) 的 Seller 列重新索引:

# 第一步:合并所有数据框,不论日期如何
out = (pd.concat([df1, df2, df3])
         .pivot_table(index='Seller', columns='Date', values='Value', 
                      sort=False, fill_value=0)
         .reindex(dfM['Seller'], fill_value=0))

# 第二步:重新组织输出
out = (pd.concat([out, out.sum().to_frame('Total').T])
         .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
         .add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())

输出:

>>> out
     Seller  Value_Jan-02  Value_Feb-02  Value_Oct-02
0      Hari             2             0           222
1        Om            12             0           120
2       Cat             0             0            10
3       Mat            14           144           143
4      John             0            10            30
5     Messi             0             0             0
6      John             0            10            30
7   Ronaldo             0             0             0
8   Shannon             0             0             0
9    Catthy             0             0             0
10      Ray             0            22             0
11    Total            28           186           555
英文:

You can use pd.concat to merge df1, df2 and df3 then reindex by the Seller column of master dataframe (dfM):

# 1st pass: aggregate all dataframes whatever the dates
out = (pd.concat([df1, df2, df3])
         .pivot_table(index='Seller', columns='Date', values='Vaule', 
                      sort=False, fill_value=0)
         .reindex(dfM['Seller'], fill_value=0))

# 2nd pass: reorganize the output
out = (pd.concat([out, out.sum().to_frame('Total').T])
         .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
         .add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())

Output:

>>> out
     Seller  Value_Jan-02  Value_Feb-02  Value_Oct-02
0      Hari             2             0           222
1        Om            12             0           120
2       Cat             0             0            10
3       Mat            14           144           143
4      John             0            10            30
5     Messi             0             0             0
6      John             0            10            30
7   Ronaldo             0             0             0
8   Shannon             0             0             0
9    Catthy             0             0             0
10      Ray             0            22             0
11    Total            28           186           555

huangapple
  • 本文由 发表于 2023年5月11日 17:00:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76225855.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定