May 11, 2023, 17:00:47

Vlookup value from master data frame and summarize it

Question

I have a master data frame with a list of seller names. Here is a sample.

DF_Master_seller

Seller
Hari
Om
Cat
Mat
John
Messi
John
Ronaldo
Shannon
Catthy
Ray

I also have several data frames like these:

DF1

Seller	Value	Date	Rating
Hari	2	Jan-02	4.2
Om	12	Jan-02	3.4
Cat	0	Jan-02	4.3
Mat	14	Jan-02	4.3

DF2

Seller	Value	Date	Rating
Hari	222	Oct-02	4
Om	120	Oct-02	3
Cat	10	Oct-02	2.3
Mat	143	Oct-02	2.5
John	30	Oct-02	4.3

DF3

Seller	Value	Date	Rating
Ray	22	Feb-02	2.4
Jay	12	Feb-02	2.5
Sohan	0	Feb-02	2.2
Mat	144	Feb-02	5
John	10	Feb-02	3.4

For each seller in DF_Master_seller, look the seller up in every data frame, take the matching value, and add a Total row at the end.

Final DF

Seller	Value_Jan-02	Value_Feb-02	Value_Oct-02
Hari	2		222
Om	12		120
Cat	0		10
Mat	14	144	143
John			30
Messi
John		10
Ronaldo
Shannon
Catthy
Ray		22
Total	28	176	525

Note: the columns should be ordered by date.
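
To make the sample reproducible, the frames above can be built roughly as follows. This is only a sketch: the names df_master, df1, df2, df3 and lookup_jan are chosen here for illustration and do not come from the original post, and the Date values are kept as plain strings. The last step shows the VLOOKUP-style lookup for a single frame as a left merge, which keeps every master row (including the repeated John) and leaves NaN where a seller has no match.

```python
import pandas as pd

# Master list of sellers; "John" appears twice, as in the sample.
df_master = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat", "John", "Messi",
                                     "John", "Ronaldo", "Shannon", "Catthy", "Ray"]})

df1 = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat"],
                    "Value": [2, 12, 0, 14],
                    "Date": "Jan-02",
                    "Rating": [4.2, 3.4, 4.3, 4.3]})
df2 = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat", "John"],
                    "Value": [222, 120, 10, 143, 30],
                    "Date": "Oct-02",
                    "Rating": [4.0, 3.0, 2.3, 2.5, 4.3]})
df3 = pd.DataFrame({"Seller": ["Ray", "Jay", "Sohan", "Mat", "John"],
                    "Value": [22, 12, 0, 144, 10],
                    "Date": "Feb-02",
                    "Rating": [2.4, 2.5, 2.2, 5.0, 3.4]})

# VLOOKUP analogue for one frame: a left join keeps all master rows
# and fills Value with NaN for sellers that df1 does not contain.
lookup_jan = df_master.merge(df1[["Seller", "Value"]], on="Seller", how="left")
print(lookup_jan)
```

Repeating this merge once per frame would work, but pivoting the concatenated frames scales better when there are many dates.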

Answer 1

Score: 2

Same approach as @Corralien's, but using margins for the Total row and sorting the columns by date:

import pandas as pd

# df is assumed to be the master data frame (DF_Master_seller in the question)
out = (pd.concat([df1, df2, df3])
        .pivot_table(index="Seller", values="Value", columns="Date",
                     aggfunc="sum", margins=True, margins_name="Total", sort=False)
        .drop(columns="Total").reindex(list(df["Seller"]) + ["Total"])
        .sort_index(key=lambda x: pd.to_datetime(x, format="%b-%d"), axis=1)
        .add_prefix("Value_").reset_index().rename_axis(columns=None)
)

Output:

print(out)
     Seller  Value_Jan-02  Value_Feb-02  Value_Oct-02
0      Hari          2.00           NaN        222.00
1        Om         12.00           NaN        120.00
2       Cat          0.00           NaN         10.00
3       Mat         14.00        144.00        143.00
4      John           NaN         10.00         30.00
5     Messi           NaN           NaN           NaN
6      John           NaN         10.00         30.00
7   Ronaldo           NaN           NaN           NaN
8   Shannon           NaN           NaN           NaN
9    Catthy           NaN           NaN           NaN
10      Ray           NaN         22.00           NaN
11    Total         28.00        188.00        525.00
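
The Feb-02 total above is 188 rather than the 176 expected in the question, because the margins are computed over every seller in the concatenated data, including Jay and Sohan, who are not in the master list. A possible variation, assuming the Total row should cover only master sellers, is to filter with isin before pivoting. The names master, combined and in_master are illustrative, and the sample data is re-created (without the unused Rating column) so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question (Rating omitted, it is not used here).
master = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat", "John", "Messi",
                                  "John", "Ronaldo", "Shannon", "Catthy", "Ray"]})
df1 = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat"],
                    "Value": [2, 12, 0, 14], "Date": "Jan-02"})
df2 = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat", "John"],
                    "Value": [222, 120, 10, 143, 30], "Date": "Oct-02"})
df3 = pd.DataFrame({"Seller": ["Ray", "Jay", "Sohan", "Mat", "John"],
                    "Value": [22, 12, 0, 144, 10], "Date": "Feb-02"})

combined = pd.concat([df1, df2, df3], ignore_index=True)
# Keep only rows whose seller is in the master list, so the margins
# no longer include Jay and Sohan.
in_master = combined[combined["Seller"].isin(master["Seller"])]

out = (in_master
       .pivot_table(index="Seller", columns="Date", values="Value",
                    aggfunc="sum", margins=True, margins_name="Total")
       .drop(columns="Total")                        # drop the per-seller margin column
       .reindex(list(master["Seller"]) + ["Total"])  # master order, Total row last
       .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format="%b-%y"))
       .add_prefix("Value_")
       .rename_axis(index="Seller", columns=None)
       .reset_index())
print(out)
```

With the sample data the Total row becomes 28, 176 and 525. Because the margins are computed before the reindex, the repeated John row is displayed twice but counted once.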

Answer 2

Score: 1

You can use pd.concat to merge df1, df2 and df3, then reindex by the Seller column of the master dataframe (dfM):

# 1st pass: combine all dataframes regardless of date
# (pivot_table averages by default; each Seller/Date pair occurs once here)
out = (pd.concat([df1, df2, df3])
         .pivot_table(index='Seller', columns='Date', values='Value',
                      sort=False, fill_value=0)
         .reindex(dfM['Seller'], fill_value=0))
# 2nd pass: append the Total row, sort the columns by date, add the prefix
out = (pd.concat([out, out.sum().to_frame('Total').T])
         .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format='%b-%y'))
         .add_prefix('Value_').rename_axis(index='Seller', columns=None).reset_index())

Output:

>>> out
     Seller  Value_Jan-02  Value_Feb-02  Value_Oct-02
0      Hari             2             0           222
1        Om            12             0           120
2       Cat             0             0            10
3       Mat            14           144           143
4      John             0            10            30
5     Messi             0             0             0
6      John             0            10            30
7   Ronaldo             0             0             0
8   Shannon             0             0             0
9    Catthy             0             0             0
10      Ray             0            22             0
11    Total            28           186           555
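
One thing to watch in the output above: the Feb-02 and Oct-02 totals (186 and 555) count John twice, because the master list repeats that name and the sum is taken after reindexing. If the Total row should count each seller once, as in the question's expected output, a possible variation is to sum over the de-duplicated master list. This is a sketch under that assumption; the names master, pivot, body and total are illustrative, and the sample data is re-created without the unused Rating column.

```python
import pandas as pd

# Sample data from the question (Rating omitted, it is not used here).
master = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat", "John", "Messi",
                                  "John", "Ronaldo", "Shannon", "Catthy", "Ray"]})
df1 = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat"],
                    "Value": [2, 12, 0, 14], "Date": "Jan-02"})
df2 = pd.DataFrame({"Seller": ["Hari", "Om", "Cat", "Mat", "John"],
                    "Value": [222, 120, 10, 143, 30], "Date": "Oct-02"})
df3 = pd.DataFrame({"Seller": ["Ray", "Jay", "Sohan", "Mat", "John"],
                    "Value": [22, 12, 0, 144, 10], "Date": "Feb-02"})

pivot = (pd.concat([df1, df2, df3])
         .pivot_table(index="Seller", columns="Date", values="Value",
                      aggfunc="sum", fill_value=0))

# Total over unique master sellers: a repeated name is counted once,
# and sellers outside the master list are ignored.
unique_sellers = master["Seller"].drop_duplicates()
total = pivot.reindex(unique_sellers, fill_value=0).sum()

# Display rows in master order; the duplicate John row is kept as-is.
body = pivot.reindex(master["Seller"], fill_value=0)

out = (pd.concat([body, total.to_frame("Total").T])
       .sort_index(axis=1, key=lambda x: pd.to_datetime(x, format="%b-%y"))
       .add_prefix("Value_")
       .rename_axis(index="Seller", columns=None)
       .reset_index())
print(out)
```

With the sample data this yields totals of 28, 176 and 525 while still showing both John rows.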
