January 9, 2023, 05:23:10

Iterating over rows and groups in dataframe

Question

Suppose I have the following data frame:

import pandas as pd

d = {'Date': ['2020-1-1', '2020-1-2', '2020-1-3', '2020-1-1', '2020-1-2', '2020-1-3', '2020-1-1', '2020-1-2', '2020-1-3'],
     'col2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
     'col3': [0.01, 0.02, 0.03, 0.02, 0.03, 0.04, 0.05, 0.1, 0.01]}
d = pd.DataFrame(data=d)
d['Date'] = pd.to_datetime(d['Date'])
d

and get:

        Date col2  col3
0 2020-01-01    A  0.01
1 2020-01-02    A  0.02
2 2020-01-03    A  0.03
3 2020-01-01    B  0.02
4 2020-01-02    B  0.03
5 2020-01-03    B  0.04
6 2020-01-01    C  0.05
7 2020-01-02    C  0.10
8 2020-01-03    C  0.01

How can I iterate over the rows so that, for each Date, I get the two highest values of col3 together with their group from col2?
For example, I should get:

        Date col2  col3
3 2020-01-01    B  0.02
6 2020-01-01    C  0.05
4 2020-01-02    B  0.03
7 2020-01-02    C  0.10
2 2020-01-03    A  0.03
5 2020-01-03    B  0.04

Then, at the end, I want to sum col3 for each day:

        Date   sum
0 2020-01-01  0.07
1 2020-01-02  0.13
2 2020-01-03  0.07

Of course, the real data has many more groups (col2) and more dates.

Answer 1

Score: 1

I think this:

d[['Date', 'col3']].groupby('Date')['col3'].nlargest(2).groupby('Date').sum()
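To see what that chain does, it may help to split it into steps. The following is only a sketch built on the question's frame `d`; the names `top2` and `daily` are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question.
d = pd.DataFrame({
    'Date': ['2020-1-1', '2020-1-2', '2020-1-3', '2020-1-1', '2020-1-2',
             '2020-1-3', '2020-1-1', '2020-1-2', '2020-1-3'],
    'col2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col3': [0.01, 0.02, 0.03, 0.02, 0.03, 0.04, 0.05, 0.1, 0.01],
})
d['Date'] = pd.to_datetime(d['Date'])

# Step 1: the two largest col3 values within each Date.
# The result is a Series indexed by (Date, original row label).
top2 = d.groupby('Date')['col3'].nlargest(2)

# Step 2: sum those values per Date and turn the result into a two-column frame.
# 'sum' is just the column name used in the question's expected output.
daily = top2.groupby(level='Date').sum().reset_index(name='sum')
print(daily)
```

Selecting `['Date', 'col3']` first, as the answer does, is optional here because `groupby('Date')['col3']` already restricts the work to that column.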

Answer 2

Score: 1

grouped_df = d.groupby("Date")
results = []
for name, group in grouped_df:
    # nlargest already picks the two rows with the highest col3,
    # so sorting the group in place beforehand is unnecessary
    top_2 = group.nlargest(2, "col3")
    top_2_sum = top_2["col3"].sum()
    results.append((name, top_2_sum))
sum_df = pd.DataFrame(results, columns=["Date", "Sum"])
print(sum_df)

Output:

        Date   Sum
0 2020-01-01  0.07
1 2020-01-02  0.13
2 2020-01-03  0.07
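The question also asks for the intermediate table that keeps col2 for the selected rows. One possible loop-free sketch, assuming the same frame `d` as in the question, sorts by col3 and keeps the first two rows of each Date group; `top2_rows` and `daily` are illustrative names, not taken from either answer.

```python
import pandas as pd

# Rebuild the frame from the question.
d = pd.DataFrame({
    'Date': ['2020-1-1', '2020-1-2', '2020-1-3', '2020-1-1', '2020-1-2',
             '2020-1-3', '2020-1-1', '2020-1-2', '2020-1-3'],
    'col2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col3': [0.01, 0.02, 0.03, 0.02, 0.03, 0.04, 0.05, 0.1, 0.01],
})
d['Date'] = pd.to_datetime(d['Date'])

# Sort so the highest col3 values come first, keep the first two rows
# of every Date group, then order the result by Date and col3 for display.
top2_rows = (
    d.sort_values('col3', ascending=False)
     .groupby('Date')
     .head(2)
     .sort_values(['Date', 'col3'])
)
print(top2_rows)

# Sum col3 per Date over the selected rows only.
daily = (
    top2_rows.groupby('Date', as_index=False)['col3']
             .sum()
             .rename(columns={'col3': 'sum'})
)
print(daily)
```

On the sample data this should select rows 3, 6, 4, 7, 2 and 5, the same rows listed in the question. Because `head(2)` keeps whole rows, col2 comes along without a separate lookup.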
