2020-01-03 23:07:04

How to group a pandas.DataFrame's columns based on the index and values of another pandas.Series?

Question

I'm trying to group a DataFrame's columns based on the values and index of another pandas.Series. The Series' index refers to the DataFrame's columns, but it may contain additional labels. What is the most Pythonic way to do this?

For further clarity, here's the unit test I'm trying to pass (using pytest):

import pandas as pd

# `clb` is the module under test; its import is not shown in the original post.
def test_sum_weights_by_classification_labels_default_arguments():
    portfolio_weights = pd.DataFrame([[0.1, 0.3, 0.4, 0.2],
                                      [0.25, 0.3, 0.25, 0.2],
                                      [0.2, 0.3, 0.1, 0.4]],
                                     index=['2001-01-02', '2001-01-03', '2001-01-04'],
                                     columns=['ABC', 'DEF', 'UVW', 'XYZ'])
    # The classification covers more securities (GHI, RST) than the portfolio holds.
    security_classification = pd.Series(['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
                                        index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
                                        name='Classification')
    # Expected: weights summed per classification label (Energy = DEF + XYZ).
    result_sector_weights = pd.DataFrame([[0.1, 0.5, 0.4],
                                          [0.25, 0.5, 0.25],
                                          [0.2, 0.7, 0.1]],
                                         index=['2001-01-02', '2001-01-03', '2001-01-04'],
                                         columns=['Consumer', 'Energy', 'Financials'])
    pd.testing.assert_frame_equal(clb.sum_weights_by_classification_labels(portfolio_weights, security_classification),
                                  result_sector_weights)

Many thanks in advance!

Answer 1

Score: 0

After further research, I found a solution. Here's what I came up with, using pandas.Index.map on the DataFrame's columns with the Series as the mapping:

def sum_weights_by_classification_labels(security_weights, security_classification):
    # Look up each column's classification label; Index.map accepts a Series as the mapping.
    labels = security_weights.columns.map(security_classification)
    # groupby(..., axis=1) is deprecated since pandas 2.1, so group the transposed
    # frame by the labels and transpose back instead.
    classification_weights = security_weights.T.groupby(labels).sum().T
    return classification_weights
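
To make the mapping step more concrete, here is a minimal sketch of what it produces. The extra column `ZZZ` is a hypothetical security that does not appear in the classification Series; it is not part of the original test and is only there to show what happens to unclassified columns. The grouping is written against the transposed frame, which avoids the `axis=1` argument.

```python
import pandas as pd

weights = pd.DataFrame(
    [[0.1, 0.3, 0.4, 0.1, 0.1],
     [0.25, 0.3, 0.25, 0.1, 0.1]],
    index=['2001-01-02', '2001-01-03'],
    # 'ZZZ' is a hypothetical column with no entry in the classification.
    columns=['ABC', 'DEF', 'UVW', 'XYZ', 'ZZZ'],
)
classification = pd.Series(
    ['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
    index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
    name='Classification',
)

# Each column label is looked up in the Series' index.
# Labels missing from the Series (here 'ZZZ') map to NaN.
labels = weights.columns.map(classification)
print(list(labels))

# groupby drops NaN keys by default, so 'ZZZ' is left out of the sums.
sector_weights = weights.T.groupby(labels).sum().T
print(sector_weights)
```

Extra entries in the Series (`GHI`, `RST`) are simply never looked up, so they have no effect on the result. If unclassified columns should be kept rather than dropped, one option would be to fill the NaN labels with a placeholder before grouping.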

Alternatively, using pandas.DataFrame.merge:

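def sum_weights_by_classification_labels(security_weights, security_classification):
    # Put the securities on the index so they can be joined against the Series' index.
    security_weights_transposed = security_weights.transpose()
    # A left join keeps only the securities present in the portfolio.
    merged_data = security_weights_transposed.merge(security_classification, how='left',
                                                    left_index=True, right_index=True)
    # Sum the rows per classification label, then restore the original orientation.
    classification_weights = merged_data.groupby(security_classification.name).sum().transpose()
    return classification_weights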

For the second solution, the unit test needs one extra line. A Series must have a name to be merged (the added column needs one), and grouping by that column carries the name over to the result's columns:

result_sector_weights.columns.name = security_classification.name
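
As a rough illustration of why that line is needed, the sketch below runs the merge-based approach on the data from the question and inspects the name attached to the resulting columns. Clearing the name with `rename_axis(columns=None)` is one possible alternative to changing the test; it is an assumption about what is acceptable here, not part of the original answer.

```python
import pandas as pd

def sum_weights_by_classification_labels(security_weights, security_classification):
    merged = security_weights.T.merge(security_classification, how='left',
                                      left_index=True, right_index=True)
    return merged.groupby(security_classification.name).sum().T

weights = pd.DataFrame(
    [[0.1, 0.3, 0.4, 0.2],
     [0.25, 0.3, 0.25, 0.2],
     [0.2, 0.3, 0.1, 0.4]],
    index=['2001-01-02', '2001-01-03', '2001-01-04'],
    columns=['ABC', 'DEF', 'UVW', 'XYZ'],
)
classification = pd.Series(
    ['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
    index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
    name='Classification',
)
expected = pd.DataFrame(
    [[0.1, 0.5, 0.4],
     [0.25, 0.5, 0.25],
     [0.2, 0.7, 0.1]],
    index=['2001-01-02', '2001-01-03', '2001-01-04'],
    columns=['Consumer', 'Energy', 'Financials'],
)

result = sum_weights_by_classification_labels(weights, classification)

# The grouping column's name ends up as the name of the result's columns.
name_before = result.columns.name
print(name_before)

# Clearing the name lets the result match an expected frame with unnamed columns.
result = result.rename_axis(columns=None)
pd.testing.assert_frame_equal(result, expected)
```

Passing `check_names=False` to `pd.testing.assert_frame_equal` would be another way to ignore the difference, at the cost of a slightly less strict test.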

I'm keeping this post hoping it might help someone in the future.

