How to group a pandas.DataFrame's columns based on the index and values of another pandas.Series?
Question
I'm trying to group a DataFrame's columns together based on the values and index of another pandas.Series. The Series' index refers to the DataFrame's columns, but it may contain additional entries that are not columns of the DataFrame. What is the most Pythonic way to do this?
For further clarity, here is the unit test I'm trying to make pass (using pytest):
import pandas as pd

# clb is the module under test; its import is not shown in the original post.
def test_sum_weights_by_classification_labels_default_arguments():
    # Weights per security (columns) for each date (rows).
    portfolio_weights = pd.DataFrame([[0.1, 0.3, 0.4, 0.2],
                                      [0.25, 0.3, 0.25, 0.2],
                                      [0.2, 0.3, 0.1, 0.4]],
                                     index=['2001-01-02', '2001-01-03', '2001-01-04'],
                                     columns=['ABC', 'DEF', 'UVW', 'XYZ'])
    # Sector label per security; GHI and RST are not in the portfolio.
    security_classification = pd.Series(['Consumer', 'Energy', 'Consumer', 'Materials', 'Financials', 'Energy'],
                                        index=['ABC', 'DEF', 'GHI', 'RST', 'UVW', 'XYZ'],
                                        name='Classification')
    # Expected weights summed per sector.
    result_sector_weights = pd.DataFrame([[0.1, 0.5, 0.4],
                                          [0.25, 0.5, 0.25],
                                          [0.2, 0.7, 0.1]],
                                         index=['2001-01-02', '2001-01-03', '2001-01-04'],
                                         columns=['Consumer', 'Energy', 'Financials'])
    pd.testing.assert_frame_equal(clb.sum_weights_by_classification_labels(portfolio_weights, security_classification),
                                  result_sector_weights)
Many thanks in advance!
Answer 1
Score: 0
After further research, I found a solution. Here is what I came up with, using pandas.Index.map on the DataFrame's columns, with the Series passed as the mapper:
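def sum_weights_by_classification_labels(security_weights, security_classification):
    # Work on a copy so the caller's DataFrame keeps its ticker columns.
    classification_weights = security_weights.copy()
    # Replace each ticker with its classification label.
    classification_weights.columns = classification_weights.columns.map(security_classification)
    # Sum the columns that share a label.
    # Note: groupby(axis=1) is deprecated as of pandas 2.1.
    classification_weights = classification_weights.groupby(classification_weights.columns, axis=1).sum()
    return classification_weights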
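Since newer pandas releases deprecate the axis=1 argument of groupby, the same idea can be sketched with a transpose instead. This is only a variant of the approach above, not a different method: it still maps the column labels through the Series, then groups the transposed rows. It assumes that tickers missing from the Series should be dropped, which is what groupby does with NaN keys by default.

```python
import pandas as pd


def sum_weights_by_classification_labels(security_weights, security_classification):
    # Look up each column's label; tickers absent from the Series map to NaN.
    labels = security_weights.columns.map(security_classification)
    # Transpose so tickers become rows, group the rows by label, sum,
    # then transpose back so dates are rows again.
    # NaN labels are dropped by groupby's default dropna=True.
    return security_weights.T.groupby(labels).sum().T
```

Because the grouping key is an unnamed Index, the resulting column index has no name, so the unit test from the question should not need any change.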
Alternatively, using pandas.DataFrame.merge:
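def sum_weights_by_classification_labels(security_weights, security_classification):
    # Transpose so each security is a row that can be joined on its ticker.
    security_weights_transposed = security_weights.transpose()
    # Attach the classification label as an extra column (named after the Series).
    merged_data = security_weights_transposed.merge(security_classification, how='left', left_index=True,
                                                    right_index=True)
    # Sum the rows per label, then transpose back so dates are rows again.
    classification_weights = merged_data.groupby(security_classification.name).sum().transpose()
    return classification_weights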
For the second solution, the unit test needs the extra line below. merge requires the Series to have a name (it becomes the name of the added column), and after groupby and transpose that name ends up as the name of the result's column index:
result_sector_weights.columns.name = security_classification.name
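If changing the test is not desirable, one option is to clear the column index name inside the function instead. The sketch below is the merge-based solution with a final rename_axis call added. That last step is my addition, not part of the original answer, and it assumes the caller does not want the 'Classification' name kept on the columns.

```python
import pandas as pd


def sum_weights_by_classification_labels(security_weights, security_classification):
    # Join each security (row after transposing) with its classification label.
    merged = security_weights.T.merge(
        security_classification, how="left", left_index=True, right_index=True
    )
    # Sum per label and transpose back so dates are rows again.
    grouped = merged.groupby(security_classification.name).sum().T
    # Drop the name that groupby leaves on the column index.
    return grouped.rename_axis(None, axis=1)
```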
I'm keeping this post hoping it might help someone in the future.