Element-wise division with panel data using df.div(level=?) on an index level
Question
<details>
<summary>English:</summary>
Sample data

```py
import numpy as np
import pandas as pd

arrays1 = [['country1','country1','country1','country1','country2', 'country2', 'country2', 'country2'],
           [2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001],
           ['agri1','agri1', 'cons2','cons2', 'agri1','agri1', 'cons2','cons2']]
arrays = [['country1', 'country1', 'country2', 'country2'],
          ['agri1', 'cons2', 'agri1', 'cons2']]
index = pd.MultiIndex.from_arrays(arrays1, names=('country','Year','sector'))
columns1 = pd.MultiIndex.from_arrays(arrays, names=('country','sector'))
df = pd.DataFrame(np.array([[24, 20, 30, 20],[16, 14, 10, 25],[28, 22, 6, 28],
                            [11, 10, 10, 4],[6, 7, 12, 16],[19, 24, 6, 9],
                            [22, 9, 10, 15],[9, 1, 4, 2]]), index=index, columns=columns1)
```

```
Country                 country1     country2
Sector                     A    B       A    B
Year  country   sector
2000  country1  A         24   20      30   20
                B         16   14      10   25
      country2  A         28   22       6   28
                B         11   10      10    4
2001  country1  A          6    7      12   16
                B         19   24       6    9
      country2  A         22    9      10   15
                B          9    1       4    2
```

Thanks to Shubham Sharma's answer https://stackoverflow.com/questions/76446671/multiindex-column-and-rows-match-if-the-column-and-row-names-are-similar-and-exc I am able to get the columns labeled TOTAL. These are simply horizontal sums; TOTAL/EX excludes the cells where the row's country matches the column's country.

```py
# match the same countries from the index and the columns, then exclude the matching cells from the sum
# select the level by name: 'country' is level 0 in the sample code above but level 1 in the printed tables
ix = df.index.get_level_values('country')
cx = df.columns.get_level_values('country')
m = ix.values[:, None] == cx.values
df[('TOTAL','EX')] = df.mask(m).sum(axis=1)
df[('TOTAL','GR')] = df.iloc[:,:-1].sum(axis=1)
```

Resulting df:

```
Country                 country1     country2     TOTAL
Sector                     A    B       A    B      EX   GR
Year  country   sector
2000  country1  A         24   20      30   20    50.0   94
                B         16   14      10   25    35.0   65
      country2  A         28   22       6   28    50.0   84
                B         11   10      10    4    21.0   35
2001  country1  A          6    7      12   16    28.0   41
                B         19   24       6    9    15.0   58
      country2  A         22    9      10   15    31.0   56
                B          9    1       4    2    10.0   16
```

Now I want to proceed and use element-wise division along the columns (axis=1).

```py
df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values,axis=1)
```

which gives me

```
ValueError: Unable to coerce to Series, length must be 4: given 8
```

This is because it doesn't take the Year index into account. If there were only a single year, this would have given me the expected result nicely, but I want to do it for each year, taking the level into account.

What I want the final df to look like:

```
Country                 country1         country2        TOTAL
Sector                     A      B        A      B        EX   GR
Year  country   sector
2000  country1  A       24/94  20/65    30/84  20/35     50.0   94
                B       16/94  14/65    10/84  25/35     35.0   65
      country2  A       28/94  22/65     6/84  28/35     50.0   84
                B       11/94  10/65    10/84   4/35     21.0   35
2001  country1  A        6/41   7/58    12/56  16/16     28.0   41
                B       19/41  24/58     6/56   9/16     15.0   58
      country2  A       22/41   9/58    10/56  15/16     31.0   56
                B        9/41   1/58     4/56   2/16     10.0   16
```

I have seen the documentation for df.div and tried to include level=1 to group by year while dividing element-wise, but the same error appeared:

```py
df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values,axis=1, level=1)
```

```
ValueError: Unable to coerce to Series, length must be 4: given 8
```

Maybe I am doing the whole thing wrong, so I welcome any easy workaround recommendations.
</details>
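The error in the question comes down to shapes: with `axis=1`, `div` lines a plain 1-D array up against the columns, so it needs 4 values, while the full TOTAL/GR column holds 8 (4 per year). The sketch below reproduces that on the question's numbers. It assumes the (Year, country, sector) row order of the printed tables rather than the order in the sample code, and the names `values`, `gr`, `block_2000` and `shares_2000` are illustrative only.

```python
import pandas as pd

# Rebuild the question's data, assuming the (Year, country, sector) row
# order shown in the printed tables.
rows = pd.MultiIndex.from_product(
    [[2000, 2001], ["country1", "country2"], ["A", "B"]],
    names=["Year", "country", "sector"],
)
cols = pd.MultiIndex.from_product(
    [["country1", "country2"], ["A", "B"]], names=["country", "sector"]
)
values = pd.DataFrame(
    [[24, 20, 30, 20], [16, 14, 10, 25], [28, 22, 6, 28], [11, 10, 10, 4],
     [6, 7, 12, 16], [19, 24, 6, 9], [22, 9, 10, 15], [9, 1, 4, 2]],
    index=rows, columns=cols, dtype=float,
)
gr = values.sum(axis=1)  # row totals, i.e. the TOTAL/GR column

# Whole frame: 8 totals cannot line up with 4 columns.
try:
    values.div(gr.to_numpy(), axis=1)
    whole_frame_error = None
except ValueError as exc:
    whole_frame_error = str(exc)

# One year only: 4 totals line up with 4 columns, so column j is divided
# by the total of row j within that year.
block_2000 = values.xs(2000, level="Year")
shares_2000 = block_2000.div(gr.xs(2000, level="Year").to_numpy(), axis=1)
print(whole_frame_error)
print(shares_2000)
```

This is why the single-year case works and why any fix has to apply the division one year at a time.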
# Answer 1
**Score**: 2

IIUC, you can try:

```py
def fn(x):
    # x holds the rows of a single year, so the TOTAL/GR values line up with the value columns
    x.iloc[:, :-2] = x.iloc[:, :-2].div(x[('TOTAL', 'GR')].values, axis=1)
    return x

x = df.groupby(level='Year', group_keys=False).apply(fn)
print(x)
```

Prints:

```
country               country1            country2           TOTAL
sector                   agri1     cons2     agri1     cons2    EX   GR
country  Year sector
country1 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
country2 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
```

Initial `df` is from your question:

```
country               country1        country2       TOTAL
sector                   agri1  cons2    agri1  cons2    EX   GR
country  Year sector
country1 2000 agri1        390    350       30     20   790  790
         2001 agri1        300    340       10     25   675  675
         2000 cons2        390    350       30     20   790  790
         2001 cons2        300    340       10     25   675  675
country2 2000 agri1        390    350       30     20   790  790
         2001 agri1        300    340       10     25   675  675
         2000 cons2        390    350       30     20   790  790
         2001 cons2        300    340       10     25   675  675
```
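The printout above uses different numbers from the tables in the question, so here is a sketch of the same per-year `groupby` idea checked against the question's own figures. It assumes the (Year, country, sector) row order of the printed tables. It also returns a new frame of shares instead of assigning back with `iloc`, since writing floats into integer columns may warn or fail depending on the pandas version. The helper name `shares_for_one_year` is illustrative.

```python
import pandas as pd

# Question data, assuming the (Year, country, sector) row order of the tables.
rows = pd.MultiIndex.from_product(
    [[2000, 2001], ["country1", "country2"], ["A", "B"]],
    names=["Year", "country", "sector"],
)
cols = pd.MultiIndex.from_product(
    [["country1", "country2"], ["A", "B"]], names=["country", "sector"]
)
values = pd.DataFrame(
    [[24, 20, 30, 20], [16, 14, 10, 25], [28, 22, 6, 28], [11, 10, 10, 4],
     [6, 7, 12, 16], [19, 24, 6, 9], [22, 9, 10, 15], [9, 1, 4, 2]],
    index=rows, columns=cols, dtype=float,
)
gr = values.sum(axis=1)  # row totals, i.e. the TOTAL/GR column


def shares_for_one_year(block):
    # block contains one year's rows; look up that year's totals and
    # divide column j by the total of row j.
    totals = gr.loc[block.index].to_numpy()
    return block.div(totals, axis=1)


shares = values.groupby(level="Year", group_keys=False).apply(shares_for_one_year)
print(shares)
```

The first row should read 24/94, 20/65, 30/84, 20/35, matching the desired table in the question.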
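As an alternative to `groupby(...).apply`, the divisors can be built once as a full-size table and applied in a single division. This is only a sketch under the same assumption about row order as the printed tables in the question. It is not part of the answer above, and `totals_by_year` and `divisors` are illustrative names.

```python
import pandas as pd

# Question data, assuming the (Year, country, sector) row order of the tables.
rows = pd.MultiIndex.from_product(
    [[2000, 2001], ["country1", "country2"], ["A", "B"]],
    names=["Year", "country", "sector"],
)
cols = pd.MultiIndex.from_product(
    [["country1", "country2"], ["A", "B"]], names=["country", "sector"]
)
values = pd.DataFrame(
    [[24, 20, 30, 20], [16, 14, 10, 25], [28, 22, 6, 28], [11, 10, 10, 4],
     [6, 7, 12, 16], [19, 24, 6, 9], [22, 9, 10, 15], [9, 1, 4, 2]],
    index=rows, columns=cols, dtype=float,
)
gr = values.sum(axis=1)  # row totals, i.e. the TOTAL/GR column

# One row per year, one column per (country, sector): that year's totals.
totals_by_year = gr.unstack("Year").T

# Repeat each year's totals for every row of that year, with the columns
# in the same order as the value columns.
divisors = totals_by_year.reindex(
    index=values.index.get_level_values("Year"), columns=values.columns
)

# Same shape on both sides, so this is a plain element-wise division.
shares = values / divisors.to_numpy()
print(shares)
```

Because the totals are looked up by label, this version does not depend on the rows of each year appearing in the same order as the columns.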