Element-wise division with panel data: df.div(level=?) and index levels

# Question

<details>
<summary>English:</summary>

Sample data:

```py
import numpy as np
import pandas as pd

arrays1 = [['country1','country1','country1','country1','country2', 'country2', 'country2', 'country2'],
           [2000, 2001, 2000, 2001,2000, 2001, 2000, 2001],
           ['agri1','agri1', 'cons2','cons2', 'agri1','agri1', 'cons2','cons2']]
arrays = [['country1', 'country1', 'country2', 'country2'],
          ['agri1', 'cons2', 'agri1', 'cons2']]
index = pd.MultiIndex.from_arrays(arrays1, names=('country','Year','sector'))
columns1 = pd.MultiIndex.from_arrays(arrays, names=('country','sector'))
df = pd.DataFrame(np.array([[24, 20, 30, 20],[16, 14, 10, 25],[28, 22, 6, 28],
                   [11, 10, 10, 4],[6, 7, 12, 16],[19, 24, 6, 9],
                   [22, 9, 10, 15],[9, 1, 4, 2]]),index=index, columns=columns1)
```

```
                 Country  country1            country2
                  Sector         A         B         A         B
Year  country   sector
2000  country1  A               24        20        30        20
                B               16        14        10        25
      country2  A               28        22         6        28
                B               11        10        10         4
2001  country1  A                6         7        12        16
                B               19        24         6         9
      country2  A               22         9        10        15
                B                9         1         4         2
```

Thanks to Shubham Sharma's answer (https://stackoverflow.com/questions/76446671/multiindex-column-and-rows-match-if-the-column-and-row-names-are-similar-and-exc), I am able to get the columns labeled TOTAL. `EX` is the horizontal sum excluding cells where the level-one row index matches the level-zero column index, and `GR` is the plain horizontal sum.

```py
# Match the same countries in the index and the columns, then exclude the matching cells from the sum
ix = df.index.get_level_values(1)
cx = df.columns.get_level_values(0)
m = ix.values[:, None] == cx.values

df[('TOTAL','EX')] = df.mask(m).sum(axis=1)
df[('TOTAL','GR')] = df.iloc[:,:-1].sum(axis=1)
```

Resulting df:

```
                 Country  country1            country2               TOTAL
                  Sector         A         B         A         B      EX    GR
Year  country   sector
2000  country1  A               24        20        30        20    50.0    94
                B               16        14        10        25    35.0    65
      country2  A               28        22         6        28    50.0    84
                B               11        10        10         4    21.0    35
2001  country1  A                6         7        12        16    28.0    41
                B               19        24         6         9    15.0    58
      country2  A               22         9        10        15    31.0    56
                B                9         1         4         2    10.0    16
```

Now I want to use element-wise division along the columns (axis=1):

```py
df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values,axis=1)
```

which gives me:

```
ValueError: Unable to coerce to Series, length must be 4: given 8
```

I think this is because the division does not take the Year index into account. If there were only a single year, this would give me the expected result, but I want to do it for each year, taking the index level into account.

What I want the final df to look like:

```
                 Country  country1            country2               TOTAL
                  Sector         A         B         A         B      EX    GR
Year  country   sector
2000  country1  A            24/94     20/65     30/84     20/35    50.0    94
                B            16/94     14/65     10/84     25/35    35.0    65
      country2  A            28/94     22/65      6/84     28/35    50.0    84
                B            11/94     10/65     10/84      4/35    21.0    35
2001  country1  A             6/41      7/58     12/56     16/16    28.0    41
                B            19/41     24/58      6/56      9/16    15.0    58
      country2  A            22/41      9/58     10/56     15/16    31.0    56
                B             9/41      1/58      4/56      2/16    10.0    16
```

I have read the documentation for `df.div` and tried passing `level=1` to group by year during the division, but the same error appeared:

```py
df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values,axis=1, level=1)
```

```
ValueError: Unable to coerce to Series, length must be 4: given 8
```

Maybe my whole approach is wrong, so I welcome any easy workaround.


</details>
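
The error in the question looks like a plain length mismatch rather than a missing `level`. Calling `.values` hands `div` a bare NumPy array with one total per row (8 values), and with `axis=1` pandas tries to line that array up with the 4 value columns. Since an array carries no labels, `level` most likely has nothing to align on. A minimal sketch on a hypothetical toy frame (`toy`, `row_totals` and `per_row` are illustrative names, not from the question):

```py
import numpy as np
import pandas as pd

# Hypothetical toy frame: 4 rows, 2 columns.
toy = pd.DataFrame(np.arange(1, 9).reshape(4, 2), columns=["a", "b"])
row_totals = toy.sum(axis=1).to_numpy()  # 4 values, one per row

try:
    # axis=1 pairs the array with the columns: 2 columns vs 4 values.
    toy.div(row_totals, axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# axis=0 pairs the array with the rows, so the lengths agree.
per_row = toy.div(row_totals, axis=0)
print(error_message)
print(per_row)
```

Within a single year the block has 4 rows and 4 value columns, which is why the same call succeeds once the frame is split by year.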


# Answer 1
**Score**: 2

IIUC, you can try:

```py
def fn(x):
    x.iloc[:, :-2] = x.iloc[:, :-2].div(x[('TOTAL', 'GR')].values, axis=1)
    return x

x = df.groupby(level='Year', group_keys=False).apply(fn)
print(x)
```

Prints:

```
country               country1            country2           TOTAL
sector                   agri1     cons2     agri1     cons2    EX   GR
country  Year sector
country1 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
country2 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
```

Initial df is from your question:

```
country              country1       country2       TOTAL
sector                  agri1 cons2    agri1 cons2    EX   GR
country  Year sector
country1 2000 agri1       390   350       30    20   790  790
         2001 agri1       300   340       10    25   675  675
         2000 cons2       390   350       30    20   790  790
         2001 cons2       300   340       10    25   675  675
country2 2000 agri1       390   350       30    20   790  790
         2001 agri1       300   340       10    25   675  675
         2000 cons2       390   350       30    20   790  790
         2001 cons2       300   340       10    25   675  675
```
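
For reference, here is a self-contained sketch of the same per-year idea applied to the numbers in the question's printed table. It assumes the index is laid out as `(Year, country, sector)` with sectors `A` and `B`, as in that table (the sample code in the question uses a different level order), and the helper `divide_by_year_totals` is a hypothetical name introduced only for illustration.

```py
import numpy as np
import pandas as pd

# Assumed layout: (Year, country, sector), as in the question's printed table.
index = pd.MultiIndex.from_product(
    [[2000, 2001], ["country1", "country2"], ["A", "B"]],
    names=("Year", "country", "sector"),
)
columns = pd.MultiIndex.from_product(
    [["country1", "country2"], ["A", "B"]], names=("country", "sector")
)
values = np.array([[24, 20, 30, 20], [16, 14, 10, 25], [28, 22, 6, 28],
                   [11, 10, 10, 4], [6, 7, 12, 16], [19, 24, 6, 9],
                   [22, 9, 10, 15], [9, 1, 4, 2]])
df = pd.DataFrame(values, index=index, columns=columns)
n_value_cols = values.shape[1]

# EX: row sum excluding cells whose row country equals the column country.
same_country = (
    df.index.get_level_values("country").to_numpy()[:, None]
    == df.columns.get_level_values("country").to_numpy()
)
ex = df.mask(same_country).sum(axis=1)
gr = df.sum(axis=1)  # GR: plain row sum
df[("TOTAL", "EX")] = ex
df[("TOTAL", "GR")] = gr


def divide_by_year_totals(block: pd.DataFrame) -> pd.DataFrame:
    """Divide value column j by the GR total of row j within one year."""
    totals = block[("TOTAL", "GR")].to_numpy()
    return block.iloc[:, :n_value_cols].div(totals, axis=1)


shares = pd.concat(
    [divide_by_year_totals(block) for _, block in df.groupby(level="Year")]
)
# Re-attach the untouched TOTAL columns next to the computed shares.
result = pd.concat([shares, df.iloc[:, n_value_cols:]], axis=1)
print(result)
```

This variant builds a new frame instead of assigning inside `apply`, which avoids writing floats into integer columns in place. Like the original, it relies on each year holding as many rows as there are value columns.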

huangapple
  • Posted on July 24, 2023, 00:47:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76749355.html