2023-07-24 00:47:15

Element-wise division with panel data: df.div(level=?) on an index level

Question

<details>
<summary>English original:</summary>

Sample data:

```py
import numpy as np
import pandas as pd

# Note: this builds the row index as (country, Year, sector) with sectors agri1/cons2;
# the tables below label the same rows as (Year, country, sector) with sectors A and B.
arrays1 = [['country1','country1','country1','country1','country2', 'country2', 'country2', 'country2'],
           [2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001],
           ['agri1','agri1', 'cons2','cons2', 'agri1','agri1', 'cons2','cons2']]
arrays = [['country1', 'country1', 'country2', 'country2'],
          ['agri1', 'cons2', 'agri1', 'cons2']]
index = pd.MultiIndex.from_arrays(arrays1, names=('country','Year','sector'))
columns1 = pd.MultiIndex.from_arrays(arrays, names=('country','sector'))
df = pd.DataFrame(np.array([[24, 20, 30, 20],[16, 14, 10, 25],[28, 22, 6, 28],
                   [11, 10, 10, 4],[6, 7, 12, 16],[19, 24, 6, 9],
                   [22, 9, 10, 15],[9, 1, 4, 2]]), index=index, columns=columns1)
```

```
Country                     country1    country2
Sector                       A     B     A     B
Year  country   sector
2000  country1  A           24    20    30    20
                B           16    14    10    25
      country2  A           28    22     6    28
                B           11    10    10     4
2001  country1  A            6     7    12    16
                B           19    24     6     9
      country2  A           22     9    10    15
                B            9     1     4     2
```

Thanks to Shubham Sharma's answer (https://stackoverflow.com/questions/76446671/multiindex-column-and-rows-match-if-the-column-and-row-names-are-similar-and-exc), I am able to get the columns labeled TOTAL. EX is the horizontal sum of each row, excluding the cells where the level-one row index matches the level-zero column index (the same country); GR is the plain horizontal sum.

```py
# Match the same countries from index and columns, then exclude the matching cells from the sum.
# The level is selected by name so the mask does not depend on the order of the index levels.
ix = df.index.get_level_values('country')
cx = df.columns.get_level_values(0)
m = ix.values[:, None] == cx.values

df[('TOTAL','EX')] = df.mask(m).sum(axis=1)
# iloc[:, :-1] drops the EX column that was just added, leaving the four data columns.
df[('TOTAL','GR')] = df.iloc[:,:-1].sum(axis=1)
```

Resulting df:

```
Country                     country1    country2         TOTAL
Sector                       A     B     A     B      EX    GR
Year  country   sector
2000  country1  A           24    20    30    20    50.0    94
                B           16    14    10    25    35.0    65
      country2  A           28    22     6    28    50.0    84
                B           11    10    10     4    21.0    35
2001  country1  A            6     7    12    16    28.0    41
                B           19    24     6     9    15.0    58
      country2  A           22     9    10    15    31.0    56
                B            9     1     4     2    10.0    16
```

Now I want to proceed with element-wise division along the columns (axis=1):

```py
df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values, axis=1)
```

which gives me:

```
ValueError: Unable to coerce to Series, length must be 4: given 8
```

This happens because the division does not take the Year index into account: the divisor has one value per row (8), while there are only 4 columns. With a single year the same line would have given me the expected result. I want to do the division for each year separately, taking the index level into account.

This is what I want the final df to look like:

```
Country                         country1        country2         TOTAL
Sector                         A       B       A       B      EX    GR
Year  country   sector
2000  country1  A          24/94   20/65   30/84   20/35    50.0    94
                B          16/94   14/65   10/84   25/35    35.0    65
      country2  A          28/94   22/65    6/84   28/35    50.0    84
                B          11/94   10/65   10/84    4/35    21.0    35
2001  country1  A           6/41    7/58   12/56   16/16    28.0    41
                B          19/41   24/58    6/56    9/16    15.0    58
      country2  A          22/41    9/58   10/56   15/16    31.0    56
                B           9/41    1/58    4/56    2/16    10.0    16
```

I have read the documentation for df.div and tried passing level=1 to group by year during the element-wise division, but the same error appeared:

```py
df.iloc[:,:-2] = df.iloc[:,:-2].div(df[('TOTAL','GR')].values, axis=1, level=1)
```

```
ValueError: Unable to coerce to Series, length must be 4: given 8
```

Maybe my whole approach is wrong, so I am open to any easy workaround.

</details>
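The length mismatch behind the error in the question can be reproduced on a small stand-in frame. The sketch below uses made-up numbers and hypothetical names (`totals`, `shares_2000`), not the question's data. It illustrates that with `axis=1` a plain array divisor is matched against the columns, so it must have one entry per column, and that the lengths only agree once the frame is cut down to a single year.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 2 years x 4 rows, 4 columns.
index = pd.MultiIndex.from_product([[2000, 2001], list("abcd")], names=["Year", "row"])
df = pd.DataFrame(
    np.arange(1, 33, dtype=float).reshape(8, 4), index=index, columns=list("abcd")
)
totals = df.sum(axis=1).to_numpy()  # one total per row, so length 8

# axis=1 lines the divisor up with the 4 columns; a length-8 array cannot fit.
try:
    df.div(totals, axis=1)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Within a single year there are 4 rows and 4 columns, so the lengths agree:
# column j is divided by the total of row j of that year.
block = df.xs(2000, level="Year")
shares_2000 = block.div(block.sum(axis=1).to_numpy(), axis=1)
print(shares_2000)
```

Because the divisor is a bare NumPy array, it carries no index labels, which would explain why adding `level=1` does not change the outcome: the length check fails before any level-based alignment could apply.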


# Answer 1
**Score**: 2

IIUC, you can try:

```py
def fn(x):
    # x is one year's block; divide column j by the GR total of row j.
    # This needs as many rows per year as there are value columns.
    x.iloc[:, :-2] = x.iloc[:, :-2].div(x[('TOTAL', 'GR')].values, axis=1)
    return x

# Apply the division to each year separately.
x = df.groupby(level='Year', group_keys=False).apply(fn)
print(x)
```

Prints:

```
country               country1            country2           TOTAL     
sector                   agri1     cons2     agri1     cons2    EX   GR
country  Year sector                                                   
country1 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
country2 2000 agri1   0.493671  0.443038  0.037975  0.025316   790  790
         2001 agri1   0.444444  0.503704  0.014815  0.037037   675  675
         2000 cons2   0.493671  0.443038  0.037975  0.025316   790  790
         2001 cons2   0.444444  0.503704  0.014815  0.037037   675  675
```

Initial df is from your question:

```
country              country1       country2       TOTAL     
sector                  agri1 cons2    agri1 cons2    EX   GR
country  Year sector                                         
country1 2000 agri1       390   350       30    20   790  790
         2001 agri1       300   340       10    25   675  675
         2000 cons2       390   350       30    20   790  790
         2001 cons2       300   340       10    25   675  675
country2 2000 agri1       390   350       30    20   790  790
         2001 agri1       300   340       10    25   675  675
         2000 cons2       390   350       30    20   790  790
         2001 cons2       300   340       10    25   675  675
```
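As a self-contained check of the per-year idea, here is a minimal sketch that rebuilds the frame in the layout of the question's printed tables (rows as Year, country, sector; sectors A and B) and applies the same group-wise division. That layout is an assumption, since the question's constructor code orders the levels differently. The helper name `column_shares` and the NumPy cross-check are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed layout: rows are (Year, country, sector), columns are (country, sector),
# matching the printed tables in the question rather than its constructor code.
countries = ["country1", "country2"]
sectors = ["A", "B"]
index = pd.MultiIndex.from_product(
    [[2000, 2001], countries, sectors], names=["Year", "country", "sector"]
)
columns = pd.MultiIndex.from_product([countries, sectors], names=["country", "sector"])
data = np.array(
    [[24, 20, 30, 20], [16, 14, 10, 25], [28, 22, 6, 28], [11, 10, 10, 4],
     [6, 7, 12, 16], [19, 24, 6, 9], [22, 9, 10, 15], [9, 1, 4, 2]],
    dtype=float,
)
df = pd.DataFrame(data, index=index, columns=columns)


def column_shares(block: pd.DataFrame) -> pd.DataFrame:
    """Divide column j of one year's block by the row total of row j."""
    totals = block.sum(axis=1).to_numpy()  # one gross total per row of the block
    return block.div(totals, axis=1)       # needs a square block (rows == columns)


shares = df.groupby(level="Year", group_keys=False).apply(column_shares)

# Totals as in the question: EX skips cells whose row and column country match.
same_country = (
    df.index.get_level_values("country").to_numpy()[:, None]
    == df.columns.get_level_values("country").to_numpy()
)
result = shares.copy()
result[("TOTAL", "EX")] = df.mask(same_country).sum(axis=1)
result[("TOTAL", "GR")] = df.sum(axis=1)

# Cross-check with plain NumPy broadcasting. This assumes rows are sorted by Year
# and every year has exactly as many rows as there are columns.
k = df.shape[1]
cube = df.to_numpy().reshape(-1, k, k)
expected = (cube / cube.sum(axis=2)[:, None, :]).reshape(df.shape)
assert np.allclose(shares.to_numpy(), expected)

print(result.round(3))
```

Returning a new frame from the helper, rather than assigning into the group in place, keeps the original `df` untouched and starts from float data, so the ratios never have to be written into integer columns.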

