June 16, 2023 12:51:08 · go · 60 views

How to reorder data with pandas

Question

I'm trying to reshape raw data into a report layout, but I don't know how to do it. Below is my sample:

import pandas as pd
d = {'Customer Name': ['A', 'A', 'B', 'C', 'C', 'C'], 
     'ID_NO': ['AA', 'AA', 'BB', 'CC', 'CC', 'CC'],
     'Account Number': ['123456', '123457', '234567', '345678', '345679', '345680'],
     'Limit': [1000, 1000, 900, 1200, 1200, 1200], 
     'Balance': [400, 600, 800, 300, 400, 200]}
df = pd.DataFrame(data=d)
df

But I need something like this:

d2 = {'Customer Name': ['A', '123456', '123457', 'B', '234567', 'C', '345678', '345679', '345680'], 
     'ID_NO': ['AA', '', '', 'BB', '', 'CC', '', '', ''],
     'Limit': [1000, '', '', 900, '', 1200, '', '', ''], 
     'Balance': [1000, 600, 400, 800, 800, 900, 300, 400, 200]}
df2 = pd.DataFrame(data=d2)
df2

What should I do in this case? I appreciate any suggestions and help.

One more question: what if I have two Limit Numbers for one customer, as below?

import pandas as pd
d = {'Customer Name': ['A', 'A', 'B', 'C', 'C', 'C'], 
     'ID_NO': ['AA', 'AA', 'BB', 'CC', 'CC', 'CC'],
     'Limit Number': ['AAA', 'AAA', 'BBB', 'CCC', 'CCC', 'CCD'],
     'Account Number': ['123456', '123457', '234567', '345678', '345679', '345680'],
     'Limit': [1000, 1000, 900, 1200, 1200, 1500], 
     'Balance': [400, 600, 800, 300, 400, 600]}
df = pd.DataFrame(data=d)

If I use the code below:

out = (pd.concat([df.reset_index()
                    .groupby(['Customer Name','ID_NO', 'Limit Number'], as_index=False)
                    .agg({'Balance':'sum','Limit':'first','index':'first'})
                    .set_index('index'), 
                 df[['Account Number','Balance']]
                      .rename(columns={'Account Number':'Customer Name'})])
         .sort_index(kind='stable', ignore_index=True)
         [['Customer Name','ID_NO','Limit','Balance']])
out

I get the following result:

Below is my expected output:

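Answer 1

Score: 2

You can use concat with two DataFrames. The first removes repeated rows by Customer Name and ID_NO, drops the Account Number column and sets Balance to the sum per both columns (transform keeps the original index, so assign aligns each group total with the row that drop_duplicates kept). The second selects the necessary columns and renames Account Number to Customer Name. Finally, use DataFrame.sort_index:

This requires a default index (a RangeIndex starting at 0).

# if necessary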
# df = df.reset_index(drop=True)

out = (pd.concat([df.drop_duplicates(['Customer Name','ID_NO'])
                    .drop('Account Number', axis=1)
                    .assign(Balance=df.groupby(['Customer Name','ID_NO'])['Balance']
                    .transform('sum')), 
       df[['Account Number','Balance']]
                    .rename(columns={'Account Number':'Customer Name'})])
         .sort_index(kind='stable', ignore_index=True))
print(out)
  Customer Name ID_NO   Limit  Balance
0             A    AA  1000.0     1000
1        123456   NaN     NaN      400
2        123457   NaN     NaN      600
3             B    BB   900.0      800
4        234567   NaN     NaN      800
5             C    CC  1200.0      900
6        345678   NaN     NaN      300
7        345679   NaN     NaN      400
8        345680   NaN     NaN      200
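The interleaving works because pd.concat keeps the row labels of both frames and a stable sort keeps rows with equal labels in the order the frames were passed, so each summary row lands just before its first account. A minimal sketch on toy data; `summary` and `detail` are hypothetical names, not part of the answer:

```python
import pandas as pd

# Hypothetical toy frames: one summary row per customer, labelled with the
# index of that customer's first detail row.
summary = pd.DataFrame({'Customer Name': ['A', 'B'], 'Balance': [1000, 800]},
                       index=[0, 2])
detail = pd.DataFrame({'Customer Name': ['123456', '123457', '234567'],
                       'Balance': [400, 600, 800]},
                      index=[0, 1, 2])

# concat keeps the labels (0, 2, 0, 1, 2); the stable sort puts summary rows
# ahead of detail rows that share the same label.
out = (pd.concat([summary, detail])
         .sort_index(kind='stable', ignore_index=True))
print(out['Customer Name'].tolist())
# ['A', '123456', '123457', 'B', '234567']
```

With the default quicksort, the relative order of tied labels is not guaranteed, which is why the answer passes `kind='stable'`.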

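Alternatively, you can aggregate with sum and GroupBy.first. The only catch is that groupby discards the original index of the first DataFrame, so the reset_index/set_index trick carries each group's first row label through the aggregation:

This requires a default index (a RangeIndex starting at 0).

# if necessary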
# df = df.reset_index(drop=True)

out = (pd.concat([df.reset_index()
                    .groupby(['Customer Name','ID_NO'], as_index=False)
                    .agg({'Balance':'sum','Limit':'first','index':'first'})
                    .set_index('index'), 
                 df[['Account Number','Balance']]
                      .rename(columns={'Account Number':'Customer Name'})])
         .sort_index(kind='stable', ignore_index=True)
         [['Customer Name','ID_NO','Limit','Balance']])
print(out)
  Customer Name ID_NO   Limit  Balance
0             A    AA  1000.0     1000
1        123456   NaN     NaN      400
2        123457   NaN     NaN      600
3             B    BB   900.0      800
4        234567   NaN     NaN      800
5             C    CC  1200.0      900
6        345678   NaN     NaN      300
7        345679   NaN     NaN      400
8        345680   NaN     NaN      200
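To see why the trick matters, compare the index of a plain aggregation with the one produced via reset_index/set_index. Without it, the summary rows get labels 0, 1, 2, so B's summary would be sorted next to A's second account. A small sketch on the first sample from the question; `naive` and `summary` are illustrative names:

```python
import pandas as pd

# First sample DataFrame from the question.
d = {'Customer Name': ['A', 'A', 'B', 'C', 'C', 'C'],
     'ID_NO': ['AA', 'AA', 'BB', 'CC', 'CC', 'CC'],
     'Account Number': ['123456', '123457', '234567', '345678', '345679', '345680'],
     'Limit': [1000, 1000, 900, 1200, 1200, 1200],
     'Balance': [400, 600, 800, 300, 400, 200]}
df = pd.DataFrame(data=d)

# Without the trick: as_index=False yields a fresh 0..n-1 index.
naive = df.groupby(['Customer Name', 'ID_NO'], as_index=False).agg(
    {'Balance': 'sum', 'Limit': 'first'})
print(naive.index.tolist())          # [0, 1, 2]

# With the trick: reset_index exposes the row labels as a column named
# 'index', 'first' keeps each group's first label, set_index restores it.
summary = (df.reset_index()
             .groupby(['Customer Name', 'ID_NO'], as_index=False)
             .agg({'Balance': 'sum', 'Limit': 'first', 'index': 'first'})
             .set_index('index'))
print(summary.index.tolist())        # [0, 2, 3]
print(summary['Balance'].tolist())   # [1000, 800, 900]
```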

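EDIT: To sum the unique values of the Limit column (here with the second sample DataFrame from the question), use a custom function: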

out = (pd.concat([df.reset_index()
                    .groupby(['Customer Name','ID_NO'], as_index=False)
                    .agg({'Balance':'sum',
                          'Limit':lambda x: x.unique().sum(),
                          'index':'first'})
                    .set_index('index'), 
                 df[['Account Number','Balance']]
                      .rename(columns={'Account Number':'Customer Name'})])
         .sort_index(kind='stable', ignore_index=True)
         [['Customer Name','ID_NO','Limit','Balance']])
    
print(out)
  Customer Name ID_NO   Limit  Balance
0             A    AA  1000.0     1000
1        123456   NaN     NaN      400
2        123457   NaN     NaN      600
3             B    BB   900.0      800
4        234567   NaN     NaN      800
5             C    CC  2700.0     1300
6        345678   NaN     NaN      300
7        345679   NaN     NaN      400
8        345680   NaN     NaN      600
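The output above keeps NaN in the account rows, while the question's expected layout shows empty strings. A sketch, assuming blanks are wanted for display only: `fillna('')` turns the affected columns into object dtype, which is awkward for further arithmetic, so `report` is a hypothetical display copy and `out` stays numeric:

```python
import pandas as pd

# Second sample from the question (two Limit Numbers for customer C).
d = {'Customer Name': ['A', 'A', 'B', 'C', 'C', 'C'],
     'ID_NO': ['AA', 'AA', 'BB', 'CC', 'CC', 'CC'],
     'Limit Number': ['AAA', 'AAA', 'BBB', 'CCC', 'CCC', 'CCD'],
     'Account Number': ['123456', '123457', '234567', '345678', '345679', '345680'],
     'Limit': [1000, 1000, 900, 1200, 1200, 1500],
     'Balance': [400, 600, 800, 300, 400, 600]}
df = pd.DataFrame(data=d)

# Same pipeline as the EDIT above.
out = (pd.concat([df.reset_index()
                    .groupby(['Customer Name', 'ID_NO'], as_index=False)
                    .agg({'Balance': 'sum',
                          'Limit': lambda x: x.unique().sum(),
                          'index': 'first'})
                    .set_index('index'),
                  df[['Account Number', 'Balance']]
                    .rename(columns={'Account Number': 'Customer Name'})])
         .sort_index(kind='stable', ignore_index=True)
         [['Customer Name', 'ID_NO', 'Limit', 'Balance']])

# Display-only copy: replace the NaN placeholders in account rows with ''.
report = out.fillna('')
print(report)
```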
