Advanced sorting with pandas

Question

I have input data like the sample below. I have 5 xlsx tables, each with 5 columns, and I want to join them together, which I did with pd.concat. The problem occurs when I try to do some advanced sorting on the combined table. Below I have tried to explain what I want to do.

Input data:

  col1 col2
0  X    X
1  Y    Z
2  Z    A
4  A
5  B

I would like it to be sorted like this:

Output data:

  col1 col2
0  X    X
1  Y    
2  Z    Z
4  A    A
5  B    
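
To experiment with the answers, the sample input can be rebuilt as below. This is only a sketch: it assumes the blank col2 cells are empty strings, whereas after a real pd.concat of spreadsheets they may well be NaN.

```python
import pandas as pd

# Blank col2 cells are modelled as empty strings here (an assumption);
# after pd.concat of real spreadsheets they may be NaN instead.
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],  # the sample index skips 3
)
print(df)
```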

Answer 1

Score: 1

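Are you sure you need sorting? This looks more like a masking operation (where + isin): keep the col1 value wherever it also appears somewhere in col2, and use an empty string otherwise: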

df['col2'] = df['col1'].where(df['col1'].isin(df['col2']), '')

Output:

  col1 col2
0    X    X
1    Y     
2    Z    Z
4    A    A
5    B     
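
A self-contained version of the mask, for anyone who wants to run it. The sample frame is reconstructed from the question, with blanks assumed to be empty strings, and the intermediate name mask is only there for readability.

```python
import pandas as pd

# Sample data from the question; blanks assumed to be empty strings.
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],
)

# True where the col1 value occurs anywhere in col2.
mask = df["col1"].isin(df["col2"])

# Keep col1 where the mask is True, otherwise use an empty string.
df["col2"] = df["col1"].where(mask, "")
print(df)
```

The original index labels (0, 1, 2, 4, 5) are kept, because where works element-wise on the existing rows.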

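Or maybe a left merge, which leaves NaN rather than an empty string where there is no match and returns a fresh 0..n-1 index: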

df[['col1']].merge(df[['col2']], left_on='col1', right_on='col2', how='left')

Output:

  col1 col2
0    X    X
1    Y  NaN
2    Z    Z
3    A    A
4    B  NaN
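
If the merge route is preferred but the original index and empty strings are wanted, something like the following may work. It is a sketch under assumptions: blanks in col2 are taken to be missing values (None/NaN), and the right-hand side is de-duplicated first so that each col1 row matches at most once. The names right and out are illustrative.

```python
import pandas as pd

# Sample data from the question; blanks assumed to be missing values.
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", None, None]},
    index=[0, 1, 2, 4, 5],
)

# Drop missing and repeated col2 values so a col1 row cannot match twice.
right = df[["col2"]].dropna().drop_duplicates()

out = df[["col1"]].merge(right, left_on="col1", right_on="col2", how="left")

# merge returns a fresh 0..n-1 index; restore the original labels.
# This is only safe because the row count is unchanged.
out.index = df.index

# Replace the NaN left by unmatched rows with empty strings.
out["col2"] = out["col2"].fillna("")
print(out)
```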

Answer 2

Score: 0

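Another possible approach, using join. Note that the output below keeps the original col1/col2 pairing (Y with Z, Z with A) and is ordered by col1, so it differs from the requested result: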

df.set_index('col1').join(df[['col2']].set_index('col2')).reset_index()

Output:

  col1 col2
0    A  NaN
1    B  NaN
2    X    X
3    Y    Z
4    Z    A
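
A join can line each col1 value up with its match if the right-hand side is a lookup keyed by the distinct col2 values. This is a sketch under assumptions and not the answerer's code: the sample frame is rebuilt from the question with blanks as missing values, and the names values, lookup and out are made up for illustration.

```python
import pandas as pd

# Sample data from the question; blanks assumed to be missing values.
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", None, None]},
    index=[0, 1, 2, 4, 5],
)

# Distinct, non-missing col2 values, used both as the index (join key)
# and as the values of the lookup Series.
values = df["col2"].dropna().unique()
lookup = pd.Series(values, index=values, name="col2")

# Join col1 against the lookup's index; unmatched rows get NaN in col2.
out = df[["col1"]].join(lookup, on="col1")
out["col2"] = out["col2"].fillna("")
print(out)
```

Because the join is done on a column of the left frame, the left frame's row order and index labels are kept.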

Answer 3

Score: 0

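You can achieve this by looping over the rows of your DataFrame and checking whether the value in col1 occurs in col2. If it does, set col2 to the col1 value for that row; if not, set col2 to an empty string. Two details matter: take a snapshot of the original col2 values before the loop, because the loop overwrites col2 as it goes, and iterate over df.index rather than range(len(df)), because the index here has a gap (0, 1, 2, 4, 5).

import pandas as pd

# assuming df is your DataFrame

# snapshot the original col2 values, since col2 is overwritten below
col2_values = set(df['col2'].dropna())

# iterate over the actual index labels (the index may have gaps)
for i in df.index:
    # if the value in col1 was in the original col2
    if df.loc[i, 'col1'] in col2_values:
        # set col2 value to col1 value
        df.loc[i, 'col2'] = df.loc[i, 'col1']
    else:
        # set col2 value to an empty string
        df.loc[i, 'col2'] = ''

print(df)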
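
For larger tables a row loop can get slow, so it may be worth checking it against the vectorised where + isin mask. The sketch below wraps both in functions and compares them on the sample data. The helper names fill_by_loop and fill_by_mask are hypothetical, and blanks are assumed to be empty strings.

```python
import pandas as pd


def fill_by_loop(frame: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row version; returns a modified copy (hypothetical helper)."""
    out = frame.copy()
    # Snapshot col2 first, because the loop overwrites it.
    col2_values = set(out["col2"].dropna())
    for label in out.index:
        value = out.loc[label, "col1"]
        out.loc[label, "col2"] = value if value in col2_values else ""
    return out


def fill_by_mask(frame: pd.DataFrame) -> pd.DataFrame:
    """Vectorised version using where + isin (hypothetical helper)."""
    out = frame.copy()
    out["col2"] = out["col1"].where(out["col1"].isin(out["col2"]), "")
    return out


# Sample data from the question; blanks assumed to be empty strings.
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],
)

looped = fill_by_loop(df)
masked = fill_by_mask(df)
print(looped["col2"].tolist() == masked["col2"].tolist())
```

Both helpers work on a copy, so the original df is left untouched and can be reused for other experiments.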

huangapple
  • Posted on June 15, 2023, 18:05:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76481388.html