Advanced sorting with pandas
Question
I have input data like the sample below. I have 5 xlsx files, each containing 5 columns, and I joined them together with pd.concat. The problem arises when I try to do some more advanced sorting on the combined table. I have tried to explain what I want below.
Input data:
  col1 col2
0    X    X
1    Y    Z
2    Z    A
4    A
5    B
I would like it to be sorted like this:
Output data:
  col1 col2
0    X    X
1    Y
2    Z    Z
4    A    A
5    B
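For anyone reproducing this, here is a minimal sketch of how a frame like the sample input could be built with pd.concat. The frames part_a and part_b are hypothetical stand-ins for the spreadsheets, and the empty cells are assumed to be empty strings (in the real data they may well be NaN). Note that ignore_index=True renumbers the rows, so the index here runs 0 to 4 rather than 0, 1, 2, 4, 5 as in the sample.

```python
import pandas as pd

# Hypothetical stand-ins for two of the spreadsheets; in practice each
# frame would be loaded with pd.read_excel(...).
part_a = pd.DataFrame({"col1": ["X", "Y", "Z"], "col2": ["X", "Z", "A"]})
part_b = pd.DataFrame({"col1": ["A", "B"], "col2": ["", ""]})

# ignore_index=True renumbers the rows 0..n-1 instead of keeping each
# frame's own labels, which avoids gaps or repeated index values.
df = pd.concat([part_a, part_b], ignore_index=True)
print(df)
```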
Answer 1
Score: 1
Are you sure you need sorting? This looks more like a masking operation (where + isin):
df['col2'] = df['col1'].where(df['col1'].isin(df['col2']), '')
Output:
  col1 col2
0    X    X
1    Y
2    Z    Z
4    A    A
5    B
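As a self-contained sketch of the mask approach, the snippet below rebuilds the sample frame and applies the same where + isin expression in two steps. It assumes the empty cells are empty strings; the variable name mask is only for illustration.

```python
import pandas as pd

# Sample data from the question (empty cells assumed to be empty strings)
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],
)

# True where the col1 value occurs anywhere in col2
mask = df["col1"].isin(df["col2"])

# Keep the col1 value where the mask is True, otherwise use an empty string
df["col2"] = df["col1"].where(mask, "")
print(df)
```

The mask is computed from the original col2 before the column is overwritten, so the assignment cannot affect the membership test.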
Or maybe a left merge:
df[['col1']].merge(df[['col2']], left_on='col1', right_on='col2', how='left')
Output:
  col1 col2
0    X    X
1    Y  NaN
2    Z    Z
3    A    A
4    B  NaN
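A runnable sketch of the merge variant follows, again assuming empty strings for the blank cells. Two steps are additions that are not in the answer above: drop_duplicates on the lookup side, so that a value repeated in col2 cannot duplicate rows in the result, and fillna("") to turn the unmatched NaN cells into empty strings as in the requested output.

```python
import pandas as pd

# Sample data from the question (empty cells assumed to be empty strings)
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]}
)

# De-duplicate the lookup side so no col1 row is repeated in the result
lookup = df[["col2"]].drop_duplicates()

# Left merge: every col1 row is kept; col2 is filled only where a match exists
out = df[["col1"]].merge(lookup, left_on="col1", right_on="col2", how="left")

# Unmatched rows come back as NaN; replace them with empty strings
out["col2"] = out["col2"].fillna("")
print(out)
```

Unlike the where + isin version, merge returns a new frame with a fresh 0..n-1 index rather than modifying df in place.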
Answer 2
Score: 0
Another possible solution:
df.set_index('col1').join(df[['col2']].set_index('col2')).reset_index()
Output:
  col1 col2
0    A  NaN
1    B  NaN
2    X    X
3    Y    Z
4    Z    A
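Note that in the output above each row keeps its original col2 value (Y is still paired with Z, and Z with A), which differs from the output requested in the question. One possible variant, offered as a sketch and not part of the original answer, joins only col1 against the distinct col2 values. It assumes the blank cells are empty strings; left, right and out are illustrative names.

```python
import pandas as pd

# Sample data from the question (empty cells assumed to be empty strings)
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]}
)

# Left side: col1 as the index, with no other columns carried along
left = df[["col1"]].set_index("col1")

# Right side: distinct col2 values as the index, also kept as a column
right = df[["col2"]].drop_duplicates().set_index("col2", drop=False)

# Index-on-index left join; rename_axis makes sure the index is called col1
out = left.join(right).rename_axis("col1").reset_index()

# Unmatched rows come back as NaN; replace them with empty strings
out["col2"] = out["col2"].fillna("")
print(out)
```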
Answer 3
Score: 0
You can achieve this by looping through each row in your DataFrame and checking if the value in col1 is in col2. If it is, you can set the value of col2 to the value of col1 for that row. If not, you can set the value of col2 to an empty string for that row.
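import pandas as pd

# assuming df is your DataFrame
# snapshot the original col2 values first, because the loop overwrites col2
col2_values = set(df['col2'])

# iterate over the index labels (the index may have gaps, e.g. 0, 1, 2, 4, 5)
for i in df.index:
    # if the value in col1 appears in the original col2
    if df.loc[i, 'col1'] in col2_values:
        # set col2 value to col1 value
        df.loc[i, 'col2'] = df.loc[i, 'col1']
    else:
        # set col2 value to an empty string
        df.loc[i, 'col2'] = ''

print(df)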