Fill NaN based on two columns in another data frame
Question
I have two data frames with different column headers (I am only highlighting this because people keep marking the question as a duplicate):
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})
I want to fill the null values of Session name in df2 with the Course 2 value from df1 wherever the client and the date are the same.
This is the code that I have:
merged_df = pd.merge(df1, df2, left_on=['Date', 'Client'], right_on=['Session Date', 'Org'], how='inner')
df2['Session Name'] = merged_df.apply(lambda x: x['Course 2'] if pd.isna(x['Session Name']) else x['Session Name'], axis=1)
df2
But it's obviously not working. The output it prints is:
  Session Date       Org    Session Name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             NaN
Whereas it should print:
  Session Date       Org    Session Name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             CCC
Answer 1
Score: 1
If the columns of the two data frames correspond by position, you can build a mapping dict that renames df2's columns to df1's names:
cmap = dict(zip(df2.columns, df1.columns))
cidx = ['Date', 'Client']
out = df1.set_index(cidx).fillna(df2.rename(columns=cmap).set_index(cidx)).reset_index()
print(out)
# Output
        Date    Client        Course 2
0  7/03/2022  Client 1  Computer skill
1   9/3/2022  Client 2             CCC
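Note that the snippet above fills df1 from df2 and returns a frame with df1's headers. If the goal is the direction described in the question (filling df2 from df1), the same index-alignment idea can be turned around. The following is only a sketch: the explicit `cmap` dictionary is my assumption about which columns correspond, and it assumes each (date, client) pair appears at most once in df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

# Assumed correspondence between df1's and df2's column names.
cmap = {'Date': 'Session Date', 'Client': 'Org', 'Course 2': 'Session name'}
keys = ['Session Date', 'Org']

# Rename df1 to df2's headers and index both frames by the key columns,
# so that fillna aligns rows on (date, client) rather than on position.
source = df1.rename(columns=cmap).set_index(keys)
out = df2.set_index(keys).fillna(source).reset_index()
print(out)
```

Because only missing cells are filled, the existing 'CCC' for Client 3 is left untouched, and df1's Client 2 row has no effect since no row in df2 has that key.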
Answer 2
Score: 0
A possible solution:
cols = df2.columns
df2.columns = df1.columns
df2.update(df1)
df2.columns = cols
Output:
  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 2             CCC
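Keep in mind that `DataFrame.update` aligns on the index and, by default, overwrites every cell for which the other frame has a non-NA value, which is why Org becomes Client 2 in the output above. A keyed variant could look like the sketch below. It assumes the same column correspondence as in the question (the `cmap` dictionary is my own naming) and unique (date, client) pairs in both frames. Passing `overwrite=False` restricts the update to cells that are NA in df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

# Assumed correspondence between df1's and df2's column names.
cmap = {'Date': 'Session Date', 'Client': 'Org', 'Course 2': 'Session name'}
keys = ['Session Date', 'Org']

# Index both frames by the key columns so update() matches on (date, client).
keyed = df2.set_index(keys)
keyed.update(df1.rename(columns=cmap).set_index(keys), overwrite=False)

# Restore the key columns as ordinary columns.
result = keyed.reset_index()
print(result)
```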
Answer 3
Score: 0
You can carry df2's original index through the merge with reset_index and set_index, which makes it possible to use Series.fillna afterwards:
merged_df = pd.merge(df1,
                     df2.reset_index(),
                     left_on=['Date', 'Client'],
                     right_on=['Session Date', 'Org'],
                     how='inner').set_index('index')
df2['Session name'] = df2['Session name'].fillna(merged_df['Course 2'])
print(df2)
  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             CCC
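The same idea can be wrapped in a small helper. This is a sketch, not part of the answer above: `fill_from` and its parameter names are hypothetical, and it swaps the inner merge for a left merge from df2's side, relying on the fact that a left merge against unique right-hand keys keeps one row per left row in the original order. Duplicate key pairs in the source are dropped (first occurrence wins), which is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd


def fill_from(target, source, target_keys, source_keys, target_col, source_col):
    """Return a copy of target with NaNs in target_col filled from source_col
    wherever the key columns match (hypothetical helper)."""
    # One lookup row per key combination, so the left merge cannot add rows.
    lookup = source[source_keys + [source_col]].drop_duplicates(subset=source_keys)
    merged = target[target_keys].merge(lookup, how='left',
                                       left_on=target_keys, right_on=source_keys)
    # merged is positionally aligned with target; reattach target's index.
    candidates = pd.Series(merged[source_col].to_numpy(), index=target.index)
    return target.assign(**{target_col: target[target_col].fillna(candidates)})


df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']},
                   index=[10, 20])  # non-default index to show it is preserved

filled = fill_from(df2, df1,
                   target_keys=['Session Date', 'Org'],
                   source_keys=['Date', 'Client'],
                   target_col='Session name',
                   source_col='Course 2')
print(filled)
```

Unlike assigning the result of `apply` on the merged frame, this does not depend on the merged frame's index matching df2's, and it leaves df2 itself unmodified.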