Fill NaN based on two columns in another data frame

Question

I have two data frames with different column headers (I'm only highlighting that because people keep marking this question as a duplicate):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})

df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

What I want to do is fill the null values in 'Session name' in df2 with the 'Course 2' value from df1 when the client and the date are the same.

This is the code I have:

merged_df = pd.merge(df1, df2, left_on=['Date', 'Client'], right_on=['Session Date', 'Org'], how='inner')
df2['Session name'] = merged_df.apply(lambda x: x['Course 2'] if pd.isna(x['Session name']) else x['Session name'], axis=1)
df2

But it's obviously not working; the output it prints is:

  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             NaN

whereas it should print:

  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             CCC

Answer 1

Score: 1

You can use a mapping dict to rename columns if your columns are aligned (by position):

cmap = dict(zip(df2.columns, df1.columns))
cidx = ['Date', 'Client']

out = df1.set_index(cidx).fillna(df2.rename(columns=cmap).set_index(cidx)).reset_index()
print(out)

# Output
        Date    Client        Course 2
0  7/03/2022  Client 1  Computer skill
1   9/3/2022  Client 2             CCC
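
Note that the snippet above fills df1 from df2, so df1 comes back unchanged (it has no missing values). The same set_index/fillna idea can be pointed the other way, filling df2's 'Session name' from df1. The sketch below assumes, as the answer does, that both frames list their columns in the same order, and that each (date, client) pair appears at most once in df1; `keys` and `out` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

# Rename df1's columns to df2's names, pairing them by position
# (assumes both frames use the same column order).
cmap = dict(zip(df1.columns, df2.columns))
keys = ['Session Date', 'Org']

# Index both frames by the (date, client) key so fillna aligns on it;
# only cells that are NaN in df2 get a value from df1.
out = (df2.set_index(keys)
          .fillna(df1.rename(columns=cmap).set_index(keys))
          .reset_index())
print(out)
```

Because fillna only touches missing cells, the 'CCC' already present for Client 3 is left alone even though df1 has no matching row for it.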

Answer 2

Score: 0

A possible solution:

cols = df2.columns
df2.columns = df1.columns
df2.update(df1)
df2.columns = cols

Output:

  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 2             CCC
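
DataFrame.update aligns on the row index and the column labels, not on the Date/Client values, which is why Org in the second row becomes 'Client 2' above. A variant that keys the update on the two columns and only touches missing cells might look like the sketch below. It relies on update's `overwrite=False` option and assumes each (date, client) pair is unique in df1; the explicit rename mapping and the names `src`/`target` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

keys = ['Session Date', 'Org']
# Give df1 the same headers as df2 (mapping written out explicitly here).
src = df1.rename(columns={'Date': 'Session Date',
                          'Client': 'Org',
                          'Course 2': 'Session name'}).set_index(keys)

target = df2.set_index(keys)
# update() aligns on index and columns and modifies `target` in place;
# overwrite=False fills only the cells that are NaN in `target`.
target.update(src, overwrite=False)
df2 = target.reset_index()
print(df2)
```

Moving the key columns into the index keeps them out of the update entirely, so Org can no longer be overwritten.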

Answer 3

Score: 0

You can keep the original index through the merge by calling reset_index before it and set_index after it, which makes it possible to use Series.fillna:

merged_df = pd.merge(df1, 
                     df2.reset_index(), 
                     left_on=['Date', 'Client'], 
                     right_on=['Session Date', 'Org'], how='inner').set_index('index')

df2['Session name'] = df2['Session name'].fillna(merged_df['Course 2'])
print (df2)
  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             CCC
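
Another way to keep the rows lined up is a left merge starting from df2: with `how='left'` the result keeps df2's row order and, as long as each (date, client) key occurs at most once in df1, its row count too. `validate='many_to_one'` asks pandas to raise if that assumption is violated. This is a sketch of an alternative, not the answerer's code; `lookup` and `course` are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

# Left merge keeps every df2 row in its original order; validate raises
# MergeError if a (Date, Client) pair is duplicated in df1.
lookup = df2.merge(df1,
                   left_on=['Session Date', 'Org'],
                   right_on=['Date', 'Client'],
                   how='left',
                   validate='many_to_one')

# The merge result has a fresh RangeIndex, so re-attach df2's index
# before filling; rows without a match simply carry NaN here.
course = pd.Series(lookup['Course 2'].to_numpy(), index=df2.index)
df2['Session name'] = df2['Session name'].fillna(course)
print(df2)
```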

huangapple
  • Posted on April 4, 2023 at 18:19:53
  • When reposting, please keep this link: https://go.coder-hub.com/75928204.html