April 4, 2023 18:19:53

Fill NaN based on two columns in another data frame

Question

I have two data frames with different column headers (I am highlighting this only because people keep marking the question as a duplicate):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})

df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

I want to fill the null values of Session name in df2 with the corresponding Course 2 value from df1 wherever the client and the date are the same.

This is the code I have:

merged_df = pd.merge(df1, df2, left_on=['Date', 'Client'], right_on=['Session Date', 'Org'], how='inner')
df2['Session Name'] = merged_df.apply(lambda x: x['Course 2'] if pd.isna(x['Session Name']) else x['Session Name'], axis=1)
df2

But it is obviously not working. The output it prints is:

  Session Date       Org    Session Name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             NaN

whereas it should print:

  Session Date       Org    Session Name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             CCC

Answer 1

Score: 1

You can use a mapping dict to rename columns if your columns are aligned (by position):

cmap = dict(zip(df2.columns, df1.columns))
cidx = ['Date', 'Client']

out = df1.set_index(cidx).fillna(df2.rename(columns=cmap).set_index(cidx)).reset_index()
print(out)

# Output
        Date    Client        Course 2
0  7/03/2022  Client 1  Computer skill
1   9/3/2022  Client 2             CCC
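
Note that the snippet above fills gaps in df1, while the question asks for df2 to be filled. A minimal sketch of the same index-alignment idea applied in that direction might look like the following. The names `keys`, `lookup`, and `filled` are illustrative, and the sketch assumes that both frames list their columns in the same order and that each (date, client) pair occurs at most once per frame:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

# Map df1's column names onto df2's (assumption: same column order).
cmap = dict(zip(df1.columns, df2.columns))
keys = ['Session Date', 'Org']  # illustrative name for the key columns

# Index both frames by the key columns so fillna aligns on (date, client).
lookup = df1.rename(columns=cmap).set_index(keys)
filled = df2.set_index(keys).fillna(lookup).reset_index()
print(filled)
```

Only cells that are NaN in df2 and have a matching key in df1 are filled; rows without a match, such as Client 3 here, keep their existing values.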

Answer 2

Score: 0

A possible solution:

cols = df2.columns
df2.columns = df1.columns
df2.update(df1)
df2.columns = cols

Output:

  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 2             CCC
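
Because `DataFrame.update` aligns rows on the index and overwrites existing values by default, the snippet above also replaces Org with df1's Client values (hence Client 2 in the output). A hedged variant, assuming unique (date, client) pairs and the same column order in both frames, indexes both frames by the key columns and passes `overwrite=False` so that only missing cells are touched. The names `keys`, `renamed`, and `filled` are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

keys = ['Session Date', 'Org']  # illustrative name for the key columns

# Give df1 the same column names as df2 (assumption: same column order),
# then index both frames by the keys so update matches on (date, client).
renamed = df1.rename(columns=dict(zip(df1.columns, df2.columns))).set_index(keys)
filled = df2.set_index(keys)

# overwrite=False only fills values that are NA in `filled`.
filled.update(renamed, overwrite=False)
filled = filled.reset_index()
print(filled)
```

With this variant the Org column of df2 is left as it was, and a row is filled only when its key pair also exists in df1.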

Answer 3

Score: 0

You can carry df2's original index through the merge by calling reset_index before it and set_index afterwards, which makes it possible to use Series.fillna:

merged_df = pd.merge(df1,
                     df2.reset_index(),
                     left_on=['Date', 'Client'],
                     right_on=['Session Date', 'Org'], how='inner').set_index('index')

df2['Session name'] = df2['Session name'].fillna(merged_df['Course 2'])
print(df2)
  Session Date       Org    Session name
0    7/03/2022  Client 1  Computer skill
1     9/3/2022  Client 3             CCC
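
An alternative sketch of the same idea uses a left merge from df2, which keeps one output row per df2 row in the original order, so the index can simply be copied over. This is a swapped-in variant rather than the method above; it assumes the key pairs in df1 are unique (checked here with `validate='many_to_one'`), and the name `merged` is illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['7/03/2022', '9/3/2022'],
                    'Client': ['Client 1', 'Client 2'],
                    'Course 2': ['Computer skill', 'CCC']})
df2 = pd.DataFrame({'Session Date': ['7/03/2022', '9/3/2022'],
                    'Org': ['Client 1', 'Client 3'],
                    'Session name': [np.nan, 'CCC']})

# Left merge: every df2 row is kept, in order; unmatched rows get NaN
# in 'Course 2'. validate raises if df1 has duplicate key pairs.
merged = df2.merge(df1, how='left',
                   left_on=['Session Date', 'Org'],
                   right_on=['Date', 'Client'],
                   validate='many_to_one')

# Restore df2's index so that fillna aligns row by row.
merged.index = df2.index

df2['Session name'] = df2['Session name'].fillna(merged['Course 2'])
print(df2)
```

Rows of df2 with no match in df1 are left unchanged, because the merged 'Course 2' value for them is NaN.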
