February 19, 2023, 07:24:23

How to fill missing values in several columns at the same time

Question

I need to fill missing values in a few columns of df2 with the corresponding column means from df1. I wrote this to do it one column at a time:

df2['A'].fillna(df1['A'].mean(), inplace=True)
df2['B'].fillna(df1['B'].mean(), inplace=True)
df2['C'].fillna(df1['C'].mean(), inplace=True)

Is there a way to fill them all in one line of code?

Answer 1

Score: 1

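You can use a single statement:

cols = ['A', 'B', 'C']
df[cols] = df[cols].fillna(df[cols].mean())

Or, to apply it to all numeric columns, use select_dtypes:

cols = df.select_dtypes('number').columns
df[cols] = df[cols].fillna(df[cols].mean())

Note: I strongly advise against using the inplace parameter. It was not removed in pandas 2, but calling an inplace method on a selected column, as in df2['A'].fillna(..., inplace=True), is chained assignment: recent pandas 2.x releases warn about it, and with Copy-on-Write enabled it no longer updates the original DataFrame.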
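The snippets in this answer use a single df, while the question fills df2 with the column means of df1. A minimal sketch of the same one-statement approach with two frames follows; the sample values are hypothetical and only serve to make the example runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1 supplies the means, df2 has the gaps to fill.
df1 = pd.DataFrame({"A": [1.0, 3.0], "B": [10.0, 20.0], "C": [5.0, 7.0]})
df2 = pd.DataFrame({"A": [np.nan, 4.0], "B": [2.0, np.nan], "C": [np.nan, np.nan]})

cols = ["A", "B", "C"]
# fillna matches the Series of means to df2's columns by label,
# so each column is filled with the mean of the same-named df1 column.
df2[cols] = df2[cols].fillna(df1[cols].mean())
print(df2)
```

Assigning the result back to df2[cols] avoids the inplace parameter entirely.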

Answer 2

Score: 0

for c in df2.columns:
    df2[c] = df2[c].fillna(df1[c].mean())
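One thing to watch for with comprehension-based one-liners: a lambda that is created but never called does no work. The sketch below uses hypothetical data standing in for df1 and df2 and compares a list of uncalled lambdas with an explicit loop that assigns each filled column back.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df1 and df2.
df1 = pd.DataFrame({"A": [1.0, 3.0], "B": [10.0, 20.0]})
df2 = pd.DataFrame({"A": [np.nan, 4.0], "B": [2.0, np.nan]})

# Building lambdas without calling them leaves df2 untouched.
unused = [lambda c: df2[c].fillna(df1[c].mean()) for c in df2.columns]
nan_count_before = int(df2.isna().sum().sum())

# An explicit loop that assigns each filled column back does the work.
for c in df2.columns:
    df2[c] = df2[c].fillna(df1[c].mean())
nan_count_after = int(df2.isna().sum().sum())

print(nan_count_before, nan_count_after)
```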

Answer 3

Score: 0

There are a few options for working with NaNs in a DataFrame. I'll explain some of them.

Each example below is applied to this original example df:

	A	B	C
0	1	5	10
1	2	nan	11
2	nan	nan	12
3	4	8	nan
4	nan	9	14

> Example 1: fill all columns with mean

df = df.fillna(df.mean())

Result:

	A	B	C
0	1	5	10
1	2	7.33333	11
2	2.33333	7.33333	12
3	4	8	11.75
4	2.33333	9	14

> Example 2: fill some columns with median

df[["A","B"]] = df[["A","B"]].fillna(df.median())

Result:

	A	B	C
0	1	5	10
1	2	8	11
2	2	8	12
3	4	8	nan
4	2	9	14
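To reproduce Examples 1 and 2, the example df can be rebuilt as in the sketch below. The construction with np.nan and the names mean_filled and median_filled are assumptions for illustration; the answer itself only shows the printed tables.

```python
import numpy as np
import pandas as pd

# The example df from this answer, rebuilt with np.nan for the gaps.
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4, np.nan],
    "B": [5, np.nan, np.nan, 8, 9],
    "C": [10, 11, 12, np.nan, 14],
})

# Example 1: every column is filled with its own mean.
mean_filled = df.fillna(df.mean())

# Example 2: only A and B are filled, each with its own median; C keeps its NaN.
median_filled = df.copy()
median_filled[["A", "B"]] = df[["A", "B"]].fillna(df[["A", "B"]].median())

print(mean_filled)
print(median_filled)
```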

> Example 3: fill all columns using ffill()

Explanation: Each missing value is replaced with the most recent non-missing value above it in the same column, so the value from the preceding row is carried forward to fill the blanks. (fillna(method='ffill') does the same in older pandas, but the method argument is deprecated since pandas 2.1.)

df = df.ffill()

Result:

	A	B	C
0	1	5	10
1	2	5	11
2	2	5	12
3	4	8	12
4	4	9	14

> Example 4: fill all columns using bfill()

Explanation: Each missing value is replaced with the next non-missing value below it in the same column, so values propagate from the bottom toward the top. A missing value in the last row stays missing because no later value exists.

df = df.bfill()

Result:

	A	B	C
0	1	5	10
1	2	8	11
2	4	8	12
3	4	8	14
4	nan	9	14
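The two directional fills can be compared side by side on the original example df. This is a minimal sketch; the variable names forward, backward and both are made up for illustration, and chaining bfill() with ffill() is shown only as one possible way to close gaps that a single direction cannot reach.

```python
import numpy as np
import pandas as pd

# The original example df, rebuilt with np.nan for the gaps.
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4, np.nan],
    "B": [5, np.nan, np.nan, 8, 9],
    "C": [10, 11, 12, np.nan, 14],
})

forward = df.ffill()   # carry the last seen value downward
backward = df.bfill()  # pull the next seen value upward

# A trailing NaN survives bfill (and a leading NaN survives ffill);
# chaining both directions fills whatever one pass leaves behind.
both = df.bfill().ffill()

print(forward)
print(backward)
print(both)
```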

If you want to drop the missing values instead of filling them, you can do this:

> Option 1: remove rows with one or more missing values

df = df.dropna(how="any")

Result:

	A	B	C
0	1	5	10

> Option 2: remove only rows in which every value is missing

df = df.dropna(how="all")
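A short sketch of how the dropna options behave on the original example df. The subset argument is not used in the answer above; it is included here as an assumption about what a reader who only cares about a few columns might want.

```python
import numpy as np
import pandas as pd

# The original example df, rebuilt with np.nan for the gaps.
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4, np.nan],
    "B": [5, np.nan, np.nan, 8, 9],
    "C": [10, 11, 12, np.nan, 14],
})

any_dropped = df.dropna(how="any")  # keeps only rows with no NaN at all
all_dropped = df.dropna(how="all")  # drops a row only if every value is NaN

# Restrict the check to selected columns: a row is dropped only if
# A or B is missing, regardless of C.
subset_dropped = df.dropna(subset=["A", "B"])

print(len(any_dropped), len(all_dropped), list(subset_dropped.index))
```

Because no row of the example df is entirely empty, how="all" leaves it unchanged.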
