Polars DataFrame: drop NaNs

Question
I need to drop rows that have a NaN value in any column. For null values, there is drop_nulls():

df.drop_nulls()

but for NaNs, I have found that the drop_nans method exists for Series but not for DataFrames:

df['A'].drop_nans()
The pandas code I'm using:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'A': [0, 0, 0, 1, None, 1],
        'B': [1, 2, 2, 1, 1, np.nan]
    }
)
df.dropna()
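For reference, Polars treats null and NaN as distinct concepts: None becomes a null, while np.nan stays a floating-point NaN. A minimal sketch of the same frame in Polars makes the difference visible:

import numpy as np
import polars as pl

# None becomes a null in column A; np.nan stays a NaN in the float column B
df = pl.DataFrame(
    {
        'A': [0, 0, 0, 1, None, 1],
        'B': [1, 2, 2, 1, 1, np.nan]
    }
)
# drop_nulls() removes the row with the null, but keeps the row with the NaN
df.drop_nulls()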
Answer 1
Score: 3
Another definition is: keep rows where all values are not NaN.

For that, we can use:

- .is_not_nan() to test for "not NaN"
- pl.col(pl.Float32, pl.Float64) to select only the float columns
- .all_horizontal() to compute a row-wise True/False comparison
- DataFrame.filter to keep only the "True" rows
import polars as pl

df = pl.from_repr("""
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 0.0 ┆ 1.0 ┆ a │
│ 0.0 ┆ 2.0 ┆ b │
│ 0.0 ┆ 2.0 ┆ c │
│ 1.0 ┆ 1.0 ┆ d │
│ NaN ┆ 1.0 ┆ e │
│ 1.0 ┆ NaN ┆ g │
└─────┴─────┴─────┘
""")
df.filter(
    pl.all_horizontal(pl.col(pl.Float32, pl.Float64).is_not_nan())
)
shape: (4, 3)
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 0.0 ┆ 1.0 ┆ a │
│ 0.0 ┆ 2.0 ┆ b │
│ 0.0 ┆ 2.0 ┆ c │
│ 1.0 ┆ 1.0 ┆ d │
└─────┴─────┴─────┘
The polars.selectors module has also since been added, which provides cs.float():

import polars.selectors as cs

df.filter(
    pl.all_horizontal(cs.float().is_not_nan())
)
Answer 2
Score: 2
If you have mixed nulls and NaNs, then the easiest thing to do is replace the NaNs with nulls and then use drop_nulls():

df.with_columns(pl.col(pl.Float32, pl.Float64).fill_nan(None)).drop_nulls()

From the inside out:

- pl.col(pl.Float32, pl.Float64) picks all the columns that are floats and hence able to contain NaN.
- fill_nan(None) replaces any NaN value with None, which is a proper null.
- drop_nulls() does exactly what it sounds like.
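As a quick check, here is a sketch of this applied to the sample data from the question (None produces a null, np.nan a NaN):

import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        'A': [0, 0, 0, 1, None, 1],
        'B': [1, 2, 2, 1, 1, np.nan]
    }
)
# NaN -> null in the float columns, then drop every row containing a null
df.with_columns(pl.col(pl.Float32, pl.Float64).fill_nan(None)).drop_nulls()
# both the None row and the np.nan row are gone, matching pandas' dropna()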
Answer 3
Score: 0
As @jqurious suggested, but with column names:

import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        'A': [0, 1.0, 1, np.nan, 2],
        'B': ['1', '1', '1', '1', None]
    }
)
# get all columns that have a float dtype
float_col = [c for c in df.columns if df[c].dtype in [pl.Float64, pl.Float32]]
df.filter(pl.all_horizontal(pl.col(float_col).is_not_nan())).drop_nulls()
Answer 4
Score: -2
Try this:
import polars as pl
import numpy as np
# create a DataFrame with some NaN values
df = pl.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': ['foo', 'bar', 'app', 'ctx', 'mpq']
})
df.to_pandas().dropna()
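If the result is needed back in Polars, the round trip can be closed with pl.from_pandas; note that this detour materializes the whole frame in pandas, so it may be costly for large data:

# convert back to a Polars DataFrame after dropping rows in pandas
df_clean = pl.from_pandas(df.to_pandas().dropna())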