Polars DataFrame: drop NaN values

Question

I need to drop the rows that have a NaN value in any column. For null values I can use drop_nulls():

    df.drop_nulls()

but I need the same for NaNs. I have found that a drop_nans method exists for Series, but not for DataFrames:

    df['A'].drop_nans()

Pandas code that I'm using:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            'A': [0, 0, 0, 1, None, 1],
            'B': [1, 2, 2, 1, 1, np.nan]
        }
    )
    df.dropna()

Answer 1

Score: 3

Another definition is: keep only the rows in which no value is NaN.

For that, we can use:

    import polars as pl

    df = pl.from_repr("""
    ┌─────┬─────┬─────┐
    │ A   ┆ B   ┆ C   │
    │ --- ┆ --- ┆ --- │
    │ f64 ┆ f64 ┆ str │
    ╞═════╪═════╪═════╡
    │ 0.0 ┆ 1.0 ┆ a   │
    │ 0.0 ┆ 2.0 ┆ b   │
    │ 0.0 ┆ 2.0 ┆ c   │
    │ 1.0 ┆ 1.0 ┆ d   │
    │ NaN ┆ 1.0 ┆ e   │
    │ 1.0 ┆ NaN ┆ g   │
    └─────┴─────┴─────┘
    """)
    df.filter(
        pl.all_horizontal(pl.col(pl.Float32, pl.Float64).is_not_nan())
    )

    shape: (4, 3)
    ┌─────┬─────┬─────┐
    │ A   ┆ B   ┆ C   │
    │ --- ┆ --- ┆ --- │
    │ f64 ┆ f64 ┆ str │
    ╞═════╪═════╪═════╡
    │ 0.0 ┆ 1.0 ┆ a   │
    │ 0.0 ┆ 2.0 ┆ b   │
    │ 0.0 ┆ 2.0 ┆ c   │
    │ 1.0 ┆ 1.0 ┆ d   │
    └─────┴─────┴─────┘

The polars.selectors module has since been added; it provides cs.float(), which selects every float column:

    import polars.selectors as cs

    df.filter(
        pl.all_horizontal(cs.float().is_not_nan())
    )

Answer 2

Score: 2

If your data mixes nulls and NaNs, the easiest thing to do is to replace the NaNs with nulls and then use drop_nulls():

    df.with_columns(pl.col(pl.Float32, pl.Float64).fill_nan(None)).drop_nulls()

From the inside out:

pl.col(pl.Float32, pl.Float64) selects all the float columns, which are the only columns that can contain NaN.

fill_nan(None) replaces every NaN value with None, i.e. a proper null.

drop_nulls() then drops every row that contains a null in any column.

Answer 3

Score: 0

As @jqurious suggested, but with the float columns selected by name:

    import numpy as np
    import polars as pl

    df = pl.DataFrame(
        {
            'A': [0, 1.0, 1, np.nan, 2],
            'B': ['1', '1', '1', '1', None]
        }
    )

    # get all columns that have a float type
    float_col = [c for c in df.columns if df[c].dtype in [pl.Float64, pl.Float32]]

    # keep rows without NaN in any float column, then drop rows that contain nulls
    df.filter(pl.all_horizontal(pl.col(float_col).is_not_nan())).drop_nulls()

Answer 4

Score: -2

Try this:

    import polars as pl
    import numpy as np

    # create a DataFrame with some NaN values
    df = pl.DataFrame({
        'A': [1, 2, np.nan, 4, 5],
        'B': ['foo', 'bar', 'app', 'ctx', 'mpq']
    })

    # convert to pandas and use its dropna(); the result is a pandas DataFrame, not a Polars one
    df.to_pandas().dropna()

huangapple
  • This article was published on February 24, 2023 at 01:43:13
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/75548444.html