How can I drop all rows in a group if any column contains only NaN values for that group?

Question

I am trying to drop all rows that belong to a group if, within that group, at least one column contains only NaN values.

For example:

ID  Column A  Column B
1   2         NaN
1   3         NaN
2   2         3
3   3         NaN
3   NaN       4
4   NaN       NaN
4   NaN       4

In this case, I want only the rows for ID 1 and ID 4 to be removed, because Column B contains only NaNs for ID 1 and Column A contains only NaNs for ID 4. Group 2 is fine because it has no NaNs, and group 3 is fine because neither of its columns consists entirely of NaNs.

In other words, my expected output is this:

ID  Column A  Column B
2   2         3
3   3         NaN
3   NaN       4

How do I achieve this?

I tried following https://stackoverflow.com/questions/38574872/python-pandas-remove-group-based-on-collective-nan-count, but that approach either considers only a single column or aggregates NaNs across all columns.
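To make the example reproducible, here is a minimal sketch that rebuilds the table above and shows which groups have an all-NaN column. It assumes the missing cells are real np.nan values in numeric columns; the name non_nan_counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the question; np.nan marks the missing cells.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4],
    "Column A": [2, 3, 2, 3, np.nan, np.nan, np.nan],
    "Column B": [np.nan, np.nan, 3, np.nan, 4, np.nan, 4],
})

# count() ignores NaN, so a 0 means that column is entirely NaN within that ID.
non_nan_counts = df.groupby("ID").count()
print(non_nan_counts)
```

A zero appears for ID 1 in Column B and for ID 4 in Column A, which are exactly the groups that should be dropped.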

Answer 1

Score: 2

Filter out every group in which any column is entirely NaN:

df.groupby('ID').filter(lambda x: not x.isna().all().any())

   ID Column A Column B
2   2       2         3
3   3       3      None
4   3     None        4
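If the per-group lambda turns out to be slow on a large frame, one possible alternative is to broadcast each group's non-NaN counts back onto the rows and build a boolean mask. This is a sketch under the same assumptions as the question's example (numeric columns with np.nan); value_cols, counts and result are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4, 4],
    "Column A": [2, 3, 2, 3, np.nan, np.nan, np.nan],
    "Column B": [np.nan, np.nan, 3, np.nan, 4, np.nan, 4],
})

value_cols = ["Column A", "Column B"]

# transform("count") returns, for every row, the number of non-NaN values
# that the row's group has in each column.
counts = df.groupby("ID")[value_cols].transform("count")

# Keep a row only if every value column of its group has at least one non-NaN.
result = df[(counts > 0).all(axis=1)]
print(result)
```

This keeps the rows for ID 2 and ID 3 with their original index labels, the same selection as the filter call above.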

Answer 2

Score: 0

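Try this (count the non-NaN values per group, then keep the IDs where every column has at least one):

counts = df.groupby('ID').count()                # non-NaN count per column, per ID
keep = counts[(counts > 0).all(axis=1)].index    # IDs with no all-NaN column
df[df['ID'].isin(keep)]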

Answer 3

Score: 0

Code (keep a group only if every value column has at least one non-NaN value):

df.groupby('ID').filter(lambda x: x[['Column A', 'Column B']].notna().any().all())

Output:

   ID  Column A Column B
2   2       2.0        3
3   3       3.0      NaN
4   3       NaN        4

huangapple
  • Posted on July 23, 2023 at 16:35:56
  • When reposting, please keep this link: https://go.coder-hub.com/76747313.html