Assign a unique value to consecutive null values until a non-null value

Question

I want to compute a cumulative count of null values in which each run of consecutive nulls is counted only once.

The closest I have come is this:

import pandas as pd
import numpy as np

# create the column
col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])

col.isnull().cumsum()

But the output is not what I want:

0    0
1    0
2    1
3    2
4    2
5    2
6    3
7    4
8    4
dtype: int32

I want the output to be the following: [0, 0, 1, 1, 1, 1, 2, 2, 2].

How do I achieve this?

Answer 1

Score: 1

You seem to want to count only the first NA per stretch:

m = col.isna()
out = (m & ~m.shift(fill_value=False)).cumsum()

Shortcut:

m = col.isna()
out = (m & m.diff()).cumsum()

Output:

0    0
1    0
2    1
3    1
4    1
5    1
6    2
7    2
8    2
dtype: int64

Intermediates:

   col      m  ~m.shift(fill_value=False)      &  cumsum
0  1.0  False                        True  False       0
1  2.0  False                        True  False       0
2  NaN   True                        True   True       1
3  NaN   True                       False  False       1
4  3.0  False                       False  False       1
5  4.0  False                        True  False       1
6  NaN   True                        True   True       2
7  NaN   True                       False  False       2
8  5.0  False                       False  False       2
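
For anyone who wants to rebuild the table above, here is a minimal sketch using the column from the question. The names prev_ok, start and steps are only illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])

m = col.isna()                        # True where the value is missing
prev_ok = ~m.shift(fill_value=False)  # True where the previous row is not missing
start = m & prev_ok                   # True only on the first NaN of each run

# Collect every intermediate step side by side (illustrative column labels)
steps = pd.DataFrame({
    "col": col,
    "m": m,
    "~m.shift(fill_value=False)": prev_ok,
    "&": start,
    "cumsum": start.cumsum(),
})
print(steps)
```

Because fill_value=False is passed to shift, the shifted mask stays boolean, so ~ and & behave as plain element-wise logic.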

Variant:

out = col.isna().astype(int).diff().eq(1).cumsum()
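
The one-liner above can be unpacked step by step. This is only a sketch on the question's column, and the names flags, delta and starts are made up for illustration.

```python
import numpy as np
import pandas as pd

col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])

flags = col.isna().astype(int)  # 1 for NaN, 0 otherwise
delta = flags.diff()            # +1 where a NaN run starts, -1 where it ends, NaN on the first row
starts = delta.eq(1)            # keep only the run starts; the NaN on the first row compares as False
out = starts.cumsum()

print(out.tolist())  # [0, 0, 1, 1, 1, 1, 2, 2, 2]
```

Casting to int before diff keeps the subtraction purely numeric, so a run start shows up as exactly +1.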

Answer 2

Score: 1

You can use:

# Increment when the previous row is not n/a AND the current row is n/a
out = (col.shift().notna() & col.isna()).cumsum()
print(out)

# Output
0    0
1    0
2    1
3    1
4    1
5    1
6    2
7    2
8    2
dtype: int64
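
One difference may be worth checking if your column can start with missing values. The sketch below uses a made-up series (lead is hypothetical, not from the question) and compares this expression with the shift(fill_value=False) form from the other answer.

```python
import numpy as np
import pandas as pd

# Hypothetical column that begins with a run of missing values
lead = pd.Series([np.nan, np.nan, 1, np.nan, 2])
m = lead.isna()

# Treats the row before the first one as non-missing, so the leading run is counted
with_fill = (m & ~m.shift(fill_value=False)).cumsum()

# shift() puts NaN in the first row, so notna() is False there and the leading run is not counted
with_shift = (lead.shift().notna() & m).cumsum()

print(with_fill.tolist())   # [1, 1, 1, 2, 2]
print(with_shift.tolist())  # [0, 0, 0, 1, 1]
```

On the question's data both give the same result, since that column starts with a non-null value. Which behavior is preferable for a leading run depends on what the ids are used for.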

huangapple
  • Posted on March 9, 2023, 23:45:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75686934.html