Assign a unique value to consecutive null values until a non-null value
Question
I want to apply a function that computes a cumulative count of null values, incrementing once per stretch of consecutive nulls.
The closest I have come is this:
import pandas as pd
import numpy as np
# create the column
col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])
col.isnull().cumsum()
But the output is not what I want, because every null increments the counter:
0 0
1 0
2 1
3 2
4 2
5 2
6 3
7 4
8 4
dtype: int32
I want the output to be the following: [0, 0, 1, 1, 1, 1, 2, 2, 2].
How do I achieve this?
Answer 1
Score: 1
You seem to want to count only the first NA per stretch:
m = col.isna()
out = (m & ~m.shift(fill_value=False)).cumsum()
Shortcut:
m = col.isna()
out = (m & m.diff()).cumsum()
Output:
0 0
1 0
2 1
3 1
4 1
5 1
6 2
7 2
8 2
dtype: int64
Intermediates:
   col      m  ~m.shift(fill_value=False)      &  cumsum
0  1.0  False                        True  False       0
1  2.0  False                        True  False       0
2  NaN   True                        True   True       1
3  NaN   True                       False  False       1
4  3.0  False                       False  False       1
5  4.0  False                        True  False       1
6  NaN   True                        True   True       2
7  NaN   True                       False  False       2
8  5.0  False                       False  False       2
Variant:
out = col.isna().astype(int).diff().eq(1).cumsum()
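The steps above can be put together in a self-contained sketch. The helper name `count_null_runs` and the column labels `not_prev` and `start` are made up for illustration and are not part of pandas; the function wraps the first expression and the DataFrame rebuilds the intermediates for the sample column.

```python
import numpy as np
import pandas as pd


def count_null_runs(col: pd.Series) -> pd.Series:
    """Number each stretch of consecutive nulls (hypothetical helper)."""
    m = col.isna()
    # True only where a null follows a non-null value (or starts the Series)
    starts = m & ~m.shift(fill_value=False)
    return starts.cumsum()


col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])
out = count_null_runs(col)

# Rebuild the intermediate columns for inspection
m = col.isna()
steps = pd.DataFrame({
    "col": col,
    "m": m,
    "not_prev": ~m.shift(fill_value=False),
    "start": m & ~m.shift(fill_value=False),
    "cumsum": out,
})
print(steps)
```

Printing `steps` makes it easy to see that only the first null of each stretch is flagged in `start`, so the cumulative sum stays flat until the next stretch begins.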
Answer 2
Score: 1
You can use:
# Increment when the previous row is not n/a AND the current row is n/a
out = (col.shift().notna() & col.isna()).cumsum()
print(out)
# Output
0 0
1 0
2 1
3 1
4 1
5 1
6 2
7 2
8 2
dtype: int64
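One difference worth checking is a column that begins with a null. The sketch below uses a made-up sample (`lead` is not from the question). `col.shift()` puts NaN in the first position, so `col.shift().notna()` is False there and a leading stretch is not counted, whereas the `shift(fill_value=False)` expression from the first answer does count it. Which behavior is wanted depends on the use case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that starts with a null stretch
lead = pd.Series([np.nan, np.nan, 1, np.nan, 2])

# Expression from this answer: the previous row must hold a real value
a = (lead.shift().notna() & lead.isna()).cumsum()

# shift(fill_value=False) form: a null in the first row counts as a start
m = lead.isna()
b = (m & ~m.shift(fill_value=False)).cumsum()

print(a.tolist())  # [0, 0, 0, 1, 1]
print(b.tolist())  # [1, 1, 1, 2, 2]
```

For the sample column in the question, which starts with a non-null value, both expressions give the same result.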