Fast way of calculating the number of consecutive NaN values in a column
Question
I want to transform my DataFrame into a new DataFrame of the same shape, where each entry is the number of consecutive NaNs starting at that position and counting downward (0 where the value is not NaN), as follows:
IN:
         A      B
0   0.1880  0.345
1   0.2510  0.585
2      NaN    NaN
3      NaN    NaN
4      NaN  1.150
5   0.2300  1.210
6   0.1670  1.290
7   0.0835  1.400
8   0.0418    NaN
9   0.0209    NaN
10     NaN    NaN
11     NaN    NaN
12     NaN    NaN
OUT:
    A  B
0   0  0
1   0  0
2   3  2
3   2  1
4   1  0
5   0  0
6   0  0
7   0  0
8   0  5
9   0  4
10  3  3
11  2  2
12  1  1
A similar question whose answers I was trying to adapt: https://stackoverflow.com/questions/43517953/fast-way-to-get-the-number-of-nans-in-a-column-counted-from-the-last-valid-value
Answer 1
Score: 2
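Inspired by this answer: https://stackoverflow.com/a/52718619/3275464
```python
from io import StringIO

import pandas as pd

s = """         A      B
0   0.1880  0.345
1   0.2510  0.585
2      NaN    NaN
3      NaN    NaN
4      NaN  1.150
5   0.2300  1.210
6   0.1670  1.290
7   0.0835  1.400
8   0.0418    NaN
9   0.0209    NaN
10     NaN    NaN
11     NaN    NaN
12     NaN    NaN"""

# Raw string for the regex separator avoids an invalid-escape warning
df = pd.read_csv(StringIO(s), engine='python', sep=r'\s+')

# Reverse the NaN mask so that "NaNs below this row" becomes a running count
_df = df.isna().iloc[::-1]
# Running total of NaNs per column, counted from the bottom up
b = _df.cumsum()
# Subtract the total at the most recent valid value, then restore row order
c = b.sub(b.mask(_df).ffill().fillna(0)).astype(int).iloc[::-1]
print(c)  # gives the output you seem to want
```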
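To see why the reversed cumulative sum works, it may help to trace the intermediate values on a single short column. The sketch below uses a made-up five-element Series (not the question's data), and the names `rev`, `running`, `baseline`, and `counts` are hypothetical labels for the steps in the one-liner above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: valid, NaN, NaN, valid, NaN
s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan])

rev = s.isna().iloc[::-1]   # NaN mask, read from the bottom up
running = rev.cumsum()      # running NaN total, counted from the bottom
# Running total at the most recent valid value (0 if none has been seen yet)
baseline = running.mask(rev).ffill().fillna(0)
# NaNs seen since that valid value, put back into the original row order
counts = running.sub(baseline).astype(int).iloc[::-1]

print(counts.tolist())  # [0, 2, 1, 0, 1]
```

Valid rows end up at 0 because their running total equals their own baseline, while each NaN row keeps only the NaNs accumulated since the last valid value below it.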
Answer 2
Score: 1
You can also do this with an explicit loop: walk each column from the last row to the first, keep a running count of consecutive NaN values that resets at every valid value, and write the count into a result DataFrame (valid positions stay 0).
```python
import numpy as np
import pandas as pd

# Create the input DataFrame
data = {
    'A': [0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670, 0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan],
    'B': [0.345, 0.585, np.nan, np.nan, 1.150, 1.210, 1.290, 1.400, np.nan, np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(data)

# Result has the same shape as df; non-NaN positions keep the value 0
out = pd.DataFrame(0, index=df.index, columns=df.columns)

# Transform the DataFrame. Iterating bottom-up makes the counter equal to
# the number of consecutive NaNs from the current row downward.
for col in df.columns:
    counter = 0
    for i in reversed(range(len(df))):
        if pd.isna(df.at[i, col]):
            counter += 1
            out.at[i, col] = counter
        else:
            counter = 0

print(out)
```
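Since the question asks for a fast method, one possible variation on the loop is to run the same bottom-up scan over the underlying NumPy array, one row at a time for all columns at once. This avoids per-cell DataFrame indexing and does not depend on the index labels. It is only a sketch and not part of the original answer; the function name `trailing_nan_runs` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def trailing_nan_runs(frame: pd.DataFrame) -> pd.DataFrame:
    """Count, for each cell, the consecutive NaNs from that row downward."""
    is_nan = frame.isna().to_numpy()
    counts = np.zeros(is_nan.shape, dtype=int)
    running = np.zeros(is_nan.shape[1], dtype=int)  # one counter per column
    # Scan rows bottom-up; a valid value resets that column's counter to 0
    for i in range(is_nan.shape[0] - 1, -1, -1):
        running = np.where(is_nan[i], running + 1, 0)
        counts[i] = running
    return pd.DataFrame(counts, index=frame.index, columns=frame.columns)


# Hypothetical demo frame with a non-default index
demo = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan], "B": [np.nan, 2.0, np.nan]},
    index=["x", "y", "z"],
)
result = trailing_nan_runs(demo)
print(result)
```

The Python-level loop here runs once per row rather than once per cell, so it should scale better with the number of columns, although the fully vectorized answers are likely faster still on tall frames.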
Answer 3
Score: 1
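One option:
```python
# df is the input DataFrame from the question
tmp = df.notna()
out = (
    tmp.apply(
        # Label each run of equal values, then number rows within each run
        # from the bottom up, starting at 1
        lambda s: s[::-1].groupby(s.ne(s.shift()).cumsum()).cumcount().add(1)
    )
    .mask(tmp, 0)[::-1]  # valid values become 0; restore the row order
)
```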
Output:
    A  B
0   0  0
1   0  0
2   3  2
3   2  1
4   1  0
5   0  0
6   0  0
7   0  0
8   0  5
9   0  4
10  3  3
11  2  2
12  1  1
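The key idiom in this approach is `s.ne(s.shift()).cumsum()`, which gives every run of identical consecutive values its own integer label. A small illustration on a made-up boolean Series may make the steps easier to follow; the data and the names `run_id` and `pos_from_end` are hypothetical and not from the answer.

```python
import pandas as pd

# Hypothetical validity mask: True = valid value, False = NaN
s = pd.Series([True, False, False, True, False])

# A new run starts wherever the value differs from the previous row
run_id = s.ne(s.shift()).cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 4]

# Number the rows within each run from the bottom up, starting at 1
pos_from_end = s[::-1].groupby(run_id).cumcount().add(1)[::-1]
print(pos_from_end.tolist())  # [1, 2, 1, 1, 1]

# Keep the counts only on the NaN rows; valid rows become 0
print(pos_from_end.mask(s, 0).tolist())  # [0, 2, 1, 0, 1]
```

Passing `run_id` as the grouping key works on the reversed Series because pandas aligns a Series key to the grouped object's index before grouping.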