Fast way of calculating the number of consecutive NaN values in a column

Question

I want to transform my DataFrame into a new DataFrame of the same shape. Each entry should be the number of consecutive NaNs starting at that position and continuing downwards, including the cell itself. Non-NaN entries should be 0. For example:

IN:

    A       B      
0   0.1880  0.345 
1   0.2510  0.585  
2   NaN     NaN  
3   NaN     NaN 
4   NaN     1.150  
5   0.2300  1.210  
6   0.1670  1.290  
7   0.0835  1.400  
8   0.0418  NaN    
9   0.0209  NaN    
10  NaN     NaN    
11  NaN     NaN    
12  NaN     NaN     

OUT:

    A       B      
0   0       0    
1   0       0  
2   3       2  
3   2       1 
4   1       0  
5   0       0 
6   0       0 
7   0       0 
8   0       5    
9   0       4   
10  3       3   
11  2       2 
12  1       1     

A similar question whose approach I was trying to adapt: https://stackoverflow.com/questions/43517953/fast-way-to-get-the-number-of-nans-in-a-column-counted-from-the-last-valid-value

Answer 1

Score: 2

Inspired by this answer: https://stackoverflow.com/a/52718619/3275464

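```python
from io import StringIO
import pandas as pd

s = """    A       B
0   0.1880  0.345
1   0.2510  0.585
2   NaN     NaN
3   NaN     NaN
4   NaN     1.150
5   0.2300  1.210
6   0.1670  1.290
7   0.0835  1.400
8   0.0418  NaN
9   0.0209  NaN
10  NaN     NaN
11  NaN     NaN
12  NaN     NaN"""

# Raw string avoids the invalid-escape warning for '\s' in newer Pythons
df = pd.read_csv(StringIO(s), engine='python', sep=r'\s+')

_df = df.isna().iloc[::-1]   # NaN mask with the rows reversed
b = _df.cumsum()             # running NaN count, counted from the bottom
# Subtract the running count at the last non-NaN row, then flip rows back
c = b.sub(b.mask(_df).ffill().fillna(0)).astype(int).iloc[::-1]
print(c)  # gives the output you seem to want
```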
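The one-liner above flips the NaN mask upside down and takes a running count of NaNs. It then subtracts the running count as it stood at the most recent non-NaN row. What remains is the length of the NaN run so far, which becomes "NaNs from here downwards" once the rows are flipped back. Below is a minimal sketch of the same idea on the underlying NumPy array, which skips pandas index alignment. The helper name `nan_run_lengths` is hypothetical, and the sketch assumes counting should follow the frame's current row order.

```python
import numpy as np
import pandas as pd


def nan_run_lengths(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each cell, count the consecutive NaNs that
    start at that cell and run downwards (0 for non-NaN cells)."""
    # Flip rows so "NaNs at and after this row" becomes "NaNs up to this row"
    mask = df.isna().to_numpy()[::-1]
    # Running count of NaNs down each (flipped) column
    total = np.cumsum(mask, axis=0)
    # Running count as it stood at the latest non-NaN row (0 if none yet).
    # `total` never decreases, so a running maximum carries that value forward.
    at_last_valid = np.maximum.accumulate(np.where(mask, 0, total), axis=0)
    # Length of the current NaN run, flipped back to the original row order
    counts = (total - at_last_valid)[::-1]
    return pd.DataFrame(counts, index=df.index, columns=df.columns)


# Example data from the question
df = pd.DataFrame({
    "A": [0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670,
          0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan],
    "B": [0.345, 0.585, np.nan, np.nan, 1.150, 1.210, 1.290,
          1.400, np.nan, np.nan, np.nan, np.nan, np.nan],
})
result = nan_run_lengths(df)
print(result["A"].tolist())  # [0, 0, 3, 2, 1, 0, 0, 0, 0, 0, 3, 2, 1]
```

The running maximum stands in for `mask(...).ffill()` because the cumulative count never decreases. That makes the largest value seen so far at a non-NaN row always the one at the latest non-NaN row. Whether this is measurably faster than the pandas version depends on the data, so it is worth timing both on a realistic frame.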

Answer 2

Score: 1

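You can also do this with a plain Python loop. Walk each column from the bottom up, keeping a counter of consecutive NaNs, and write the counter into a new DataFrame of the same shape, so non-NaN cells get 0. This is easy to follow, but because it loops in Python it will be slower than the vectorized answers on large frames.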

```python
import pandas as pd
import numpy as np

# Create the input DataFrame
data = {
    'A': [0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670, 0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan],
    'B': [0.345, 0.585, np.nan, np.nan, 1.150, 1.210, 1.290, 1.400, np.nan, np.nan, np.nan, np.nan, np.nan]
}

df = pd.DataFrame(data)

# Output frame of the same shape; non-NaN cells stay 0
out = pd.DataFrame(0, index=df.index, columns=df.columns)

# Walk each column from the bottom up, so the counter holds the number of
# consecutive NaNs from the current row down to the end of its run
for col in df.columns:
    counter = 0
    for i in reversed(range(len(df))):
        if pd.isna(df.at[i, col]):
            counter += 1
        else:
            counter = 0
        out.at[i, col] = counter

print(out)
```
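The loop above reads cells by label with `df.at[i, col]`, so it relies on the frame having the default `RangeIndex` (labels 0, 1, 2, ...). Below is a sketch of the same bottom-up counting that indexes by position instead, working on the NumPy mask. The helper name `nan_run_lengths_loop` is hypothetical.

```python
import numpy as np
import pandas as pd


def nan_run_lengths_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: bottom-up NaN-run counter that ignores index labels."""
    mask = df.isna().to_numpy()               # True where the cell is NaN
    counts = np.zeros(mask.shape, dtype=int)  # non-NaN cells stay 0
    n_rows, n_cols = mask.shape
    for j in range(n_cols):
        run = 0
        # Walk upwards so `run` is the length of the NaN run from row i downwards
        for i in range(n_rows - 1, -1, -1):
            run = run + 1 if mask[i, j] else 0
            counts[i, j] = run
    return pd.DataFrame(counts, index=df.index, columns=df.columns)


# Small example with a non-default (string) index
df = pd.DataFrame(
    {"A": [np.nan, 1.0, np.nan, np.nan], "B": [2.0, np.nan, np.nan, 3.0]},
    index=["w", "x", "y", "z"],
)
result = nan_run_lengths_loop(df)
print(result)
```

This is still a Python-level double loop, so on large frames one of the vectorized approaches is likely to be faster.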

Answer 3

Score: 1

One option:

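```python
# df is the input frame from the question; True where the cell is valid
tmp = df.notna()

# Within each run of equal values, number the rows from the bottom (1, 2, ...),
# then set valid cells to 0 and restore the original row order
out = tmp.apply(lambda s: s[::-1].groupby(s.ne(s.shift()).cumsum()).cumcount().add(1)
               ).mask(tmp, 0)[::-1]
```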

Output:

    A  B
0   0  0
1   0  0
2   3  2
3   2  1
4   1  0
5   0  0
6   0  0
7   0  0
8   0  5
9   0  4
10  3  3
11  2  2
12  1  1
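The key step in this answer is `s.ne(s.shift()).cumsum()`, which gives every run of equal values its own integer label. Calling `groupby(...).cumcount()` on the reversed column then numbers the rows inside each run from the bottom. Here is a small trace on the `notna()` mask of column A, assuming the example data from the question:

```python
import numpy as np
import pandas as pd

# Column A from the question
a = pd.Series([0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670,
               0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan])
s = a.notna()

# A new label starts wherever the value differs from the previous row
run_id = s.ne(s.shift()).cumsum()
print(run_id.tolist())       # [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4]

# Position inside each run, counted from the run's last row (1 = last row);
# groupby aligns run_id to the reversed series by index label
from_bottom = s[::-1].groupby(run_id).cumcount().add(1)[::-1]
print(from_bottom.tolist())  # [2, 1, 3, 2, 1, 5, 4, 3, 2, 1, 3, 2, 1]

# Valid rows are set to 0, leaving only the NaN run lengths
print(from_bottom.mask(s, 0).tolist())  # [0, 0, 3, 2, 1, 0, 0, 0, 0, 0, 3, 2, 1]
```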

huangapple
  • Published on 2023-08-10 22:22:14
  • When reposting, please keep this link: https://go.coder-hub.com/76876637.html