2023-08-10 22:22:14

Fast way of calculating the number of consecutive NaN values in a column

Question

I want to transform my DataFrame into a new DataFrame of the same shape, where each entry is the number of consecutive NaNs that start at that position in its column (0 if the entry itself is not NaN), as follows:

IN:

    A       B      
0   0.1880  0.345 
1   0.2510  0.585  
2   NaN     NaN  
3   NaN     NaN 
4   NaN     1.150  
5   0.2300  1.210  
6   0.1670  1.290  
7   0.0835  1.400  
8   0.0418  NaN    
9   0.0209  NaN    
10  NaN     NaN    
11  NaN     NaN    
12  NaN     NaN

OUT:

    A       B      
0   0       0    
1   0       0  
2   3       2  
3   2       1 
4   1       0  
5   0       0 
6   0       0 
7   0       0 
8   0       5    
9   0       4   
10  3       3   
11  2       2 
12  1       1

Answer 1

Score: 2

Inspired by this answer: https://stackoverflow.com/a/52718619/3275464

```python
from io import StringIO
import pandas as pd

s = """    A       B
0   0.1880  0.345
1   0.2510  0.585
2   NaN     NaN
3   NaN     NaN
4   NaN     1.150
5   0.2300  1.210
6   0.1670  1.290
7   0.0835  1.400
8   0.0418  NaN
9   0.0209  NaN
10  NaN     NaN
11  NaN     NaN
12  NaN     NaN"""

# Parse the whitespace-separated sample; the first column becomes the index
df = pd.read_csv(StringIO(s), engine='python', sep=r'\s+')

# Reverse the NaN mask so the cumulative sum runs from the bottom of each column upward
_df = df.isna().iloc[::-1]
b = _df.cumsum()
# Subtract the running total frozen at the last non-NaN row, then restore the original order
c = b.sub(b.mask(_df).ffill().fillna(0)).astype(int).iloc[::-1]
c  # gives the output you seem to want
```
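To see why the reversed cumulative sum works, the sketch below walks through the intermediate values for a single short column. The Series `s` and the names `rev`, `running`, and `baseline` are made up for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: a NaN run of length 2 in the middle, one of length 1 at the end
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

rev = s.isna().iloc[::-1]      # NaN mask, read bottom-up
running = rev.cumsum()         # total NaNs seen so far, bottom-up

# Running total frozen at the most recent non-NaN row (0 before any is seen)
baseline = running.mask(rev).ffill().fillna(0)

# Subtracting the baseline restarts the count after every non-NaN row
result = (running - baseline).astype(int).iloc[::-1]

print(result.tolist())  # [0, 2, 1, 0, 1]
```

Because the count is taken bottom-up, the first row of each NaN run ends up holding the full length of the run, which is what the question asks for.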

Answer 2

Score: 1

You can transform the DataFrame by walking each column from bottom to top, counting consecutive NaN values, and writing the running count at each NaN position. Positions that are not NaN get 0.

```python
import pandas as pd
import numpy as np

# Create the input DataFrame
data = {
    'A': [0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670, 0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan],
    'B': [0.345, 0.585, np.nan, np.nan, 1.150, 1.210, 1.290, 1.400, np.nan, np.nan, np.nan, np.nan, np.nan]
}

df = pd.DataFrame(data)

# Result of the same shape, 0 wherever the input is not NaN
out = pd.DataFrame(0, index=df.index, columns=df.columns)

# Walk each column bottom-up so the counter holds the length of the
# NaN run that starts at the current row
for col in df.columns:
    count = 0
    for i in df.index[::-1]:
        if pd.isna(df.at[i, col]):
            count += 1
            out.at[i, col] = count
        else:
            count = 0

print(out)
```
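The loop is easy to follow but runs at Python speed, one cell at a time. As a point of comparison, here is a vectorised NumPy sketch of the same computation. It does not come from any of the answers, and the function name `nan_run_lengths` is hypothetical.

```python
import numpy as np
import pandas as pd


def nan_run_lengths(df: pd.DataFrame) -> pd.DataFrame:
    """For each cell, the length of the NaN run starting there (0 if not NaN)."""
    mask = df.isna().to_numpy()[::-1]    # NaN mask with rows read bottom-up
    running = mask.cumsum(axis=0)        # NaNs seen so far in each column
    # Running total at the most recent non-NaN row. `running` never decreases,
    # so a cumulative maximum picks out that most recent value.
    baseline = np.maximum.accumulate(np.where(mask, 0, running), axis=0)
    counts = (running - baseline)[::-1]  # back to the original row order
    return pd.DataFrame(counts, index=df.index, columns=df.columns)


df = pd.DataFrame({
    "A": [0.1880, 0.2510, np.nan, np.nan, np.nan, 0.2300, 0.1670,
          0.0835, 0.0418, 0.0209, np.nan, np.nan, np.nan],
    "B": [0.345, 0.585, np.nan, np.nan, 1.150, 1.210, 1.290,
          1.400, np.nan, np.nan, np.nan, np.nan, np.nan],
})

out = nan_run_lengths(df)
print(out)
```

The function works on the whole array at once, so all columns are processed in a single pass without a Python-level loop over rows.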

Answer 3

Score: 1

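One option:

```python
# True where a value is present, False where it is NaN
tmp = df.notna()

# Label each run of equal flags, number the rows of every run from the bottom up,
# set non-NaN positions to 0, and restore the original row order
out = tmp.apply(lambda s: s[::-1].groupby(s.ne(s.shift()).cumsum()).cumcount().add(1)
               ).mask(tmp, 0)[::-1]
```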
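The one-liner combines two idioms: labelling runs with `ne(shift()).cumsum()` and numbering rows within each run with `cumcount()`. The sketch below separates them on a single toy column. The Series `s` and the names `valid`, `run_id`, and `remaining` are illustrative assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])
valid = s.notna()

# A new run starts wherever the flag differs from the previous row;
# the cumulative sum of those change points gives each run its own label
run_id = valid.ne(valid.shift()).cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 4]

# Number the rows within each run from the bottom up, starting at 1,
# then put the rows back in their original order
remaining = valid[::-1].groupby(run_id).cumcount().add(1).sort_index()

# Rows holding a real value count as 0
out = remaining.mask(valid, 0)
print(out.tolist())  # [0, 2, 1, 0, 1]
```

Grouping the reversed Series by `run_id` relies on pandas aligning the group labels by index, which is why the labels can be computed on the unreversed column.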

