July 27, 2023, 15:38:13

Find the sum of values in one column over the rows where another column is NaN in pandas

Question

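I have a DataFrame with columns A and B. Column A is not continuous: some of its rows are NaN, while B has a value in every row. I would like to create a third column, C, that sums up each run of NaN rows in A: the valid row that follows the run gets the sum of B over those NaN rows plus its own B value.

In every other row, C should be NaN where A is NaN, and equal to B where the row follows another valid number in A.

Example: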

data = {
    'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
    'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130]
}

Everything works fine except for the rows that need the sum of B plus the next valid value in B.

This is the code I have so far, but it has become a mess:

result = df.groupby(df['A'].isnull().cumsum())['B'].sum().reset_index()
df_result = pd.DataFrame({'C': result['Pumped']})
df_result.loc[1:, 'C'] -= result.loc[0, 'Pumped']
df.loc[~mask, 'C'] = df.loc[~mask, 'Pumped']
valid_rows_after_nan = df['dWL'].notnull() & mask.shift(1).fillna(False)
df.loc[valid_rows_after_nan, 'C'] = df_result
print(df)

I would like the output to look like this:

data = {
    'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
    'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
    'C': [10, 20, None, None, 120, 60, None, None, 240, 100, 110, None, 5]
}


Answer 1

Score: 4

A simple version using groupby.transform:

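# flag the non-NA rows of A, in reverse row order
m = df.loc[::-1, 'A'].notna()
# group each valid row with the NaN rows preceding it, sum B, and mask the result where A is NaN
df['C'] = df.groupby(m.cumsum())['B'].transform('sum').where(m)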

Output:

      A    B      C
0   1.0   10   10.0
1   1.0   20   20.0
2   NaN   30    NaN
3   NaN   40    NaN
4   2.0   50  120.0
5   5.0   60   60.0
6   NaN   70    NaN
7   NaN   80    NaN
8   3.0   90  240.0
9   4.0  100  100.0
10  3.0  110  110.0
11  NaN  120    NaN
12  5.0  130  250.0
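
To see why the reversed cumulative sum works, it may help to look at the group key itself. The following is a minimal, self-contained sketch built on the example data from the question; the variable name key and the extra key column in the printout are only for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
    'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
})

# True where A holds a value, taken in reverse row order (index 12 down to 0)
m = df.loc[::-1, 'A'].notna()

# Counting from the bottom up, every valid row starts a new group,
# and the NaN rows directly above it inherit the same group number
key = m.cumsum()

# groupby and where both align on the index, so the reversed order of
# m and key does not matter here: sum B per group, keep it on valid rows only
df['C'] = df.groupby(key)['B'].transform('sum').where(m)

# Show the (illustrative) group key next to the result
print(df.assign(key=key))
```

Rows 2, 3 and 4 share one key, so row 4 receives 30 + 40 + 50 = 120, and rows 11 and 12 share another, giving 120 + 130 = 250. One consequence of this approach: if the frame ended with NaN rows in A, their group would contain no valid row, so their B values would be masked out and would not appear in C at all.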

