July 18, 2023 08:10:30

Creating a moving average with groupby() not using the first value in the group

Question

I'm trying to compute a moving average after using groupby(). I have (hypothetical) patient data that is split into groups, and within each group I want a moving average of Lactate that looks back only at earlier observations: the first value should be NaN, the second value should be the first observed Lactate, and so on.

df['LAC_missing'] = pd.to_numeric(df['LAC_missing'], errors='coerce')
grouped = df.groupby(['Test ID', 'Appointment Date:'])
# Calculate the cumulative sum
df['Cumulative Sum'] = grouped['LAC_missing'].cumsum()
df['Count'] = grouped.cumcount()
# Calculate the moving average (divide by 'Count' where 'Count' > 0, otherwise set to NaN)
df['LAC Moving Average'] = df['Cumulative Sum'] / df['Count'].where(df['Count'] > 0)

Answer 1

Score: 0

From your code, it looks like you want something like this:

import pandas as pd
import numpy as np

# Create a toy dataset
data = {
    'Test ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Appointment Date': ['2023-07-16', '2023-07-17', '2023-07-18'] * 3,
    'LAC_missing': [1.4, np.nan, 1.0, np.nan, 1.7, 1.8, 1.2, 1.3, 1.9]
}
df = pd.DataFrame(data)
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'])

# From your code, it looks like you are doing this
# (where() without a replacement value fills the first row of each group with NaN):
df['LAC Moving Average'] = (
    df.groupby('Test ID')['LAC_missing'].cumsum()
    / df.groupby('Test ID')['LAC_missing'].cumcount().where(lambda x: x != 0)
)

# but "as in the first value should be NaN, then the second value should be the first observed Lactate, and so on" makes me think it might be this:
df['Previous LAC'] = df.groupby('Test ID')['LAC_missing'].shift(1)
print(df)

   Test ID Appointment Date  LAC_missing  LAC Moving Average  Previous LAC
0        1       2023-07-16          1.4                 NaN           NaN
1        1       2023-07-17          NaN                 NaN           1.4
2        1       2023-07-18          1.0                1.20           NaN
3        2       2023-07-16          NaN                 NaN           NaN
4        2       2023-07-17          1.7                1.70           NaN
5        2       2023-07-18          1.8                1.75           1.7
6        3       2023-07-16          1.2                 NaN           NaN
7        3       2023-07-17          1.3                2.50           1.2
8        3       2023-07-18          1.9                2.20           1.3
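
If the goal is instead the mean of all earlier Lactate values for each patient (not just the single previous value), one possible sketch is to shift within each group and then take an expanding mean. It rebuilds the same toy data so it runs on its own. The column name 'LAC Prior Mean' is made up for illustration, and grouping by 'Test ID' alone is an assumption about the intended grouping.

```python
import numpy as np
import pandas as pd

# Same toy data as above
df = pd.DataFrame({
    'Test ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Appointment Date': pd.to_datetime(['2023-07-16', '2023-07-17', '2023-07-18'] * 3),
    'LAC_missing': [1.4, np.nan, 1.0, np.nan, 1.7, 1.8, 1.2, 1.3, 1.9],
})

# Sort so that "previous" means "earlier appointment" within each patient
df = df.sort_values(['Test ID', 'Appointment Date'])

# shift() removes the current row from the window; expanding().mean() then
# averages every earlier non-NaN value in the group.
# 'LAC Prior Mean' is a hypothetical column name.
df['LAC Prior Mean'] = (
    df.groupby('Test ID')['LAC_missing']
      .transform(lambda s: s.shift().expanding().mean())
)
print(df)
```

With this approach the first row of every group is NaN, and a missing earlier value is skipped rather than counted, so the average is taken over observed values only.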

Answer 2

Score: 0

import pandas as pd

# Convert 'LAC_missing' column to numeric, replacing non-numeric values with NaN
df['LAC_missing'] = pd.to_numeric(df['LAC_missing'], errors='coerce')
grouped = df.groupby(['Test ID', 'Appointment Date:'])
# but "as in the first value should be NaN, then the second value should be the first observed Lactate, and so on" makes me think it might be this:
df['Previous LAC'] = grouped['LAC_missing'].shift(1)
# Calculate the cumulative sum of the previous values
df['Cumulative Sum'] = grouped['Previous LAC'].cumsum()
# Fill the NaN values in 'Cumulative Sum' with the previous non-null value in each group
# (a grouped ffill() stays inside each group; fillna(method='ffill') is deprecated in recent pandas)
df['Cumulative Sum'] = grouped['Cumulative Sum'].ffill()
# Calculate the count of non-NaN values for each group up to the current time
# (transform keeps the result aligned with the original index)
df['Count'] = grouped['Previous LAC'].transform(lambda x: x.notna().cumsum())
# Calculate the moving average (divide by 'Count' where 'Count' > 0, otherwise set to NaN)
df['LAC Moving Average'] = df['Cumulative Sum'] / df['Count'].where(df['Count'] > 0)
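
To try the sum-and-count idea without the original data, here is a self-contained sketch on made-up values. It groups by 'Test ID' only, which is an assumption: if every ('Test ID', 'Appointment Date:') pair occurs just once, grouping by both leaves one row per group and every shifted value would be NaN. The helper names `prev`, `prior_sum`, and `prior_count` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data; 'n/a' stands in for a non-numeric entry
df = pd.DataFrame({
    'Test ID': [1, 1, 1, 2, 2, 2],
    'LAC_missing': ['1.4', 'n/a', '1.0', '2.0', '3.0', '4.0'],
})
df['LAC_missing'] = pd.to_numeric(df['LAC_missing'], errors='coerce')

# Previous observation within each patient (first row of a group is NaN)
prev = df.groupby('Test ID')['LAC_missing'].shift(1)

# Running sum and running count of the values strictly before each row;
# missing values contribute 0 to the sum and 0 to the count
prior_sum = prev.fillna(0).groupby(df['Test ID']).cumsum()
prior_count = prev.notna().astype(int).groupby(df['Test ID']).cumsum()

# Divide only where at least one earlier value exists, otherwise NaN
df['LAC Moving Average'] = prior_sum / prior_count.where(prior_count > 0)
print(df)
```

Filling the shifted values with 0 before the cumulative sum avoids the forward-fill step, since a missing observation then leaves the running sum unchanged.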
