June 19, 2023 12:11:14

Impute missing values with group by in pandas

Question

I have a dataframe with missing values, where each date appears in several rows. I need to combine all the rows for a given date into one row. Could you please help me with this?

Input data:

Expected Intermediate Output:

After removing duplicates

Answer 1

Score: 1

import pandas as pd
import numpy as np
# Creating a DataFrame with the input data
data = {
    'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
}
df = pd.DataFrame(data)
# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])
# Group by 'PatientId' and 'Date', and aggregate the values by summing non-null values in each group
df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()
print(df_combined)
You can assign the result back to df if you want to keep using the same variable name.
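
One thing worth checking with the sum-based approach: when a group holds the same value more than once, or two different values in the same column, sum() adds them together. In the sample data, value4 for 10/21/22 becomes 8 (4 + 4) and value3 for 8/15/22 becomes 11 (4 + 7). The expected output is not shown in the post, so it is unclear which result is wanted. If the goal is only to collapse each date into one row and keep a single non-null value per column, groupby(...).first() may be closer. The sketch below compares the two on the same data. The explicit format string passed to pd.to_datetime is an assumption (month/day/two-digit year) and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above, written more compactly.
df = pd.DataFrame({
    'PatientId': [680366] * 9,
    'Date': ['8/4/22 10:02'] * 3 + ['8/15/22 10:04'] * 3 + ['10/21/22 12:19'] * 3,
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan],
})

# Assumption: the dates are month/day/two-digit year. Stating the format
# avoids relying on pandas to infer it.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y %H:%M')

grouped = df.groupby(['PatientId', 'Date'])

# sum() adds up every non-null value in the group; min_count=1 keeps
# all-NaN groups as NaN instead of turning them into 0.
summed = grouped.sum(numeric_only=True, min_count=1).reset_index()

# first() keeps the first non-null value of each column in the group,
# so repeated or conflicting values are not added together.
first_non_null = grouped.first().reset_index()

print(summed)
print(first_non_null)
```

Both results have one row per date. They differ only where a group contains more than one non-null value in a column, so which one is right depends on whether those repeated values are duplicates or separate measurements.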
