Impute missing values with group by in pandas
Question
I have a DataFrame in which the values for each date are spread across several rows, with missing values in between. I need to combine all the rows that share a date into a single row. Could you please help me with this?
Input data:
Expected Intermediate Output:
After removing duplicates
Answer 1
Score: 1
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-py -->
import pandas as pd
import numpy as np

# Create a DataFrame with the input data
data = {
    'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
}
df = pd.DataFrame(data)

# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Group by 'PatientId' and 'Date' and sum the non-null values in each group;
# min_count=1 keeps a group as NaN when it has no non-null value at all
df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()
print(df_combined)
You can assign the result back to df if you want to keep working with the same variable name.
<!-- end snippet -->
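One thing to watch for with sum: when a group holds more than one non-null value in the same column, the values are added together. In the sample data, value4 for 10/21/22 appears twice as 4, so the sum gives 8. If the goal is instead to keep one value per column and date, a possible alternative is GroupBy.first(), which takes the first non-null value in each column. The sketch below compares the two on the same sample data; the explicit date format string and the names summed and first_seen are my own assumptions, and which result is the wanted one depends on the expected output in the question.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'PatientId': [680366] * 9,
    'Date': ['8/4/22 10:02'] * 3 + ['8/15/22 10:04'] * 3 + ['10/21/22 12:19'] * 3,
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan],
})

# Assumption: dates are month/day/two-digit-year followed by hour:minute
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y %H:%M')

grouped = df.groupby(['PatientId', 'Date'])

# sum adds every non-null value in a group (4 + 4 becomes 8)
summed = grouped.sum(numeric_only=True, min_count=1).reset_index()

# first keeps only the first non-null value per column in each group
first_seen = grouped.first().reset_index()

print(summed)
print(first_seen)
```

Note that first() also discards genuinely different values within a group: for 8/15/22, value3 holds both 4 and 7, and only the 4 is kept.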