Impute missing values with group by in pandas


Question


I have a DataFrame with missing values, where several rows share the same date. I need to combine all the rows for each date into a single row. Could you please help me with this?

Input data: [image not preserved]

Expected intermediate output: [image not preserved]

After removing duplicates: [image not preserved]

Answer 1

Score: 1


import pandas as pd
import numpy as np

# Create a DataFrame with the input data
data = {
    'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
}

df = pd.DataFrame(data)

# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Group by 'PatientId' and 'Date', and aggregate by summing the non-null values in each group.
# min_count=1 keeps the result as NaN when every value in a group is missing.
df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()

print(df_combined)

You can assign the result back to df if you want to keep using the same variable for your DataFrame.
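One thing to watch with the sum-based approach: when a group holds more than one non-null value in a column (for example the two 4s in value4 for 10/21/22), sum() adds them together. The sketch below is not part of the original answer; it compares sum() with first(), which keeps the first non-null value per column, and with a two-step "fill within the group, then drop duplicates" variant that seems closer to what the question describes. The variable names (keys, value_cols, summed, firsts, deduped) are illustrative, and keeping the first value when a group has conflicting values is an assumption about what is wanted.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'PatientId': [680366] * 9,
    'Date': ['8/4/22 10:02'] * 3 + ['8/15/22 10:04'] * 3 + ['10/21/22 12:19'] * 3,
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan],
})
df['Date'] = pd.to_datetime(df['Date'])

keys = ['PatientId', 'Date']
value_cols = ['value1', 'value3', 'value4']

# sum() adds every non-null value in a group, so repeated values are added up.
summed = df.groupby(keys).sum(numeric_only=True, min_count=1).reset_index()

# first() keeps the first non-null value per column in each group.
firsts = df.groupby(keys).first().reset_index()

# Two-step variant: fill gaps within each group, then keep one row per key.
# Assumption: when a group has conflicting values, the first row wins.
filled = df.copy()
filled[value_cols] = df.groupby(keys)[value_cols].transform(lambda s: s.ffill().bfill())
deduped = filled.drop_duplicates(subset=keys).reset_index(drop=True)

print(summed)
print(firsts)
print(deduped)
```

If each group is guaranteed to have at most one non-null value per column, sum() and first() give the same result; they only differ when a group contains several non-null values.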

Posted by huangapple on June 19, 2023, 12:11:14
When reposting, please keep the link to this article: https://go.coder-hub.com/76503563.html