Impute missing values with group by in pandas

Question

I have a DataFrame with missing values spread across rows that share the same date. I need to combine all the rows for each date into a single row. Could you please help me with this?

Input data: (image not preserved)

Expected intermediate output: (image not preserved)

After removing duplicates: (image not preserved)

Answer 1

Score: 1

    import pandas as pd
    import numpy as np

    # Create a DataFrame with the input data
    data = {
        'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
        'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
                 '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
        'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
        'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
        'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
    }
    df = pd.DataFrame(data)

    # Convert the 'Date' column to datetime type
    df['Date'] = pd.to_datetime(df['Date'])

    # Group by 'PatientId' and 'Date' and sum the non-null values in each group.
    # min_count=1 keeps NaN (instead of 0) for a group with no non-null value.
    # Note: if a group has several non-null values in one column, they are added together.
    df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()
    print(df_combined)

You can assign the result back to df if you want to keep using the same variable name.

huangapple
  • Posted on June 19, 2023 at 12:11:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76503563.html