根据连续日期合并记录在Python中

huangapple go评论82阅读模式
英文:

Merging records based on consecutive dates in python

问题

我想合并我的数据框记录,如果日期相同的话。在下面的示例中,我想将日期(13,14,15),(25,26),(30,31)合并在一起,因为日期是连续的。如果有任何单日间断,我想中断记录的合并。

  1. cust date description
  2. CUST123 2020-06-13 观察到增加的损失率
  3. CUST123 2020-06-13 切割工作已完成
  4. CUST123 2020-06-14 在狭小的区域工作
  5. CUST123 2020-06-15 生产关闭了附近的应用程序
  6. CUST123 2020-07-17 损失压力慢慢上升,发生了故障
  7. CUST123 2020-08-25 建立循环负载
  8. CUST123 2020-08-26 执行粘性测试
  9. CUST123 2020-08-28 工作会议之前的低能量
  10. CUST123 2020-08-30 执行维护服务
  11. CUST123 2020-08-31 重新连接控制线

预期输出

  1. cust date description
  2. CUST123 2020-06-13 观察到增加的损失率切割工作已完成在狭小的区域工作生产关闭了附近的应用程序
  3. CUST123 2020-07-17 损失压力慢慢上升,发生了故障
  4. CUST123 2020-08-25 建立循环负载执行粘性测试
  5. CUST123 2020-08-28 工作会议之前的低能量
  6. CUST123 2020-08-30 执行维护服务重新连接控制线
英文:

I want to merge records of my dataframe if dates are same.Here in the below example I want to merge date (13,14,15), (25,26), (30,31) together as there are continuous dates. I want to break the merging of record if there is any single day break.

  1. cust date description
  2. CUST123 2020-06-13 observed increased loss rate
  3. CUST123 2020-06-13 cut performed job
  4. CUST123 2020-06-14 working tight area
  5. CUST123 2020-06-15 production shut neighbouring app
  6. CUST123 2020-07-17 loss pressure slow gain trip
  7. CUST123 2020-08-25 established circulation load
  8. CUST123 2020-08-26 performed sticky test
  9. CUST123 2020-08-28 job meeting prior low energy
  10. CUST123 2020-08-30 performed maintenance service
  11. CUST123 2020-08-31 reconnected control line

expected output

  1. cust date description
  2. CUST123 2020-06-13 observed increased loss rate cut performed job
  3. working tight area production shut neighbouring app
  4. CUST123 2020-07-17 loss pressure slow gain trip
  5. CUST123 2020-08-25 established circulation load performed sticky test
  6. CUST123 2020-08-28 job meeting prior low energy
  7. CUST123 2020-08-30 performed maintenance service reconnected control line

答案1

得分: 3

为了在日期相同的情况下合并数据框的记录,你可以这样做:

  1. merged_df = df.groupby(['cust', 'date'])['description'].apply(' '.join).reset_index()

输出结果如下:

  1. cust date description
  2. 0 CUST123 2020-06-13 observed increased loss rate cut performed job
  3. 1 CUST123 2020-06-14 working tight area
  4. 2 CUST123 2020-06-15 production shut neighbouring app
  5. 3 CUST123 2020-07-17 loss pressure slow gain trip
  6. 4 CUST123 2020-08-25 established circulation load
  7. 5 CUST123 2020-08-26 performed sticky test
  8. 6 CUST123 2020-08-28 job meeting prior low energy
  9. 7 CUST123 2020-08-30 performed maintenance service
  10. 8 CUST123 2020-08-31 reconnected control line

编辑:如果你想要合并连续的日期,保留连续范围的第一个日期,可以这样做:

  1. # 按照 'date' 列对数据框进行排序(如果 'df' 尚未排序)
  2. df.sort_values('date', inplace=True)
  3. # 初始化变量
  4. merged_data = []
  5. prev_row = None
  6. # 遍历行
  7. for _, row in df.iterrows():
  8. if prev_row is None or row['cust'] != prev_row['cust'] or (row['date'] - prev_row['date']).days > 1:
  9. merged_data.append({'cust': row['cust'], 'date': row['date'], 'description': row['description']})
  10. else:
  11. merged_data[-1]['description'] += ' ' + row['description']
  12. prev_row = row
  13. # 创建合并后的数据框
  14. merged_df = pd.DataFrame(merged_data)
  15. print(merged_df)

输出结果如下:

  1. cust date description
  2. 0 CUST123 2020-06-13 observed increased loss rate cut performed job...
  3. 1 CUST123 2020-07-17 loss pressure slow gain trip
  4. 2 CUST123 2020-08-25 established circulation load performed sticky ...
  5. 3 CUST123 2020-08-28 job meeting prior low energy
  6. 4 CUST123 2020-08-30 performed maintenance service reconnected cont...
英文:

In order to merge records of a dataframe if dates are same, you could do:

  1. merged_df = df.groupby(['cust', 'date'])['description'].apply(' '.join).reset_index()

which outputs:

  1. cust date description
  2. 0 CUST123 2020-06-13 observed increased loss rate cut performed job
  3. 1 CUST123 2020-06-14 working tight area
  4. 2 CUST123 2020-06-15 production shut neighbouring app
  5. 3 CUST123 2020-07-17 loss pressure slow gain trip
  6. 4 CUST123 2020-08-25 established circulation load
  7. 5 CUST123 2020-08-26 performed sticky test
  8. 6 CUST123 2020-08-28 job meeting prior low energy
  9. 7 CUST123 2020-08-30 performed maintenance service
  10. 8 CUST123 2020-08-31 reconnected control line

EDIT: If you want to merge the consecutive dates, keeping the first date of the consecutive range, you could it like this:

  1. # Sort DataFrame by 'date' (in case 'df' is not already sorted)
  2. df.sort_values('date', inplace=True)
  3. # Initialize variables
  4. merged_data = []
  5. prev_row = None
  6. # Loop through the rows
  7. for _, row in df.iterrows():
  8. if prev_row is None or row['cust'] != prev_row['cust'] or (row['date'] - prev_row['date']).days > 1:
  9. merged_data.append({'cust': row['cust'], 'date': row['date'], 'description': row['description']})
  10. else:
  11. merged_data[-1]['description'] += ' ' + row['description']
  12. prev_row = row
  13. # Create merged DataFrame
  14. merged_df = pd.DataFrame(merged_data)
  15. print(merged_df)

Output:

  1. 0 CUST123 2020-06-13 observed increased loss rate cut performed job...
  2. 1 CUST123 2020-07-17 loss pressure slow gain trip
  3. 2 CUST123 2020-08-25 established circulation load performed sticky ...
  4. 3 CUST123 2020-08-28 job meeting prior low energy
  5. 4 CUST123 2020-08-30 performed maintenance service reconnected cont...

huangapple
  • 本文由 发表于 2023年7月17日 21:00:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/76704752.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定