根据连续日期合并记录在Python中

huangapple go评论62阅读模式
英文:

Merging records based on consecutive dates in python

问题

我想合并我的数据框记录,如果日期相同的话。在下面的示例中,我想将日期(13,14,15),(25,26),(30,31)合并在一起,因为日期是连续的。如果有任何单日间断,我想中断记录的合并。

      cust               date         description
   CUST123              2020-06-13   观察到增加的损失率
   CUST123              2020-06-13   切割工作已完成
   CUST123              2020-06-14   在狭小的区域工作
   CUST123              2020-06-15   生产关闭了附近的应用程序
   CUST123              2020-07-17   损失压力慢慢上升,发生了故障
   CUST123              2020-08-25   建立循环负载
   CUST123              2020-08-26   执行粘性测试
   CUST123              2020-08-28   工作会议之前的低能量
   CUST123              2020-08-30   执行维护服务
   CUST123              2020-08-31   重新连接控制线

预期输出

      cust               date         description
   CUST123              2020-06-13   观察到增加的损失率切割工作已完成在狭小的区域工作生产关闭了附近的应用程序
   CUST123              2020-07-17   损失压力慢慢上升,发生了故障
   CUST123              2020-08-25   建立循环负载执行粘性测试
   CUST123              2020-08-28   工作会议之前的低能量
   CUST123              2020-08-30   执行维护服务重新连接控制线
英文:

I want to merge records of my dataframe if dates are same.Here in the below example I want to merge date (13,14,15), (25,26), (30,31) together as there are continuous dates. I want to break the merging of record if there is any single day break.

      cust               date         description
   CUST123              2020-06-13   observed increased loss rate
   CUST123              2020-06-13   cut performed job
   CUST123              2020-06-14   working tight area
   CUST123              2020-06-15   production shut neighbouring app
   CUST123              2020-07-17   loss pressure slow gain trip
   CUST123              2020-08-25   established circulation load
   CUST123              2020-08-26   performed sticky test
   CUST123              2020-08-28   job meeting prior low energy
   CUST123              2020-08-30   performed maintenance service
   CUST123              2020-08-31   reconnected control line

expected output

      cust               date         description
   CUST123              2020-06-13   observed increased loss rate cut performed job 
                                      working tight area production shut neighbouring app
   CUST123              2020-07-17   loss pressure slow gain trip
   CUST123              2020-08-25   established circulation load performed sticky test
   CUST123              2020-08-28   job meeting prior low energy
   CUST123              2020-08-30   performed maintenance service reconnected control line

答案1

得分: 3

为了在日期相同的情况下合并数据框的记录,你可以这样做:

merged_df = df.groupby(['cust', 'date'])['description'].apply(' '.join).reset_index()

输出结果如下:

   cust        date                                     description
0  CUST123  2020-06-13  observed increased loss rate cut performed job
1  CUST123  2020-06-14                              working tight area
2  CUST123  2020-06-15                production shut neighbouring app
3  CUST123  2020-07-17                    loss pressure slow gain trip
4  CUST123  2020-08-25                    established circulation load
5  CUST123  2020-08-26                           performed sticky test
6  CUST123  2020-08-28                    job meeting prior low energy
7  CUST123  2020-08-30                   performed maintenance service
8  CUST123  2020-08-31                        reconnected control line

编辑:如果你想要合并连续的日期,保留连续范围的第一个日期,可以这样做:

# 按照 'date' 列对数据框进行排序(如果 'df' 尚未排序)
df.sort_values('date', inplace=True)

# 初始化变量
merged_data = []
prev_row = None

# 遍历行
for _, row in df.iterrows():
    if prev_row is None or row['cust'] != prev_row['cust'] or (row['date'] - prev_row['date']).days > 1:
        merged_data.append({'cust': row['cust'], 'date': row['date'], 'description': row['description']})
    else:
        merged_data[-1]['description'] += ' ' + row['description']
    prev_row = row

# 创建合并后的数据框
merged_df = pd.DataFrame(merged_data)

print(merged_df)

输出结果如下:

     cust        date                                        description
0  CUST123  2020-06-13  observed increased loss rate cut performed job...
1  CUST123  2020-07-17                       loss pressure slow gain trip
2  CUST123  2020-08-25  established circulation load performed sticky ...
3  CUST123  2020-08-28                       job meeting prior low energy
4  CUST123  2020-08-30  performed maintenance service reconnected cont...
英文:

In order to merge records of a dataframe if dates are same, you could do:

merged_df = df.groupby(['cust', 'date'])['description'].apply(' '.join).reset_index()

which outputs:

cust       date                                     description
0  CUST123 2020-06-13  observed increased loss rate cut performed job
1  CUST123 2020-06-14                              working tight area
2  CUST123 2020-06-15                production shut neighbouring app
3  CUST123 2020-07-17                    loss pressure slow gain trip
4  CUST123 2020-08-25                    established circulation load
5  CUST123 2020-08-26                           performed sticky test
6  CUST123 2020-08-28                    job meeting prior low energy
7  CUST123 2020-08-30                   performed maintenance service
8  CUST123 2020-08-31                        reconnected control line

EDIT: If you want to merge the consecutive dates, keeping the first date of the consecutive range, you could it like this:

# Sort DataFrame by 'date' (in case 'df' is not already sorted)
df.sort_values('date', inplace=True)

# Initialize variables
merged_data = []
prev_row = None

# Loop through the rows
for _, row in df.iterrows():
    if prev_row is None or row['cust'] != prev_row['cust'] or (row['date'] - prev_row['date']).days > 1:
        merged_data.append({'cust': row['cust'], 'date': row['date'], 'description': row['description']})
    else:
        merged_data[-1]['description'] += ' ' + row['description']
    prev_row = row

# Create merged DataFrame
merged_df = pd.DataFrame(merged_data)

print(merged_df)

Output:

0  CUST123 2020-06-13  observed increased loss rate cut performed job...
1  CUST123 2020-07-17                       loss pressure slow gain trip
2  CUST123 2020-08-25  established circulation load performed sticky ...
3  CUST123 2020-08-28                       job meeting prior low energy
4  CUST123 2020-08-30  performed maintenance service reconnected cont...

huangapple
  • 本文由 发表于 2023年7月17日 21:00:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/76704752.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定