Merging records based on consecutive dates in Python
I want to merge the records of my dataframe when their dates are the same or consecutive. In the example below, I want to merge the dates (13, 14, 15), (25, 26), and (30, 31), since they are consecutive. A gap of even a single day should break the merge.
cust date description
CUST123 2020-06-13 observed increased loss rate
CUST123 2020-06-13 cut performed job
CUST123 2020-06-14 working tight area
CUST123 2020-06-15 production shut neighbouring app
CUST123 2020-07-17 loss pressure slow gain trip
CUST123 2020-08-25 established circulation load
CUST123 2020-08-26 performed sticky test
CUST123 2020-08-28 job meeting prior low energy
CUST123 2020-08-30 performed maintenance service
CUST123 2020-08-31 reconnected control line
expected output
cust date description
CUST123 2020-06-13 observed increased loss rate cut performed job working tight area production shut neighbouring app
CUST123 2020-07-17 loss pressure slow gain trip
CUST123 2020-08-25 established circulation load performed sticky test
CUST123 2020-08-28 job meeting prior low energy
CUST123 2020-08-30 performed maintenance service reconnected control line
Answer 1
Score: 3
To merge only the records that share the same date, you could do:
merged_df = df.groupby(['cust', 'date'])['description'].apply(' '.join).reset_index()
which outputs:
cust date description
0 CUST123 2020-06-13 observed increased loss rate cut performed job
1 CUST123 2020-06-14 working tight area
2 CUST123 2020-06-15 production shut neighbouring app
3 CUST123 2020-07-17 loss pressure slow gain trip
4 CUST123 2020-08-25 established circulation load
5 CUST123 2020-08-26 performed sticky test
6 CUST123 2020-08-28 job meeting prior low energy
7 CUST123 2020-08-30 performed maintenance service
8 CUST123 2020-08-31 reconnected control line
EDIT: If you want to merge consecutive dates, keeping the first date of each consecutive range, you could do it like this:
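import pandas as pd

# Assumes 'date' may still be stored as strings; convert so that subtraction yields a Timedelta
df['date'] = pd.to_datetime(df['date'])
# Sort by customer, then by date (in case 'df' is not already sorted)
df = df.sort_values(['cust', 'date'])
# Initialize variables
merged_data = []
prev_row = None
# Loop through the rows
for _, row in df.iterrows():
    # Start a new record on the first row, on a new customer, or after a gap of more than one day
    if prev_row is None or row['cust'] != prev_row['cust'] or (row['date'] - prev_row['date']).days > 1:
        merged_data.append({'cust': row['cust'], 'date': row['date'], 'description': row['description']})
    else:
        # Same or next day: append the description to the current record
        merged_data[-1]['description'] += ' ' + row['description']
    prev_row = row
# Create merged DataFrame
merged_df = pd.DataFrame(merged_data)
print(merged_df)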
Output:
cust date description
0 CUST123 2020-06-13 observed increased loss rate cut performed job...
1 CUST123 2020-07-17 loss pressure slow gain trip
2 CUST123 2020-08-25 established circulation load performed sticky ...
3 CUST123 2020-08-28 job meeting prior low energy
4 CUST123 2020-08-31 placeholder
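The same grouping can also be done without an explicit loop. The following is only a sketch, not part of the original answer: it rebuilds the sample data from the question so that it runs on its own, and the names `gap`, `new_block`, and `block_id` are made up for illustration. The idea is to mark every row that starts a new run (the first row of a customer, or a row more than one day after the previous one) and take a cumulative sum of those marks to get a group label.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "cust": ["CUST123"] * 10,
    "date": pd.to_datetime([
        "2020-06-13", "2020-06-13", "2020-06-14", "2020-06-15", "2020-07-17",
        "2020-08-25", "2020-08-26", "2020-08-28", "2020-08-30", "2020-08-31",
    ]),
    "description": [
        "observed increased loss rate", "cut performed job",
        "working tight area", "production shut neighbouring app",
        "loss pressure slow gain trip", "established circulation load",
        "performed sticky test", "job meeting prior low energy",
        "performed maintenance service", "reconnected control line",
    ],
})

df = df.sort_values(["cust", "date"])

# Days since the previous row of the same customer (NaN for each customer's first row)
gap = df.groupby("cust")["date"].diff().dt.days

# A new block starts at a customer's first row or after a gap of more than one day
new_block = gap.isna() | (gap > 1)

# Running count of block starts gives every row a block label
block_id = new_block.cumsum()

merged_df = (
    df.groupby(block_id)
      .agg(
          cust=("cust", "first"),
          date=("date", "first"),
          description=("description", " ".join),
      )
      .reset_index(drop=True)
)
print(merged_df)
```

Because the block label comes from a cumulative sum over the whole sorted frame, rows from different customers never share a label, so grouping by `block_id` alone should be enough here.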