Assigning values to records in a dataframe based on datetime column being between a reference datetime range
Question
You can modify your function to get the expected output. Inside the function, first compare the record's timestamp with each period's range; if nothing matches, retry with the timestamp shifted one day earlier and then one day later. A modified version of the function:
import pandas as pd
from datetime import timedelta

# Convert the date strings to datetime objects
period_df['Start time'] = pd.to_datetime(period_df['Start time'])
period_df['End time'] = pd.to_datetime(period_df['End time'])
records_df['Original time'] = pd.to_datetime(records_df['Original time'])

def assign_period(record):
    # Try the original timestamp first, then one day earlier and one day later
    for day_offset in (0, -1, 1):
        adjusted_date = record['Original time'] + timedelta(days=day_offset)
        for _, period in period_df.iterrows():
            if (record['Group1'] == period['Group1']
                    and record['Group2'] == period['Group2']
                    and period['Start time'] <= adjusted_date <= period['End time']):
                return period['Period']
    # No period matched on any of the three dates
    return None

# Apply the function to assign periods
records_df['Period'] = records_df.apply(assign_period, axis=1)

# Print the result
print(records_df)
The modified function first looks for a matching period using the original timestamp. If none is found, it tries the previous day and then the next day, and returns None if there is still no match. This should produce the expected output for records whose date is one day before or after the reference date.
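One caveat worth checking before running the function: the question's sample contains '1900-01-00 07:45:00', and day 00 is not a valid calendar day, so that string cannot be parsed as a datetime. The following is a minimal sketch, assuming the timestamps all follow the '%Y-%m-%d %H:%M:%S' layout shown in the question, of how such rows could be located up front. The variable names `times`, `parsed` and `bad_rows` are illustrative only.

```python
import pandas as pd

# Sample values copied from the question's records_df.
times = pd.Series([
    '1900-01-01 05:05:00',
    '1900-01-01 07:23:00',
    '1900-01-00 07:45:00',   # day "00" is not a valid calendar day
    '1900-01-02 09:57:00',
    '1900-01-02 08:23:00',
])

# errors='coerce' turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(times, format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Rows that could not be parsed and need attention before matching.
bad_rows = times[parsed.isna()]
print(bad_rows)
```

Rows flagged this way either need correcting at the source or need a comparison that ignores the date part altogether.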
I have the following data frames:
period_df:
Group1 Group2 Period Start time End time
G1 G2 Period 1 1900-01-01 05:01:00 1900-01-01 06:00:00
G1 G2 Period 2 1900-01-01 06:01:00 1900-01-01 07:00:00
G1 G2 Period 3 1900-01-01 07:01:00 1900-01-01 08:00:00
G1 G2 Period 4 1900-01-01 08:01:00 1900-01-01 09:00:00
G1 G2 Period 5 1900-01-01 09:01:00 1900-01-01 10:00:00
records_df:
Group1 Group2 Original time
G1 G2 1900-01-01 05:05:00
G1 G2 1900-01-01 07:23:00
G1 G2 1900-01-00 07:45:00
G1 G2 1900-01-02 09:57:00
G1 G2 1900-01-02 08:23:00
I want to assign the corresponding Period from period_df to each record in records_df, based on the Group1 and Group2 columns and the time being between Start time and End time.
I wrote the following function to do that:
def assign_period(record):
    for _, period in period_df.iterrows():
        if record['Group1'] == period['Group1'] and \
           record['Group2'] == period['Group2'] and \
           period['Start time'] <= record['Original time'] <= period['End time']:
            return period['Period']
    return None
And when I use the function to assign periods to the records I get the following output:
records_df['Period'] = records_df.apply(assign_period, axis=1)
Group1 Group2 Original time Period
G1 G2 1900-01-01 05:05:00 Period 1
G1 G2 1900-01-01 07:23:00 Period 3
G1 G2 1900-01-00 07:45:00 None
G1 G2 1900-01-02 09:57:00 None
G1 G2 1900-01-02 08:23:00 None
Some records don't get assigned a period because their date is a day before or after the date used in the reference period_df dataframe.
The expected output is for Periods to be assigned irrespective of the date:
Group1 Group2 Original time Period
G1 G2 1900-01-01 05:05:00 Period 1
G1 G2 1900-01-01 07:23:00 Period 3
G1 G2 1900-01-00 07:45:00 Period 3
G1 G2 1900-01-02 09:57:00 Period 5
G1 G2 1900-01-02 08:23:00 Period 4
How can I extend the above function so that records that are not assigned a period are also checked one day ahead or one day back and matched with the Period from period_df?
import pandas as pd

period_df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1'],
    'Group2': ['G2', 'G2', 'G2', 'G2', 'G2'],
    'Period': ['Period 1', 'Period 2', 'Period 3', 'Period 4', 'Period 5'],
    'Start time': [
        '1900-01-01 05:01:00',
        '1900-01-01 06:01:00',
        '1900-01-01 07:01:00',
        '1900-01-01 08:01:00',
        '1900-01-01 09:01:00'],
    'End time': [
        '1900-01-01 06:00:00',
        '1900-01-01 07:00:00',
        '1900-01-01 08:00:00',
        '1900-01-01 09:00:00',
        '1900-01-01 10:00:00']})

records_df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1'],
    'Group2': ['G2', 'G2', 'G2', 'G2', 'G2'],
    'Original time': [
        '1900-01-01 05:05:00',
        '1900-01-01 07:23:00',
        '1900-01-00 07:45:00',
        '1900-01-02 09:57:00',
        '1900-01-02 08:23:00']})
Answer 1
Score: 1
Example
First, check for typos in the example code for records_df. Your code contains '1900-01-00 07:45:00' and '1900-01-02 09:57:00'.
If these are typos, use the following code to fix them:
data = {'Group1': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1'},
        'Group2': {0: 'G2', 1: 'G2', 2: 'G2', 3: 'G2', 4: 'G2'},
        'Original time': {0: '1900-01-01 05:05:00',
                          1: '1900-01-01 07:23:00',
                          2: '1900-01-01 07:45:00',
                          3: '1900-01-01 09:57:00',
                          4: '1900-01-01 08:23:00'}}
# Named records_df so that the steps below operate on the corrected data
records_df = pd.DataFrame(data)
Step 1
Convert the time columns to datetime dtype:
records_df['Original time'] = pd.to_datetime(records_df['Original time'])
period_df['Start time'] = pd.to_datetime(period_df['Start time'])
period_df['End time'] = pd.to_datetime(period_df['End time'])
Step 2
Build the bin edges from every start time plus the last end time:
bins = period_df['Start time'].tolist() + [period_df['End time'].tolist()[-1]]
bins
[Timestamp('1900-01-01 05:01:00'),
 Timestamp('1900-01-01 06:01:00'),
 Timestamp('1900-01-01 07:01:00'),
 Timestamp('1900-01-01 08:01:00'),
 Timestamp('1900-01-01 09:01:00'),
 Timestamp('1900-01-01 10:00:00')]
Step 3
Create the Period column with pd.cut (using bins):
records_df.assign(Period=pd.cut(records_df['Original time'],
                                bins=bins, right=False,
                                labels=period_df['Period']))
Output:
Group1 Group2 Original time Period
0 G1 G2 1900-01-01 05:05:00 Period 1
1 G1 G2 1900-01-01 07:23:00 Period 3
2 G1 G2 1900-01-01 07:45:00 Period 3
3 G1 G2 1900-01-01 09:57:00 Period 5
4 G1 G2 1900-01-01 08:23:00 Period 4
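Note that the bins above are built from the start times only and are half-open (right=False), so a time of exactly 10:00:00 falls outside the last bin, and a time inside the one-minute gap between two periods is assigned to the earlier period. If the exact Start time and End time boundaries matter, one alternative is a lookup against closed intervals. The following is a rough sketch, not part of the original answer, using pd.IntervalIndex. The sample times in `times` are made up to show the boundary cases.

```python
import pandas as pd

labels = ['Period 1', 'Period 2', 'Period 3', 'Period 4', 'Period 5']
starts = pd.to_datetime(['1900-01-01 05:01:00', '1900-01-01 06:01:00',
                         '1900-01-01 07:01:00', '1900-01-01 08:01:00',
                         '1900-01-01 09:01:00'])
ends = pd.to_datetime(['1900-01-01 06:00:00', '1900-01-01 07:00:00',
                       '1900-01-01 08:00:00', '1900-01-01 09:00:00',
                       '1900-01-01 10:00:00'])

# Hypothetical sample times: inside a period, exactly on the last
# end time, and inside the gap between Period 1 and Period 2.
times = pd.to_datetime(['1900-01-01 05:05:00',
                        '1900-01-01 10:00:00',
                        '1900-01-01 06:00:30'])

# closed='both' mirrors the "Start time <= t <= End time" test in the question.
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed='both')

# get_indexer returns the position of the containing interval, or -1 if none.
positions = intervals.get_indexer(times)
result = [labels[i] if i >= 0 else None for i in positions]
print(result)
```

This relies on the periods not overlapping, which holds for the sample data in the question.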
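If it is not a typo, use timedelta dtype instead of datetime, so that only the time of day is compared and the date is ignored. This operates on the original string columns, i.e. before the Step 1 conversion: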
s = pd.to_timedelta(records_df['Original time'].str.split(' ').str[1])
s1 = pd.to_timedelta(period_df['Start time'].str.split(' ').str[1])
s2 = pd.to_timedelta(period_df['End time'].str.split(' ').str[1])
bins = s1.tolist() + [s2.tolist()[-1]]
out = records_df.assign(Period=pd.cut(s, bins=bins, right=False, labels=period_df['Period']))
out
Group1 Group2 Original time Period
0 G1 G2 1900-01-01 05:05:00 Period 1
1 G1 G2 1900-01-01 07:23:00 Period 3
2 G1 G2 1900-01-00 07:45:00 Period 3
3 G1 G2 1900-01-02 09:57:00 Period 5
4 G1 G2 1900-01-02 08:23:00 Period 4
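The pd.cut approaches above do not look at Group1 and Group2, which is fine for the sample data because every row belongs to the same group. If several groups with different period tables exist, one possible variation is to pair records with the periods of their own group first and then compare the time of day. The following is a minimal sketch under that assumption. The helper `time_of_day` and the column names `tod`, `start` and `end` are made up for illustration, and it assumes the periods within a group do not overlap.

```python
import pandas as pd

period_df = pd.DataFrame({
    'Group1': ['G1'] * 5,
    'Group2': ['G2'] * 5,
    'Period': ['Period 1', 'Period 2', 'Period 3', 'Period 4', 'Period 5'],
    'Start time': ['1900-01-01 05:01:00', '1900-01-01 06:01:00',
                   '1900-01-01 07:01:00', '1900-01-01 08:01:00',
                   '1900-01-01 09:01:00'],
    'End time': ['1900-01-01 06:00:00', '1900-01-01 07:00:00',
                 '1900-01-01 08:00:00', '1900-01-01 09:00:00',
                 '1900-01-01 10:00:00']})

records_df = pd.DataFrame({
    'Group1': ['G1'] * 5,
    'Group2': ['G2'] * 5,
    'Original time': ['1900-01-01 05:05:00', '1900-01-01 07:23:00',
                      '1900-01-00 07:45:00', '1900-01-02 09:57:00',
                      '1900-01-02 08:23:00']})

def time_of_day(col):
    # Keep only the HH:MM:SS part of each string and parse it as a timedelta.
    return pd.to_timedelta(col.str.split(' ').str[1])

periods = period_df.assign(start=time_of_day(period_df['Start time']),
                           end=time_of_day(period_df['End time']))
# reset_index keeps the original row label in a column named 'index'.
records = records_df.assign(tod=time_of_day(records_df['Original time'])).reset_index()

# Pair every record with every period of the same group, then keep real matches.
pairs = records.merge(periods[['Group1', 'Group2', 'Period', 'start', 'end']],
                      on=['Group1', 'Group2'], how='left')
matched = pairs[(pairs['tod'] >= pairs['start']) & (pairs['tod'] <= pairs['end'])]

# Map the matched period back by original row label; unmatched rows stay NaN.
out = records_df.assign(Period=matched.set_index('index')['Period'])
print(out)
```

The merge creates one row per record and period combination within a group, so for large tables the intermediate frame can grow quickly.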