Assigning values to records in a dataframe based on datetime column being between a reference datetime range

Question

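You can modify your function to produce the expected output. Inside the function, first compare the record's timestamp against the period ranges; if no period matches, retry with the timestamp shifted to the previous day and then to the next day. A modified version of the function:

import pandas as pd
from datetime import timedelta

# Convert the date strings to datetime objects
period_df['Start time'] = pd.to_datetime(period_df['Start time'])
period_df['End time'] = pd.to_datetime(period_df['End time'])
records_df['Original time'] = pd.to_datetime(records_df['Original time'])

def assign_period(record):
    # First try the record's original timestamp
    for _, period in period_df.iterrows():
        if record['Group1'] == period['Group1'] and \
        record['Group2'] == period['Group2'] and \
        period['Start time'] <= record['Original time'] <= period['End time']:
            return period['Period']
    
    # If no period matched, retry with the previous day and then the next day
    one_day = timedelta(days=1)
    for day_adjustment in [-1, 1]:
        adjusted_date = record['Original time'] + day_adjustment * one_day
        for _, period in period_df.iterrows():
            if record['Group1'] == period['Group1'] and \
            record['Group2'] == period['Group2'] and \
            period['Start time'] <= adjusted_date <= period['End time']:
                return period['Period']
    
    # Still no match after shifting by one day in either direction
    return None

# Apply the function to assign periods
records_df['Period'] = records_df.apply(assign_period, axis=1)

# Print the result
print(records_df)

The modified function first looks for a matching period using the original timestamp. If none is found, it retries with the previous day and then the next day, and returns None if there is still no match. For records dated one day before or after the reference date, this should give the expected output.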
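One caveat, offered as a hedged aside: the sample value 1900-01-00 07:45:00 has day 00, which is not a valid calendar date, so the pd.to_datetime conversion above is expected to raise on it before the one-day fallback is ever reached. The sketch below shows that behaviour on two of the question's values. The names raw, parsed_ok and coerced are illustrative and do not come from the original post.

```python
import pandas as pd

# Two values from the question: one valid, one with the impossible day "00"
raw = pd.Series(["1900-01-01 05:05:00", "1900-01-00 07:45:00"])

# A plain conversion is expected to fail on the invalid date
try:
    pd.to_datetime(raw)
    parsed_ok = True
except ValueError:
    parsed_ok = False

# errors="coerce" avoids the exception but replaces the bad value with NaT
coerced = pd.to_datetime(raw, errors="coerce")

print(parsed_ok)
print(coerced)
```

A NaT timestamp never compares as inside any Start time/End time range, and adding or subtracting a day keeps it NaT, so such a record would still end up with None.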


I have the following data frames:

period_df:

Group1 Group2    Period           Start time             End time
G1     G2  Period 1  1900-01-01 05:01:00  1900-01-01 06:00:00
G1     G2  Period 2  1900-01-01 06:01:00  1900-01-01 07:00:00
G1     G2  Period 3  1900-01-01 07:01:00  1900-01-01 08:00:00
G1     G2  Period 4  1900-01-01 08:01:00  1900-01-01 09:00:00
G1     G2  Period 5  1900-01-01 09:01:00  1900-01-01 10:00:00

records_df:

Group1 Group2        Original time
G1     G2  1900-01-01 05:05:00
G1     G2  1900-01-01 07:23:00
G1     G2  1900-01-00 07:45:00
G1     G2  1900-01-02 09:57:00
G1     G2  1900-01-02 08:23:00

I want to assign the corresponding Period from period_df to each record in records_df, based on the Group1 and Group2 columns and the time being between Start time and End time.

I wrote the following function to do that:

def assign_period(record):
    for _, period in period_df.iterrows():
        if record['Group1'] == period['Group1'] and \
        record['Group2'] == period['Group2'] and \
        period['Start time'] <= record['Original time'] <= period['End time']:
            return period['Period']
    return None

And when I use the function to assign periods to the records I get the following output:

records_df['Period'] = records_df.apply(assign_period, axis=1)

Group1 Group2        Original time    Period
G1     G2  1900-01-01 05:05:00  Period 1
G1     G2  1900-01-01 07:23:00  Period 3
G1     G2  1900-01-00 07:45:00      None
G1     G2  1900-01-02 09:57:00      None
G1     G2  1900-01-02 08:23:00      None

Some records are not assigned a period because their date is a day before or after the date used in the reference period_df dataframe.

The expected output is for Periods to be assigned irrespective of the date:

Group1 Group2        Original time    Period
G1     G2  1900-01-01 05:05:00  Period 1
G1     G2  1900-01-01 07:23:00  Period 3
G1     G2  1900-01-00 07:45:00  Period 3
G1     G2  1900-01-02 09:57:00  Period 5
G1     G2  1900-01-02 08:23:00  Period 4

How can I also incorporate a check for records that are not assigned a period in the above function to either go a day ahead or before and match up with the Period from period_df?

import pandas as pd

period_df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1'],
    'Group2': ['G2', 'G2', 'G2', 'G2', 'G2'],
    'Period': ['Period 1', 'Period 2', 'Period 3', 'Period 4', 'Period 5'],
    'Start time': [
        '1900-01-01 05:01:00',
        '1900-01-01 06:01:00',
        '1900-01-01 07:01:00',
        '1900-01-01 08:01:00',
        '1900-01-01 09:01:00'],
    'End time': [
        '1900-01-01 06:00:00',
        '1900-01-01 07:00:00',
        '1900-01-01 08:00:00',
        '1900-01-01 09:00:00',
        '1900-01-01 10:00:00']})

records_df = pd.DataFrame({
    'Group1': ['G1', 'G1', 'G1', 'G1', 'G1'],
    'Group2': ['G2', 'G2', 'G2', 'G2', 'G2'],
    'Original time': [
        '1900-01-01 05:05:00',
        '1900-01-01 07:23:00',
        '1900-01-00 07:45:00',
        '1900-01-02 09:57:00',
        '1900-01-02 08:23:00']})

Answer 1

Score: 1


Example

First, check for a typo in the example code for records_df.

Your code contains '1900-01-00 07:45:00' and '1900-01-02 09:57:00'.

Use the following code to fix the typo:

data = {'Group1': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1'},
 'Group2': {0: 'G2', 1: 'G2', 2: 'G2', 3: 'G2', 4: 'G2'},
 'Original time': {0: '1900-01-01 05:05:00',
  1: '1900-01-01 07:23:00',
  2: '1900-01-01 07:45:00',
  3: '1900-01-01 09:57:00',
  4: '1900-01-01 08:23:00'}}
# Named records_df (not record_df) so that the steps below pick it up
records_df = pd.DataFrame(data)

Step 1

Convert the time columns to datetime dtype

records_df['Original time'] = pd.to_datetime(records_df['Original time'])
period_df['Start time'] = pd.to_datetime(period_df['Start time'])
period_df['End time'] = pd.to_datetime(period_df['End time'])

Step 2

Build the bins from every start time plus the last end time

bins = period_df['Start time'].tolist() + [period_df['End time'].tolist()[-1]]

bins

[Timestamp('1900-01-01 05:01:00'),
 Timestamp('1900-01-01 06:01:00'),
 Timestamp('1900-01-01 07:01:00'),
 Timestamp('1900-01-01 08:01:00'),
 Timestamp('1900-01-01 09:01:00'),
 Timestamp('1900-01-01 10:00:00')]

Step 3

Create the Period column with pd.cut, using the bins

records_df.assign(Period=pd.cut(records_df['Original time'],
                                bins=bins, right=False,
                                labels=period_df['Period']))

Output:

Group1	Group2	Original time	Period
0	G1	G2	1900-01-01 05:05:00	Period 1
1	G1	G2	1900-01-01 07:23:00	Period 3
2	G1	G2	1900-01-01 07:45:00	Period 3
3	G1	G2	1900-01-01 09:57:00	Period 5
4	G1	G2	1900-01-01 08:23:00	Period 4
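To make the effect of right=False at the bin edges concrete, here is a small illustration. The timestamps and the names edges, times and labels are made up for this sketch and are not part of the original answer; only two bins are used to keep it short.

```python
import pandas as pd

# Three edges define two left-closed bins: [05:01, 06:01) and [06:01, 07:01)
edges = pd.to_datetime([
    "1900-01-01 05:01:00",
    "1900-01-01 06:01:00",
    "1900-01-01 07:01:00",
]).tolist()

times = pd.Series(pd.to_datetime([
    "1900-01-01 05:01:00",  # exactly on the first edge
    "1900-01-01 06:00:59",  # just before the second edge
    "1900-01-01 06:01:00",  # exactly on the second edge
    "1900-01-01 07:01:00",  # exactly on the last edge
]))

labels = pd.cut(times, bins=edges, right=False,
                labels=["Period 1", "Period 2"])
print(labels.tolist())
```

With right=False a value equal to a left edge belongs to the bin that starts there, and a value equal to the final edge falls outside every bin and becomes NaN. Because the edges are taken from the start times, a timestamp inside the gap between one period's End time and the next Start time (such as 06:00:59 here) is still labelled with the earlier period.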

If it is not a typo, use the timedelta dtype instead of datetime

# Run this on the original string columns, before any pd.to_datetime
# conversion, because .str.split only works on strings
s = pd.to_timedelta(records_df['Original time'].str.split(' ').str[1])
s1 = pd.to_timedelta(period_df['Start time'].str.split(' ').str[1])
s2 = pd.to_timedelta(period_df['End time'].str.split(' ').str[1])

bins = s1.tolist() + [s2.tolist()[-1]]

out = records_df.assign(Period=pd.cut(s, bins=bins, right=False, labels=period_df['Period']))

out

Group1	Group2	Original time	Period
0	G1	G2	1900-01-01 05:05:00	Period 1
1	G1	G2	1900-01-01 07:23:00	Period 3
2	G1	G2	1900-01-00 07:45:00	Period 3
3	G1	G2	1900-01-02 09:57:00	Period 5
4	G1	G2	1900-01-02 08:23:00	Period 4
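The pd.cut approach does not look at Group1 and Group2. If several groups have different period tables, one possible alternative is to merge on the group columns and then filter on time of day. This is only a sketch under the assumption that the date part should be ignored entirely; the helper names time_of_day and assign_period_by_time, and the temporary columns starting with an underscore, are hypothetical and not from the question or the answer.

```python
import pandas as pd

def time_of_day(col):
    """Return the HH:MM:SS part of 'YYYY-MM-DD HH:MM:SS' values as Timedeltas."""
    # The date part is discarded as text, so an invalid date is never parsed
    return pd.to_timedelta(col.astype(str).str.split(" ").str[1])

def assign_period_by_time(records, periods):
    rec = records[["Group1", "Group2"]].copy()
    rec["_t"] = time_of_day(records["Original time"])
    rec["_row"] = range(len(rec))  # remembers which record each merged row came from

    per = periods[["Group1", "Group2", "Period"]].copy()
    per["_start"] = time_of_day(periods["Start time"])
    per["_end"] = time_of_day(periods["End time"])

    # Pair every record with every period of the same group, then keep the
    # pairs whose time of day lies between start and end (both inclusive)
    merged = rec.merge(per, on=["Group1", "Group2"], how="left")
    inside = (merged["_start"] <= merged["_t"]) & (merged["_t"] <= merged["_end"])
    first_hit = merged[inside].drop_duplicates("_row").set_index("_row")["Period"]

    out = records.copy()
    out["Period"] = rec["_row"].map(first_hit).to_numpy()  # NaN when nothing matched
    return out

periods = pd.DataFrame({
    "Group1": ["G1"] * 5,
    "Group2": ["G2"] * 5,
    "Period": ["Period 1", "Period 2", "Period 3", "Period 4", "Period 5"],
    "Start time": ["1900-01-01 05:01:00", "1900-01-01 06:01:00",
                   "1900-01-01 07:01:00", "1900-01-01 08:01:00",
                   "1900-01-01 09:01:00"],
    "End time": ["1900-01-01 06:00:00", "1900-01-01 07:00:00",
                 "1900-01-01 08:00:00", "1900-01-01 09:00:00",
                 "1900-01-01 10:00:00"],
})
records = pd.DataFrame({
    "Group1": ["G1"] * 5,
    "Group2": ["G2"] * 5,
    "Original time": ["1900-01-01 05:05:00", "1900-01-01 07:23:00",
                      "1900-01-00 07:45:00", "1900-01-02 09:57:00",
                      "1900-01-02 08:23:00"],
})

result = assign_period_by_time(records, periods)
print(result)
```

On the question's sample data, including the 1900-01-00 row, this assigns Period 1, Period 3, Period 3, Period 5 and Period 4. The merge builds one row per record and period pair within a group, so memory grows with the number of periods per group; for a handful of periods that cost is small.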

huangapple
  • Posted on May 17, 2023 at 10:07:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76268132.html