Assigning values to records in a dataframe based on datetime column being between a reference datetime range

Question

You can modify your function to produce the expected output. Inside the function, first compare the record's timestamp with each period's range; if nothing matches, retry with the timestamp shifted one day back and one day forward. A modified version of the function:

    import pandas as pd
    from datetime import timedelta

    # Convert the date strings to datetime objects
    period_df['Start time'] = pd.to_datetime(period_df['Start time'])
    period_df['End time'] = pd.to_datetime(period_df['End time'])
    records_df['Original time'] = pd.to_datetime(records_df['Original time'])

    def assign_period(record):
        # Try the original timestamp first, then one day back and one day forward
        for day_adjustment in (0, -1, 1):
            adjusted_date = record['Original time'] + timedelta(days=day_adjustment)
            for _, period in period_df.iterrows():
                if (record['Group1'] == period['Group1']
                        and record['Group2'] == period['Group2']
                        and period['Start time'] <= adjusted_date <= period['End time']):
                    return period['Period']
        # No period matched on the original, previous, or next day
        return None

    # Apply the function to assign periods
    records_df['Period'] = records_df.apply(assign_period, axis=1)

    # Print the result
    print(records_df)

The modified function first looks for a matching period using the original timestamp. If none is found, it tries the previous day and then the next day, and returns None if there is still no match. Records that are more than one day away from the reference date are therefore not matched, and a value such as '1900-01-00 07:45:00' has to be corrected before pd.to_datetime can parse it, because day 0 is not a valid date.
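Before relying on the day-shifting function above, it may help to check which timestamps can be parsed at all. The following is a minimal sketch, assuming the timestamps are still plain strings; the names raw, parsed, and bad_rows are illustrative and not part of the original post. It uses errors="coerce" so that unparseable values become NaT instead of raising.

```python
import pandas as pd

# Sample timestamps as strings; the second one uses day 0, which does not exist
raw = pd.Series([
    "1900-01-01 05:05:00",
    "1900-01-00 07:45:00",
    "1900-01-02 09:57:00",
])

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")

# Keep the original strings of the rows that failed to parse
bad_rows = raw[parsed.isna()]
print(bad_rows)
```

Rows listed in bad_rows would need to be corrected, or matched on the time of day only, before any date arithmetic is applied to them.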

I have the following data frames:

period_df:

    Group1  Group2  Period    Start time           End time
    G1      G2      Period 1  1900-01-01 05:01:00  1900-01-01 06:00:00
    G1      G2      Period 2  1900-01-01 06:01:00  1900-01-01 07:00:00
    G1      G2      Period 3  1900-01-01 07:01:00  1900-01-01 08:00:00
    G1      G2      Period 4  1900-01-01 08:01:00  1900-01-01 09:00:00
    G1      G2      Period 5  1900-01-01 09:01:00  1900-01-01 10:00:00

records_df:

    Group1  Group2  Original time
    G1      G2      1900-01-01 05:05:00
    G1      G2      1900-01-01 07:23:00
    G1      G2      1900-01-00 07:45:00
    G1      G2      1900-01-02 09:57:00
    G1      G2      1900-01-02 08:23:00

I want to assign the corresponding Period from period_df to each record in records_df, based on the Group1 and Group2 columns and the time being between Start time and End time.

I wrote the following function to do that:

    def assign_period(record):
        for _, period in period_df.iterrows():
            if record['Group1'] == period['Group1'] and \
               record['Group2'] == period['Group2'] and \
               period['Start time'] <= record['Original time'] <= period['End time']:
                return period['Period']
        return None

And when I use the function to assign periods to the records I get the following output:

    records_df['Period'] = records_df.apply(assign_period, axis=1)

    Group1  Group2  Original time        Period
    G1      G2      1900-01-01 05:05:00  Period 1
    G1      G2      1900-01-01 07:23:00  Period 3
    G1      G2      1900-01-00 07:45:00  None
    G1      G2      1900-01-02 09:57:00  None
    G1      G2      1900-01-02 08:23:00  None

Some records don't get assigned a period because their date is a day before or a day after the date used in the reference period_df dataframe.

The expected output is for Periods to be assigned irrespective of the date:

    Group1  Group2  Original time        Period
    G1      G2      1900-01-01 05:05:00  Period 1
    G1      G2      1900-01-01 07:23:00  Period 3
    G1      G2      1900-01-00 07:45:00  Period 3
    G1      G2      1900-01-02 09:57:00  Period 5
    G1      G2      1900-01-02 08:23:00  Period 4

How can I also incorporate a check for records that are not assigned a period in the above function to either go a day ahead or before and match up with the Period from period_df?

    import pandas as pd

    # Reference periods
    period_df = pd.DataFrame({
        'Group1': ['G1', 'G1', 'G1', 'G1', 'G1'],
        'Group2': ['G2', 'G2', 'G2', 'G2', 'G2'],
        'Period': ['Period 1', 'Period 2', 'Period 3', 'Period 4', 'Period 5'],
        'Start time': [
            '1900-01-01 05:01:00',
            '1900-01-01 06:01:00',
            '1900-01-01 07:01:00',
            '1900-01-01 08:01:00',
            '1900-01-01 09:01:00'],
        'End time': [
            '1900-01-01 06:00:00',
            '1900-01-01 07:00:00',
            '1900-01-01 08:00:00',
            '1900-01-01 09:00:00',
            '1900-01-01 10:00:00']})

    # Records to be assigned a period
    records_df = pd.DataFrame({
        'Group1': ['G1', 'G1', 'G1', 'G1', 'G1'],
        'Group2': ['G2', 'G2', 'G2', 'G2', 'G2'],
        'Original time': [
            '1900-01-01 05:05:00',
            '1900-01-01 07:23:00',
            '1900-01-00 07:45:00',
            '1900-01-02 09:57:00',
            '1900-01-02 08:23:00']})

Answer 1

Score: 1

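Example

First, check for typos in the example code for records_df.

Your code contains '1900-01-00 07:45:00' and '1900-01-02 09:57:00', whose dates differ from the 1900-01-01 used in period_df ('1900-01-00' is not even a valid date).

If these are typos, use the following code to fix them: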

    data = {'Group1': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1'},
            'Group2': {0: 'G2', 1: 'G2', 2: 'G2', 3: 'G2', 4: 'G2'},
            'Original time': {0: '1900-01-01 05:05:00',
                              1: '1900-01-01 07:23:00',
                              2: '1900-01-01 07:45:00',
                              3: '1900-01-01 09:57:00',
                              4: '1900-01-01 08:23:00'}}
    # Use the same name as in the question so the later steps pick it up
    records_df = pd.DataFrame(data)

Step 1

Convert the time columns to datetime dtype:

    records_df['Original time'] = pd.to_datetime(records_df['Original time'])
    period_df['Start time'] = pd.to_datetime(period_df['Start time'])
    period_df['End time'] = pd.to_datetime(period_df['End time'])

Step 2

Build the bin edges from every Start time plus the last End time:

    bins = period_df['Start time'].tolist() + [period_df['End time'].tolist()[-1]]

bins

    [Timestamp('1900-01-01 05:01:00'),
     Timestamp('1900-01-01 06:01:00'),
     Timestamp('1900-01-01 07:01:00'),
     Timestamp('1900-01-01 08:01:00'),
     Timestamp('1900-01-01 09:01:00'),
     Timestamp('1900-01-01 10:00:00')]

Step 3

Create the Period column with pd.cut, using the bins:

    records_df.assign(Period=pd.cut(records_df['Original time'],
                                    bins=bins, right=False,
                                    labels=period_df['Period']))

Output:

       Group1  Group2  Original time        Period
    0  G1      G2      1900-01-01 05:05:00  Period 1
    1  G1      G2      1900-01-01 07:23:00  Period 3
    2  G1      G2      1900-01-01 07:45:00  Period 3
    3  G1      G2      1900-01-01 09:57:00  Period 5
    4  G1      G2      1900-01-01 08:23:00  Period 4
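One detail of the binning above is worth checking against your own data: with right=False, each bin covers [start, next start), which is not exactly the same as the inclusive [Start time, End time] ranges in period_df. The sketch below uses two made-up probe timestamps (they are not in the question's data) to show the two places where the results can differ.

```python
import pandas as pd

# Same bin edges as in Step 2: every Start time plus the last End time
bins = pd.to_datetime([
    "1900-01-01 05:01:00", "1900-01-01 06:01:00", "1900-01-01 07:01:00",
    "1900-01-01 08:01:00", "1900-01-01 09:01:00", "1900-01-01 10:00:00",
]).tolist()
labels = [f"Period {i}" for i in range(1, 6)]

# Hypothetical probe values chosen to sit on the edges
probe = pd.Series(pd.to_datetime([
    "1900-01-01 06:00:30",  # in the gap between Period 1's end and Period 2's start
    "1900-01-01 10:00:00",  # exactly the End time of Period 5
]))

cut = pd.cut(probe, bins=bins, right=False, labels=labels)
print(cut)
```

The first probe is labelled Period 1 even though it lies after that period's End time, and the second is left unassigned because the last bin excludes its right edge. If neither case can occur in your records, the pd.cut approach and the original range check agree.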

If it is not a typo, compare only the time of day by using timedelta dtype instead of datetime. Run this on the original string columns (before the Step 1 conversion), because it relies on the .str accessor:

    # Keep only the HH:MM:SS part of each value and convert it to a timedelta
    s = pd.to_timedelta(records_df['Original time'].str.split(' ').str[1])
    s1 = pd.to_timedelta(period_df['Start time'].str.split(' ').str[1])
    s2 = pd.to_timedelta(period_df['End time'].str.split(' ').str[1])
    bins = s1.tolist() + [s2.tolist()[-1]]
    out = records_df.assign(Period=pd.cut(s, bins=bins, right=False, labels=period_df['Period']))

out

       Group1  Group2  Original time        Period
    0  G1      G2      1900-01-01 05:05:00  Period 1
    1  G1      G2      1900-01-01 07:23:00  Period 3
    2  G1      G2      1900-01-00 07:45:00  Period 3
    3  G1      G2      1900-01-02 09:57:00  Period 5
    4  G1      G2      1900-01-02 08:23:00  Period 4
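The pd.cut approach above does not look at Group1 and Group2, which is fine for the sample data because every row belongs to the same group. As a hedged alternative, here is a minimal sketch that matches on both group columns and on the time of day, keeping the End time inclusive as in the question's function. The helper time_of_day and the column names tod, start, and end are illustrative choices, not part of the original answer.

```python
import pandas as pd

period_df = pd.DataFrame({
    "Group1": ["G1"] * 5,
    "Group2": ["G2"] * 5,
    "Period": [f"Period {i}" for i in range(1, 6)],
    "Start time": [f"1900-01-01 0{h}:01:00" for h in range(5, 10)],
    "End time": ["1900-01-01 06:00:00", "1900-01-01 07:00:00",
                 "1900-01-01 08:00:00", "1900-01-01 09:00:00",
                 "1900-01-01 10:00:00"],
})
records_df = pd.DataFrame({
    "Group1": ["G1"] * 5,
    "Group2": ["G2"] * 5,
    "Original time": ["1900-01-01 05:05:00", "1900-01-01 07:23:00",
                      "1900-01-00 07:45:00", "1900-01-02 09:57:00",
                      "1900-01-02 08:23:00"],
})

def time_of_day(col):
    # Use only the HH:MM:SS part, so the date (valid or not) is never parsed
    return pd.to_timedelta(col.str.split(" ").str[1])

periods = period_df.assign(start=time_of_day(period_df["Start time"]),
                           end=time_of_day(period_df["End time"]))
# reset_index() keeps the original row label in a column named "index"
records = records_df.assign(tod=time_of_day(records_df["Original time"])).reset_index()

# Pair every record with every period of the same group, then keep the pairs
# whose time of day falls inside the period (both ends inclusive)
merged = records.merge(periods[["Group1", "Group2", "Period", "start", "end"]],
                       on=["Group1", "Group2"], how="left")
hit = merged[(merged["start"] <= merged["tod"]) & (merged["tod"] <= merged["end"])]

# One period per record; records without a match end up as NaN after reindexing
matched = hit.drop_duplicates("index").set_index("index")["Period"]
result = records_df.assign(Period=matched.reindex(records_df.index))
print(result)
```

The merge builds one row per record and candidate period within a group, so it can use a lot of memory when a group has many periods; for the small reference table in the question this should not matter.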

huangapple
  • Published on May 17, 2023, 10:07:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76268132.html