根据任务角色从数据框列中找到最新的任务,并用最新的列值替换列值。

huangapple go评论104阅读模式
英文:

find the latest task from the dataframe columns based on the task role and replace the column value with the latest column value

问题

  1. 我有一个包含以下列的数据框:

df1:
Task Start Date | Task Finish Date | Task Role
01-01-2021 01-03-2021 Lead
01-04-2021 02-02-2021 Team member
01-04-2021 02-23-2021 Unknown

  1. 我想创建另一列'Origin Role',基于以下条件:
  2. 如果Task RoleLead/任何其他值,那么跳过(在新列'Origin Role'中填入相同的Task Role);
  3. 否则,只有当Task RoleUnknown时:
  4. - 考虑具有Task Role 'Unknown'的记录的Task Start Date,并查找最新的记录(如果有多个记录),其Task Finish Date小于或等于Task Start Date(未知角色)。
  5. - 然后,在新列'Origin Role'中用最新角色的Task Role替代未知角色。
  6. 我的期望是:

df1:
Task Start Date | Task Finish Date | Task Role | Origin Role
01-01-2021 01-03-2021 Lead Lead
01-04-2021 02-02-2021 Team member Team Member
01-04-2021 02-23-2021 Unknown Lead

英文:

I have a dataframe with the following columns

  1. df1:
  2. Task Start Date | Task Finish Date | Task Role
  3. 01-01-2021 01-03-2021 Lead
  4. 01-04-2021 02-02-2021 Team member
  5. 01-04-2021 02-23-2021 Unknown

I want to create another column 'Origin Role' based on the following conditions:

if the Task Role is lead/anything then skip (fill the same Task Role into the new column ('Origin Role');

else only if Task Role is Unknown then:

  • consider the Task Start Date of the record with Task Role 'Unknown' and find the latest record(if there are multiple) with its Task Finish Date <=(less than or equal to) Task Start Date (unknown role).

  • Then fill in the Task role of the latest role in place of the unknown role in the new column 'Origin Role'.

My expectation:

  1. df1:
  2. Task Start Date | Task Finish Date | Task Role | Origin Role
  3. 01-01-2021 01-03-2021 Lead Lead
  4. 01-04-2021 02-02-2021 Team member Team Member
  5. 01-04-2021 02-23-2021 Unknown Lead

答案1

得分: 2

代码部分已经被排除,以下是翻译好的内容:

Steps:

  • 通过replace将"Unknown"替换为NaN
  • 按日期排序数值
  • 在最近的前一个日期上合并,忽略"Unknown"
  • 使用这个值进行fillna

Output:

  1. Task Start Date Task Finish Date Task Role Origin Role
  2. 0 2021-01-01 2021-01-03 Lead Lead
  3. 1 2021-01-04 2021-02-02 Team member Team member
  4. 2 2021-01-04 2021-02-23 Unknown Lead
英文:

One option is to use a merge_asof:

  1. # ensure datetime
  2. df[['Task Start Date', 'Task Finish Date']] = \
  3. df[['Task Start Date', 'Task Finish Date']].apply(pd.to_datetime, dayfirst=False)
  4. df['Origin Role'] = df['Task Role'].replace({'Unknown': np.nan}).fillna(
  5. pd.merge_asof(df['Task Start Date'].sort_values().reset_index(),
  6. df[['Task Finish Date', 'Task Role']]
  7. .loc[lambda d: d['Task Role'].ne('Unknown')]
  8. .sort_values(by='Task Finish Date'),
  9. left_on='Task Start Date', right_on='Task Finish Date',
  10. ).set_index('index')['Task Role']
  11. )

Steps:

  • replace "Unknown" by NaN
  • sort values by date
  • merge on the closest previous date, ignoring "Unknown"
  • fillna with this value

Output:

  1. Task Start Date Task Finish Date Task Role Origin Role
  2. 0 2021-01-01 2021-01-03 Lead Lead
  3. 1 2021-01-04 2021-02-02 Team member Team member
  4. 2 2021-01-04 2021-02-23 Unknown Lead

huangapple
  • 本文由 发表于 2023年4月4日 03:16:54
  • 转载请务必保留本文链接:https://go.coder-hub.com/75923032.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定