根据任务角色从数据框列中找到最新的任务,并用最新的列值替换列值。

huangapple go评论64阅读模式
英文:

find the latest task from the dataframe columns based on the task role and replace the column value with the latest column value

问题

我有一个包含以下列的数据框:

df1:
Task Start Date | Task Finish Date | Task Role
01-01-2021 01-03-2021 Lead
01-04-2021 02-02-2021 Team member
01-04-2021 02-23-2021 Unknown


我想创建另一列'Origin Role',基于以下条件:

如果Task Role是Lead/任何其他值,那么跳过(在新列'Origin Role'中填入相同的Task Role);

否则,只有当Task Role是Unknown时:

- 考虑具有Task Role 'Unknown'的记录的Task Start Date,并查找最新的记录(如果有多个记录),其Task Finish Date小于或等于Task Start Date(未知角色)。

- 然后,在新列'Origin Role'中用最新角色的Task Role替代未知角色。

我的期望是:

df1:
Task Start Date | Task Finish Date | Task Role | Origin Role
01-01-2021 01-03-2021 Lead Lead
01-04-2021 02-02-2021 Team member Team Member
01-04-2021 02-23-2021 Unknown Lead

英文:

I have a dataframe with the following columns

df1:
Task Start Date  |  Task Finish Date  |  Task Role
01-01-2021           01-03-2021            Lead
01-04-2021           02-02-2021            Team member
01-04-2021           02-23-2021            Unknown

I want to create another column 'Origin Role' based on the following conditions:

if the Task Role is lead/anything then skip (fill the same Task Role into the new column ('Origin Role');

else only if Task Role is Unknown then:

  • consider the Task Start Date of the record with Task Role 'Unknown' and find the latest record(if there are multiple) with its Task Finish Date <=(less than or equal to) Task Start Date (unknown role).

  • Then fill in the Task role of the latest role in place of the unknown role in the new column 'Origin Role'.

My expectation:

df1:
Task Start Date  |  Task Finish Date  |  Task Role       | Origin Role
01-01-2021           01-03-2021            Lead              Lead
01-04-2021           02-02-2021            Team member       Team Member
01-04-2021           02-23-2021            Unknown           Lead

答案1

得分: 2

代码部分已经被排除,以下是翻译好的内容:

Steps:

  • 通过replace将"Unknown"替换为NaN
  • 按日期排序数值
  • 在最近的前一个日期上合并,忽略"Unknown"
  • 使用这个值进行fillna

Output:

  Task Start Date Task Finish Date    Task Role  Origin Role
0      2021-01-01       2021-01-03         Lead         Lead
1      2021-01-04       2021-02-02  Team member  Team member
2      2021-01-04       2021-02-23      Unknown         Lead
英文:

One option is to use a merge_asof:

# ensure datetime
df[['Task Start Date', 'Task Finish Date']] = \
df[['Task Start Date', 'Task Finish Date']].apply(pd.to_datetime, dayfirst=False)


df['Origin Role'] = df['Task Role'].replace({'Unknown': np.nan}).fillna(
    pd.merge_asof(df['Task Start Date'].sort_values().reset_index(),
                  df[['Task Finish Date', 'Task Role']]
                    .loc[lambda d: d['Task Role'].ne('Unknown')]
                    .sort_values(by='Task Finish Date'),
                  left_on='Task Start Date', right_on='Task Finish Date',
                  ).set_index('index')['Task Role']
)

Steps:

  • replace "Unknown" by NaN
  • sort values by date
  • merge on the closest previous date, ignoring "Unknown"
  • fillna with this value

Output:

  Task Start Date Task Finish Date    Task Role  Origin Role
0      2021-01-01       2021-01-03         Lead         Lead
1      2021-01-04       2021-02-02  Team member  Team member
2      2021-01-04       2021-02-23      Unknown         Lead

huangapple
  • 本文由 发表于 2023年4月4日 03:16:54
  • 转载请务必保留本文链接:https://go.coder-hub.com/75923032.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定