Find the latest task from the dataframe columns based on the task role and replace the column value with the latest column value
Question
I have a dataframe with the following columns:
df1:
Task Start Date | Task Finish Date | Task Role
01-01-2021      | 01-03-2021       | Lead
01-04-2021      | 02-02-2021       | Team member
01-04-2021      | 02-23-2021       | Unknown
I want to create another column, 'Origin Role', based on the following conditions:
- If the Task Role is Lead or anything other than Unknown, skip it (copy the same Task Role into the new column 'Origin Role').
- Otherwise, only if the Task Role is Unknown:
  - Take the Task Start Date of the record with Task Role 'Unknown' and find the latest record (if there are multiple) whose Task Finish Date is less than or equal to that Task Start Date.
  - Then fill in the Task Role of that latest record in place of the unknown role in the new column 'Origin Role'.
My expectation:
df1:
Task Start Date | Task Finish Date | Task Role   | Origin Role
01-01-2021      | 01-03-2021       | Lead        | Lead
01-04-2021      | 02-02-2021       | Team member | Team member
01-04-2021      | 02-23-2021       | Unknown     | Lead
Answer 1
Score: 2
One option is to use merge_asof:
import numpy as np
import pandas as pd

# ensure datetime (the sample dates are month-first)
df[['Task Start Date', 'Task Finish Date']] = \
    df[['Task Start Date', 'Task Finish Date']].apply(pd.to_datetime, dayfirst=False)

# blank out "Unknown", then fill it with the role of the latest known task
# whose finish date is on or before the row's start date
df['Origin Role'] = df['Task Role'].replace({'Unknown': np.nan}).fillna(
    pd.merge_asof(df['Task Start Date'].sort_values().reset_index(),
                  df[['Task Finish Date', 'Task Role']]
                    .loc[lambda d: d['Task Role'].ne('Unknown')]
                    .sort_values(by='Task Finish Date'),
                  left_on='Task Start Date', right_on='Task Finish Date',
                 ).set_index('index')['Task Role']
)
Steps:
- replace "Unknown" by NaN
- sort values by date
- merge on the closest previous date, ignoring "Unknown"
- fillna with this value
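The steps above can also be written out one at a time, which may make the intermediate results easier to inspect. This is only a sketch on the sample data from the question; the variable names (known_or_nan, left, right, matched, candidates) are made up for illustration, and Series.mask is used in place of replace for the first step.

```python
import pandas as pd

# Sample data copied from the question (dates are month-day-year).
df = pd.DataFrame({
    "Task Start Date": ["01-01-2021", "01-04-2021", "01-04-2021"],
    "Task Finish Date": ["01-03-2021", "02-02-2021", "02-23-2021"],
    "Task Role": ["Lead", "Team member", "Unknown"],
})
for col in ["Task Start Date", "Task Finish Date"]:
    df[col] = pd.to_datetime(df[col], format="%m-%d-%Y")

# Step 1: turn "Unknown" into NaN so only those rows get filled later.
known_or_nan = df["Task Role"].mask(df["Task Role"].eq("Unknown"))

# Step 2: merge_asof needs both sides sorted on their key column.
# reset_index() keeps the original row labels in a column named "index".
left = df["Task Start Date"].sort_values().reset_index()
right = (
    df.loc[df["Task Role"].ne("Unknown"), ["Task Finish Date", "Task Role"]]
    .sort_values("Task Finish Date")
)

# Step 3: for each start date, pick the last known task whose
# finish date is <= that start date (default direction is backward).
matched = pd.merge_asof(
    left, right, left_on="Task Start Date", right_on="Task Finish Date"
)
candidates = matched.set_index("index")["Task Role"]

# Step 4: fillna aligns on the row labels, so only NaN rows change.
df["Origin Role"] = known_or_nan.fillna(candidates)
print(df)
```

Printing candidates on its own shows which role each row would receive if it were "Unknown"; rows with no earlier finished task get NaN there.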
Output:
  Task Start Date Task Finish Date    Task Role  Origin Role
0      2021-01-01       2021-01-03         Lead         Lead
1      2021-01-04       2021-02-02  Team member  Team member
2      2021-01-04       2021-02-23      Unknown         Lead
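To sanity-check this result on a small frame, the rule from the question can also be spelled out as a plain loop. This is a rough sketch rather than a replacement for merge_asof: the helper name origin_role is hypothetical, it assumes a unique index and datetime columns, and it will be slow on large data.

```python
import pandas as pd

def origin_role(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: apply the question's rule row by row."""
    known = df[df["Task Role"].ne("Unknown")]
    result = df["Task Role"].copy()
    for idx, row in df[df["Task Role"].eq("Unknown")].iterrows():
        # Known tasks that finished on or before this row's start date.
        earlier = known[known["Task Finish Date"] <= row["Task Start Date"]]
        if not earlier.empty:
            # idxmax returns the first label holding the latest finish date.
            latest = earlier["Task Finish Date"].idxmax()
            result.loc[idx] = earlier.loc[latest, "Task Role"]
        # With no earlier task, the row keeps "Unknown" here, whereas the
        # merge_asof version leaves it as NaN.
    return result

# Sample data copied from the question (dates are month-day-year).
df = pd.DataFrame({
    "Task Start Date": ["01-01-2021", "01-04-2021", "01-04-2021"],
    "Task Finish Date": ["01-03-2021", "02-02-2021", "02-23-2021"],
    "Task Role": ["Lead", "Team member", "Unknown"],
})
for col in ["Task Start Date", "Task Finish Date"]:
    df[col] = pd.to_datetime(df[col], format="%m-%d-%Y")

df["Origin Role"] = origin_role(df)
print(df)
```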