如何在Python中通过ID查找过去的数值

huangapple go评论85阅读模式
英文:

How to find past values by ID in Python

问题

我有一个数据集,查看所有员工的历史记录。我试图实现的目标是,只有在上一个经理离职而没有被替代的情况下,查看员工的当前经理和以前的经理。要识别经理是否离职,您可以查看ManagerPositionNum列,这列是经理的唯一标识,如果其他经理具有相同的编号,那么他们正在填补空缺的职位。我正在使用pandas和numpy运行这个文件。

以下是我翻译的部分:

  1. 我有一个数据集,查看所有员工的历史记录。我试图实现的目标是,只有在上一个经理离职而没有被替代的情况下,查看员工的当前经理和以前的经理。要识别经理是否离职,您可以查看**ManagerPositionNum**列,这列是经理的唯一标识,如果其他经理具有相同的编号,那么他们正在填补空缺的职位。我正在使用pandasnumpy运行这个文件。
  2. 以下是我翻译的部分:

这是一个我翻译的示例:

  1. EmpID Date Job_Title ManagerName ManagerPositionNum
  2. 101 May 2021 Sales Rep John Doe 1111
  3. 101 June 2021 Sales Rep John Doe 1111
  4. 102 February 2022 Tech Support Mary Sue 2111
  5. 102 March 2022 Tech Support Mary Sue 2111
  6. 102 April 2022 Tech Support John Doe 2111
  7. 103 October 2022 HR Advisor Sarah Long 3111
  8. 103 November 2022 HR Advisor Michael Scott 4111
  9. 103 December 2022 HR Advisor John Doe 4111
  10. 103 December 2022 HR Advisor John Doe 4111

期望的输出:

  1. EmpID Date Job_Title ManagerName ManagerPositionNum Vacated Manager
  2. 101 May 2021 Sales Rep John Doe 1111
  3. 101 June 2021 Sales Rep John Doe 1111
  4. 102 February 2022 Tech Support Mary Sue 2111 Mary Sue
  5. 102 March 2022 Tech Support Mary Sue 2111 Mary Sue
  6. 102 April 2022 Tech Support John Doe 2111 Mary Sue
  7. 103 October 2022 HR Advisor Sarah Long 3111
  8. 103 November 2022 HR Advisor Michael Scott 4111
  9. 103 December 2022 HR Advisor John Doe 4111 Michael Scott
  10. 103 January 2023 HR Advisor John Doe 4111 Michael Scott

仅供澄清:

1111是John Doe的唯一标识

2111是Mary Sue的唯一标识

3111是Sarah Long的唯一标识

4111是Michael Scott的唯一标识

我翻译的代码部分:

  1. reportid = df.groupby('ManagerName')['ManagerPositionNum'].transform('first')
  2. m = ~df['ManagerPositionNum'].eq(reportid)
  3. df.loc[m, 'ManagerName']
英文:

I have a dataset that looks at all employees' history. The goal I am trying to get to is see an employees current manager and previous manager only if that previous manager has left without being replaced. To identify if a manager has left you look at the ManagerPositionNum this column is unique to the Manager and if another Manager has those numbers then they are filling in for a vacant role. I am running pandas and numpy with this file.

Here is a sample of what I have:

  1. EmpID Date Job_Title ManagerName ManagerPositionNum
  2. 101 May 2021 Sales Rep John Doe 1111
  3. 101 June 2021 Sales Rep John Doe 1111
  4. 102 February 2022 Tech Support Mary Sue 2111
  5. 102 March 2022 Tech Support Mary Sue 2111
  6. 102 April 2022 Tech Support John Doe 2111
  7. 103 October 2022 HR Advisor Sarah Long 3111
  8. 103 November 2022 HR Advisor Michael Scott 4111
  9. 103 December 2022 HR Advisor John Doe 4111
  10. 103 December 2022 HR Advisor John Doe 4111

Desired Output:

  1. EmpID Date Job_Title ManagerName ManagerPositionNum Vacated Manager
  2. 101 May 2021 Sales Rep John Doe 1111
  3. 101 June 2021 Sales Rep John Doe 1111
  4. 102 February 2022 Tech Support Mary Sue 2111 Mary Sue
  5. 102 March 2022 Tech Support Mary Sue 2111 Mary Sue
  6. 102 April 2022 Tech Support John Doe 2111 Mary Sue
  7. 103 October 2022 HR Advisor Sarah Long 3111
  8. 103 November 2022 HR Advisor Michael Scott 4111
  9. 103 December 2022 HR Advisor John Doe 4111 Michael Scott
  10. 103 January 2023 HR Advisor John Doe 4111 Michael Scott

Just for clarification:

1111 is unique to John Doe

2111 is unique to Mary Sue

3111 is unique to Sarah Long

4111 is unique to Michael Scott

Code I have tried:

  1. reportid = df.groupby('ManagerName')['ManagerPositionNum'].transform('first')m =
  2. ~df['ManagerPositionNum'].eq(reportid) df.loc[m,'ManagerName']

答案1

得分: 1

  1. 使用自定义的 [`groupby.apply`](https://pandas.pydata.org/docs/user_guide/groupby.html#flexible-apply)。在每个分组中识别倒数第二个唯一的经理,并 [`ffill`](https://pandas.pydata.org/docs/reference/api/pandas.Series.ffill.html) 他们的名称:

df['Vacated Manager'] = (df.groupby('EmpID', group_keys=False)['ManagerName']
.apply(lambda s: s.where(pd.factorize(s[::-1])[0][::-1]==1).ffill())
)

  1. 输出:

EmpID Date Job_Title ManagerName ManagerPositionNum Vacated Manager
0 101 May 2021 Sales Rep John Doe 1111 NaN
1 101 June 2021 Sales Rep John Doe 1111 NaN
2 102 February 2022 Tech Support Mary Sue 2111 Mary Sue
3 102 March 2022 Tech Support Mary Sue 2111 Mary Sue
4 102 April 2022 Tech Support John Doe 2111 Mary Sue
5 103 October 2022 HR Advisor Sarah Long 3111 NaN
6 103 November 2022 HR Advisor Michael Scott 4111 Michael Scott
7 103 December 2022 HR Advisor John Doe 4111 Michael Scott
8 103 December 2022 HR Advisor John Doe 4111 Michael Scott

  1. <details>
  2. <summary>英文:</summary>
  3. Use a custom [`groupby.apply`](https://pandas.pydata.org/docs/user_guide/groupby.html#flexible-apply). Identify the second to last unique manager and [`ffill`](https://pandas.pydata.org/docs/reference/api/pandas.Series.ffill.html) their name per group:

df['Vacated Manager'] = (df.groupby('EmpID', group_keys=False)['ManagerName']
.apply(lambda s: s.where(pd.factorize(s[::-1])[0][::-1]==1).ffill())
)

  1. Output:

EmpID Date Job_Title ManagerName ManagerPositionNum Vacated Manager
0 101 May 2021 Sales Rep John Doe 1111 NaN
1 101 June 2021 Sales Rep John Doe 1111 NaN
2 102 February 2022 Tech Support Mary Sue 2111 Mary Sue
3 102 March 2022 Tech Support Mary Sue 2111 Mary Sue
4 102 April 2022 Tech Support John Doe 2111 Mary Sue
5 103 October 2022 HR Advisor Sarah Long 3111 NaN
6 103 November 2022 HR Advisor Michael Scott 4111 Michael Scott
7 103 December 2022 HR Advisor John Doe 4111 Michael Scott
8 103 December 2022 HR Advisor John Doe 4111 Michael Scott

  1. </details>

huangapple
  • 本文由 发表于 2023年4月13日 23:01:43
  • 转载请务必保留本文链接:https://go.coder-hub.com/76006978.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定