选择 pandas 中的 user_id 行。

huangapple go评论99阅读模式
英文:

How to select user_id rows pandas

问题

  1. 用户ID 已下单 最早日期 最晚日期
  2. 1 1 2020914 20201024
  3. 2 0 2020914 NAT
  4. 3 1 202091 20201019
  5. 3 2 2021211 202131
英文:

How can I calculate the first visited date and the last visited date before an order was placed by the user?

  1. USER ID TYPE DATE
  2. 1 Visited September 14, 2020
  3. 1 Visited October 4, 2020
  4. 1 Visited October 24, 2020
  5. 1 Ordered November 1, 2020
  6. 2 Visited September 14, 2020
  7. 2 Visited October 1, 2020
  8. 3 Visited September 1, 2020
  9. 3 Visited October 4, 2020
  10. 3 Visited October 4, 2020
  11. 3 Visited October 19, 2020
  12. 3 Ordered January 1, 2021
  13. 3 Visited February 11, 2021
  14. 3 Visited February 24, 2021
  15. 3 Visited March 1, 2021
  16. 3 Ordered April 21, 2021

Expected Output:

  1. USER ID Ordered MIN DATE MAX DATE
  2. 1 1 September 14, 2020 October 24, 2020
  3. 2 0 September 14, 2020 NAT
  4. 3 1 September 1, 2020 October 19, 2020
  5. 3 2 February 11, 2021 March 1, 2021

答案1

得分: 1

以下是翻译好的部分:

  1. 尝试:
  2. df['DATE'] = pd.to_datetime(df['DATE'])
  3. df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())\
  4. .set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']\
  5. .unstack('TYPE')\
  6. .groupby(['USER ID', 'grp'], sort=False)\
  7. .agg(Ordered=('Ordered','count'),
  8. MIN_DATE=('Visited','first'),
  9. MAX_DATE=('Visited','last'))\
  10. .reset_index('grp', drop=True)\
  11. .reset_index()
  12. df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
  13. df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()
  14. df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%Y年%m月%d日')
  15. df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%Y年%m月%d日')
  16. 输出:
  17. 用户ID 已订购 最小日期 最大日期
  18. 0 1 1 20200914 20201024
  19. 1 2 0 20200914 NaN
  20. 2 3 1 20200901 20201019
  21. 3 3 2 20210211 20210301
英文:

Try:

  1. df['DATE'] = pd.to_datetime(df['DATE'])
  2. df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())\
  3. .set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']\
  4. .unstack('TYPE')\
  5. .groupby(['USER ID', 'grp'], sort=False)\
  6. .agg(Ordered=('Ordered','count'),
  7. MIN_DATE=('Visited','first'),
  8. MAX_DATE=('Visited','last'))\
  9. .reset_index('grp', drop=True)\
  10. .reset_index()
  11. df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
  12. df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()
  13. df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
  14. df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')

Output:

  1. USER ID Ordered MIN_DATE MAX_DATE
  2. 0 1 1 September 14, 2020 October 24, 2020
  3. 1 2 0 September 14, 2020 NaN
  4. 2 3 1 September 01, 2020 October 19, 2020
  5. 3 3 2 February 11, 2021 March 01, 2021

huangapple
  • 本文由 发表于 2023年3月23日 09:07:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/75818479.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定