Pandas: finding dates earlier than the current row's date, grouped by type

Question

I have a pandas DataFrame with id, type and date columns. I want to add a new column, as shown below. Basically, group by type and, for each row, count the other records of the same type whose dates are earlier than that row's date but no more than two years before it.

    id  type  date
    1   a     2023-06-18
    2   a     2022-06-18
    3   a     2021-06-18
    4   b     2023-06-18
    5   b     2020-06-18
    6   c     2023-06-18

Expected output:

    id  type  date        past_records_in_2_years
    1   a     2023-06-18  2
    2   a     2022-06-18  1
    3   a     2021-06-18  0
    4   b     2023-06-18  0
    5   b     2020-06-18  0
    6   c     2023-06-18  0
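
For anyone who wants to reproduce the example, the sample frame can be built roughly as follows. This is a minimal sketch: the question does not say whether `date` is already a datetime column, so the `pd.to_datetime` conversion is an assumption, and the `expected` list is just a convenience name for the desired column taken from the table above.

```python
import pandas as pd

# Sample data from the question. `date` is parsed to datetime64 (an assumption)
# so that date arithmetic and comparisons behave as expected.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "type": ["a", "a", "a", "b", "b", "c"],
        "date": pd.to_datetime(
            [
                "2023-06-18", "2022-06-18", "2021-06-18",
                "2023-06-18", "2020-06-18", "2023-06-18",
            ]
        ),
    }
)

# Desired values for the new column, copied from the expected output table.
expected = [2, 1, 0, 0, 0, 0]
```

Having the dates as real datetimes matters here, because string columns would compare lexicographically and could not be shifted by two years.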

I tried using a for loop, but I have 1M+ rows, so it takes far too long.

    import pandas as pd

    # Assumes a default RangeIndex and a datetime64 `date` column.
    df['past_records_in_2_years'] = 0
    for i in range(len(df)):
        current = df.loc[i, 'date']
        temp = df[df['type'] == df.loc[i, 'type']].reset_index(drop=True)
        past_dates = 0
        for j in range(len(temp)):
            # Count dates strictly earlier than the current date and at most two years before it.
            if current - pd.DateOffset(years=2) <= temp.loc[j, 'date'] < current:
                past_dates += 1
        df.loc[i, 'past_records_in_2_years'] = past_dates

Answer 1

Score: 1

A pandas merge followed by a filter should suffice:

    import pandas as pd

    # Assumes df['date'] has a datetime64 dtype.
    # For every row, build the window (present, present + 2 years] in which
    # a later row of the same type counts this row as a past record.
    other = pd.DataFrame({'type': df['type'],
                          'present': df.date,
                          'two_yrs_ahead': df.date.add(pd.DateOffset(years=2))})

    # Pair each row with all rows of the same type, flag the pairs where the
    # row's date falls inside the other row's window, then sum the flags per row.
    (df.merge(other, on='type')
       .assign(counts=lambda f: f.date.gt(f.present) &
                                f.date.le(f.two_yrs_ahead))
       .groupby(df.columns.tolist())
       .counts
       .sum()
    )

Output:

    id  type  date
    1   a     2023-06-18    2
    2   a     2022-06-18    1
    3   a     2021-06-18    0
    4   b     2023-06-18    0
    5   b     2020-06-18    0
    6   c     2023-06-18    0
    Name: counts, dtype: int64
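
One thing to keep in mind with the merge: it creates one row for every pair of records that share a type, so with 1M+ rows and large groups the intermediate frame may get very big. As a possible alternative (not part of the original answer), the sketch below sorts the dates within each type and uses `numpy.searchsorted` to find the window boundaries. The helper name `past_records_in_2_years` is hypothetical, and the code assumes a timezone-naive datetime64 `date` column, a unique index, and no missing values in `type`.

```python
import numpy as np
import pandas as pd


def past_records_in_2_years(df: pd.DataFrame) -> pd.Series:
    """Per row, count same-type rows dated strictly earlier but at most 2 years back."""
    parts = []
    for _, group in df.groupby("type", sort=False):
        dates = group["date"].sort_values()
        values = dates.to_numpy()
        # Start of the window: first sorted position with date >= (date - 2 years).
        lower = (dates - pd.DateOffset(years=2)).to_numpy()
        start = np.searchsorted(values, lower, side="left")
        # End of the window: first sorted position holding the row's own date,
        # so rows sharing the same date are not counted as past records.
        end = np.searchsorted(values, values, side="left")
        parts.append(pd.Series(end - start, index=dates.index))
    # Restore the original row order.
    return pd.concat(parts).reindex(df.index).astype("int64")


# Sample data from the question.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "type": ["a", "a", "a", "b", "b", "c"],
        "date": pd.to_datetime(
            [
                "2023-06-18", "2022-06-18", "2021-06-18",
                "2023-06-18", "2020-06-18", "2023-06-18",
            ]
        ),
    }
)

df["past_records_in_2_years"] = past_records_in_2_years(df)
print(df["past_records_in_2_years"].tolist())  # [2, 1, 0, 0, 0, 0]
```

The work per group is dominated by the sort, so memory use stays proportional to the group size instead of its square. Whether this is actually faster than the merge on real data would need to be measured.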

huangapple
  • Published on June 19, 2023, 10:27:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76503279.html