How to get a unique week number for start and end dates across multiple years - Pandas

Question

I have a dataframe in which two columns hold the start and end dates of each data record. The data spans multiple years. My goal is to add a new column that gives the time step of the data record in each row. Since I also have a location column, some of these weeks will repeat.

    import numpy as np
    import pandas as pd

    # 20 weekly (Sunday) start dates; the first one is 2021-11-14
    dates = pd.date_range(start='2021-11-11', periods=20, freq='W')
    df = pd.DataFrame({
        'start_date': np.repeat(dates, 5),
        'end_date': np.repeat(dates + pd.DateOffset(days=6), 5),
        'country': ['USA', 'Canada', 'UK', 'Australia', 'Russia'] * 20
    })
    df = df.sort_values("start_date")

The first rows look like this:

       start_date    end_date    country
    0  2021-11-14  2021-11-20        USA
    1  2021-11-14  2021-11-20     Canada
    2  2021-11-14  2021-11-20         UK
    3  2021-11-14  2021-11-20  Australia
    4  2021-11-14  2021-11-20     Russia

I can get the week number using isocalendar().week, but that gives the week number within the corresponding year. For instance, if 2021-11-14 to 2021-11-20 is the first week in the dataframe, it should get 1. The data may skip the next week and have another record starting from 2021-11-27; that time step should then be the second week in the dataframe.
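To make the problem concrete, here is a minimal sketch of how ISO week numbers behave across a year boundary. The dates and the variable names (starts, iso_week) are chosen for illustration and are not part of the original data.

```python
import pandas as pd

# Four weekly (Sunday) start dates spanning the 2021/2022 boundary (illustrative).
starts = pd.Series(pd.date_range(start="2021-12-19", periods=4, freq="W"))

# ISO week numbers restart with each ISO year, so they cannot serve
# as a running time-step index over a multi-year data set.
iso_week = starts.dt.isocalendar().week.astype(int)

print(pd.DataFrame({"start_date": starts, "iso_week": iso_week}))
# iso_week is 50, 51, 52, 1
```

The last value drops back to 1, which is why a counter derived from the data itself is needed.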

Answer 1

Score: 2

IIUC, you can use groupby(...).ngroup():

    df['week'] = df.groupby(df['start_date']).ngroup().add(1)
    print(df)

    # Output
        start_date    end_date    country  week
    0   2021-11-14  2021-11-20        USA     1
    1   2021-11-14  2021-11-20     Canada     1
    2   2021-11-14  2021-11-20         UK     1
    3   2021-11-14  2021-11-20  Australia     1
    4   2021-11-14  2021-11-20     Russia     1
    ..         ...         ...        ...   ...
    98  2022-03-27  2022-04-02  Australia    20
    95  2022-03-27  2022-04-02        USA    20
    96  2022-03-27  2022-04-02     Canada    20
    97  2022-03-27  2022-04-02         UK    20
    99  2022-03-27  2022-04-02     Russia    20

    [100 rows x 4 columns]
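As a small check of how this behaves with a skipped week and unsorted rows, here is a sketch on a hypothetical frame (df_demo and its values are made up for illustration). It relies on groupby sorting the group keys by default, so the numbering follows chronological order rather than row order.

```python
import pandas as pd

# Hypothetical sample: rows deliberately out of order, the week starting
# 2021-12-26 is missing, and the data crosses into 2022.
df_demo = pd.DataFrame({
    "start_date": pd.to_datetime(
        ["2022-01-02", "2021-12-19", "2021-12-12", "2021-12-19", "2022-01-02"]
    ),
    "country": ["USA", "USA", "Canada", "Canada", "UK"],
})

# groupby sorts the keys by default, so ngroup() numbers the distinct
# start dates chronologically; add(1) makes the first week 1 instead of 0.
df_demo["week"] = df_demo.groupby("start_date").ngroup().add(1)

print(df_demo)
```

The missing week does not leave a gap in the numbering: 2022-01-02 gets 3, directly after 2021-12-19 with 2.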

Alternatively, use pd.factorize (if the dataframe is already sorted by start_date):

    df['week'] = pd.factorize(df['start_date'])[0] + 1
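The sorting caveat matters because pd.factorize numbers values in order of first appearance by default. The sketch below, on made-up dates, contrasts that with sort=True, which is one way to drop the requirement that the frame is pre-sorted; the variable names are illustrative.

```python
import pandas as pd

# Illustrative, unsorted start dates.
start = pd.Series(
    pd.to_datetime(["2022-01-02", "2021-12-19", "2021-12-12", "2021-12-19"])
)

# Default: codes follow the order of first appearance in the data.
codes_first_seen, _ = pd.factorize(start)
week_first_seen = codes_first_seen + 1

# sort=True: codes follow the sorted (chronological) order of the unique values.
codes_sorted, _ = pd.factorize(start, sort=True)
week_sorted = codes_sorted + 1

print(week_first_seen)  # [1 2 3 2]
print(week_sorted)      # [3 2 1 2]
```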

Answer 2

Score: 1

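You can take the week number of each start date and subtract the minimum week number from it. To ensure the first week is always labeled 1, add 1 to the result. Note that Series.dt.week was removed in pandas 2.0, so the code below uses dt.isocalendar().week instead. Because the week number restarts every year, this only gives a running count while all dates fall within a single ISO year.

    df['start_date'] = pd.to_datetime(df['start_date'])
    # dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it
    week = df['start_date'].dt.isocalendar().week
    df['time_step'] = week - week.min() + 1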
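A variant of the same "subtract the minimum" idea that does not depend on calendar week numbers is to count elapsed days from the earliest start date and divide by 7. This is a sketch under the assumption that every start date falls on the same weekday; the sample dates are illustrative.

```python
import pandas as pd

# Illustrative start dates crossing a year boundary, with one week skipped.
start = pd.Series(pd.to_datetime(["2021-12-19", "2021-12-26", "2022-01-09"]))

# Whole weeks elapsed since the earliest start date, shifted so the first is 1.
time_step = (start - start.min()).dt.days // 7 + 1

print(time_step.tolist())  # [1, 2, 4]
```

Unlike the ngroup approach, this keeps calendar gaps: the record after the skipped week gets 4, not 3. Which behavior is wanted depends on whether the time step should count elapsed weeks or distinct weeks present in the data.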

huangapple
  • Posted on June 19, 2023, 22:12:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76507449.html